Building Spark Data Pipelines in the Cloud: What You Need to Get Started | by Assaf Pinhasi | Nov, 2022
Common engineering challenges and recipes for solutions

Image generated with Stable Diffusion by the author

Over the last ten years or so, authoring and executing Spark jobs has become considerably simpler, mainly thanks to:

High-level APIs, which make it easier to express logic.

Managed cloud-based platforms: highly scalable object storage and one-click ephemeral clusters based on spot instances make it far simpler to run jobs (and delay the need to optimize them).

While authoring logic in Spark and executing jobs has…