Techno Blender
Digitally Yours.
Browsing Tag

Airflow

Mastering ExternalTaskSensor in Apache Airflow: How to Calculate Execution Delta | by Casey Cheng | May, 2023

External Task Sensors stop bad data from trickling downstream in a data pipeline. Leverage them to create a reliable data infrastructure. External Task Sensors are like gatekeepers: they stop bad data from trickling downstream. Image by Freepik. Orchestrating a data pipeline is a delicate endeavor. In a data pipeline, we can have thousands of tasks running simultaneously and they are often dependent on one another. If we’re not careful, a single point of failure can have a domino-like effect that trickles downstream and…

Creating a YouTube Data Pipeline with AWS and Apache Airflow | by Aashish Nair | Apr, 2023

Introduction: YouTube has become a major medium of exchange for information, thoughts, and ideas, with an average of 3 million videos being uploaded each day. The video streaming platform always has a new topic of conversation prepared for its audience with its diverse content, ranging from somber news stories to upbeat music videos. That being said, with a constant influx of video content, it’s difficult to gauge what types of content attract the attention of the fickle YouTube audience the most. In general, what types of…

Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue) | by João Pedro | Apr, 2023

Learning a little about these tools and how to integrate them. Photo by Nolan Krattinger on Unsplash. A few weeks ago, while doing my mental stretch to think about new post ideas, I thought: Well, I need to learn (and talk) more about cloud and these things. I’ve practiced a lot in on-premise environments, using open-source tools, and running away from proprietary solutions… But the world is cloud and I don’t think that this is gonna change any time soon… I then wrote a post about creating a data pipeline with local Spark and GCP,…

Building Pipelines In Apache Airflow – For Beginners | by Aashish Nair | Mar, 2023

A quick and simple demo for running DAGs on Airflow. Photo by Kelly Sikkema on Unsplash. Apache Airflow is quite popular in the data science and data engineering space. It boasts many features that enable users to programmatically create, manage, and monitor complex workflows. However, the platform’s range of features may inadvertently become a detriment to beginners. New users who explore Apache Airflow’s documentation and tutorials can easily become inundated by new terminology, tools, and concepts. With the aim of creating…

Here Is What I Learned Using Apache Airflow over 6 Years | by Chengzhi Zhao | Jan, 2023

A hassle-free journey with Apache Airflow from experiment to production. Photo by Karsten Würth on Unsplash. Apache Airflow has undoubtedly been the most popular open-source project for data engineering for years. It gained popularity at the right time, alongside The Rise of the Data Engineer, and its core concept of treating code as a first-class citizen for data pipelines (a.k.a. ETL), instead of drag and drop, was a milestone. Apache Airflow became an Apache Incubator project in March 2016 and a top-level project in January 2019. I have…

Three Helpful Tips to Know Before Choosing Apache Airflow As a Workflow Management Platform | by Chengzhi Zhao | Dec, 2022

Suggestions to help you decide whether to choose Apache Airflow. Photo by Yaakov Winiarz on Unsplash. Apache Airflow is a fantastic choice as a workflow management platform. However, that doesn't mean Airflow should be a blind go-to option. There are many discussions on StackOverflow in which engineers ask questions beyond what Airflow was designed for. In short, Airflow is not the best fit for every use case. There are some caveats to assess before settling on Airflow. This article will dive deep into three helpful…

Getting Started with Astronomer Airflow: The Data Engineering Workhorse | by Brian Roepke | Nov, 2022

Build powerful and scalable data pipelines with Astronomer, the managed Airflow service powered by Python. Photo by Nathan Anderson on Unsplash. We’ll start here with Airflow. Apache Airflow is an open-source workflow management platform that helps you build Data Engineering Pipelines. One of the biggest advantages of Airflow, and why it is so popular, is that you write your configuration in Python in the form of what is referred to as a DAG (Directed Acyclic Graph). The power of writing a DAG with Python means that you can…

Azza Aero 480 Review: Too Much Airflow?

When I think of Azza, the first thing that comes to mind is its geometric cases, like the company’s Cube and Pyramid lines. It’s safe to say that Azza mostly caters to a niche case market, at least in the US. However, the new Aero 480 is much less adventurous in terms of design, but still exciting thanks to loads of mesh and four PWM ARGB fans for its $110 asking price. With its budget-friendly price and seemingly generous feature set, can the Aero 480 earn a spot on our Best PC Cases list? As always, we’ll have to…

AWS Managed Workflows for Apache Airflow vs. Glue | by Minseok Song | Oct, 2022

Find out the differences between MWAA and AWS Glue. Photo by Martin Adams on Unsplash. In 2020, AWS launched Amazon Managed Workflows for Apache Airflow (MWAA). Apache Airflow is an open-source job orchestration platform that was built by Airbnb in 2014. Since then, many companies have adopted it for various use cases. It is a workflow orchestration tool that allows users to run jobs sequentially and logically at a scheduled time or as an ad-hoc execution. Thanks to its architecture that does not rely on…

Is Apache Airflow DAG Authoring Certification Worth Your Time? | by AnBento | Aug, 2022

An honest review of the Astronomer certification. Photo by Pixably on Pexels. A big part of my work as a data engineer consists of designing reliable, efficient and reproducible ETL jobs. Over the last two years, Apache Airflow has been the main orchestrator I have been using for authoring, scheduling and monitoring data pipelines. For this reason, I recently decided to challenge myself by taking the Astronomer Certification for DAG Authoring, which is meant to assess knowledge of designing and creating data pipelines following best…