Techno Blender
Digitally Yours.
Browsing Tag: Airflow

How to develop a CD pipeline for Airflow | by Chris Young | Aug, 2022

Leveraging GitHub Actions and Astronomer to quickly push code updates to a production environment. Apache Airflow is a popular data orchestration tool used to manage workflows and tasks. However, one of the big questions I keep running into is how to deploy production-ready instances of Airflow. Options for hosting Airflow include self-managing it on a virtual machine, deploying to the cloud-based platform Astronomer, leveraging AWS MWAA, and more. Of these options, I have found…
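A continuous-deployment flow like the one described might be wired up with a GitHub Actions workflow along these lines. This is a hedged sketch, not the article's exact pipeline: the secret names, branch, and CLI invocation are assumptions.

```yaml
name: deploy-to-astronomer
on:
  push:
    branches: [main]   # deploy whatever lands on main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Install the Astronomer CLI (installation method is illustrative)
      - run: curl -sSL https://install.astronomer.io | sudo bash -s
      # Push the project to the Astronomer deployment, authenticating
      # with deployment credentials stored as repo secrets
      # (secret names here are hypothetical)
      - run: astro deploy
        env:
          ASTRONOMER_KEY_ID: ${{ secrets.ASTRONOMER_KEY_ID }}
          ASTRONOMER_KEY_SECRET: ${{ secrets.ASTRONOMER_KEY_SECRET }}
```

The key design choice is that deployment is triggered only by merges to the production branch, so the Airflow environment always mirrors reviewed code.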

Airflow Architecture | Towards Data Science

A deep dive into Apache Airflow's architecture and how it orchestrates workflows. Apache Airflow is among the most commonly used frameworks for pipeline scheduling and execution, and it has gained huge traction within the Data Engineering community over the last few years. The technology itself consists of many different components that work together to perform certain operations. In today's article we will discuss the overall architecture of Apache…

Automated Alerts for Airflow with Slack | by Chris Young | Jul, 2022

Take advantage of the Slack API to get automatic updates and alerts for DAG task failures and successes. Let's face it — sometimes it can take a while for Airflow DAGs to run. Instead of constantly coming back to the Airflow UI to check for DAG updates, why not catch up on emails, messages, and backlog items, and be notified of the run results via Slack? Managing Airflow notifications through Slack enables easy access for monitoring and debugging Airflow tasks. A dedicated Slack…
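The usual mechanism is an `on_failure_callback` on the DAG that posts to a Slack incoming-webhook URL. A minimal, stdlib-only sketch of the payload-building half (function name and context keys here are illustrative, not the article's exact code):

```python
import json

def build_failure_payload(context):
    """Build the Slack Block Kit payload an on_failure_callback
    would POST to an incoming-webhook URL. In Airflow, `context`
    is the dict passed to the callback."""
    return {
        "text": ":red_circle: Airflow task failed",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (
                        f"*DAG*: {context['dag_id']}\n"
                        f"*Task*: {context['task_id']}\n"
                        f"*Run*: {context['execution_date']}\n"
                        f"<{context['log_url']}|View logs>"
                    ),
                },
            }
        ],
    }

# Example invocation with a hand-built context; in a real DAG the
# callback would then POST json.dumps(payload) to the webhook URL.
payload = build_failure_payload({
    "dag_id": "example_dag",
    "task_id": "load_data",
    "execution_date": "2022-07-01T00:00:00",
    "log_url": "https://airflow.example.com/log",
})
print(payload["blocks"][0]["text"]["text"])
```

The same shape works for success notifications; only the callback hook and the message text change.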

Load Data From Postgres to BigQuery With Airflow | by Giorgos Myrianthous | Jul, 2022

PostgreSQL to BigQuery data ingestion with Apache Airflow — a step-by-step guide. One way of ingesting data from a Postgres database (hosted on-premise) into Google Cloud BigQuery is with Airflow, which offers tons of operators that can be used for data ingestion and integration processes. It is also worth favouring these operators, since they help us write less and simpler code for fundamental operations in the wider context of data engineering. Now in…
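In outline, such a transfer typically lands Postgres rows in Cloud Storage as newline-delimited JSON, which a BigQuery load job then ingests. A dependency-free sketch of that serialization step (in practice Airflow's transfer operators handle this; the function and sample rows are mine):

```python
import json

# Rows as they might come back from a Postgres query.
rows = [
    {"id": 1, "name": "alice", "signup_date": "2022-07-01"},
    {"id": 2, "name": "bob", "signup_date": "2022-07-02"},
]

def to_ndjson(records):
    """Serialize records as newline-delimited JSON, the format
    BigQuery load jobs accept for JSON source files."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

payload = to_ndjson(rows)
print(payload.splitlines()[0])
```

Each line is one self-contained JSON object, which is what lets BigQuery stream-parse the file without loading it whole.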

3 steps for Building Airflow Pipelines with Efficient Resource Utilisation | by Vachan Anand | Jul, 2022 | Medium

This blog looks at some Airflow features valuable for managing resources in a workflow. We will explore some interrelated concepts that affect the resource utilisation of the underlying infrastructure for any data pipeline. In particular, we will explore the following concepts:

- Airflow Pools, for capping resource allocation to a group of tasks based on a predefined metric.
- Parallelism & Concurrency, for efficiently scaling the pipelines to fully utilise the available infrastructure.
- Priority…
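To make the pool idea concrete, here is a toy, stdlib-only simulation (not Airflow code): a pool with a fixed number of slots caps how many queued tasks may run at once, and tasks with a higher `priority_weight` are admitted first.

```python
def admit_tasks(queued, slots):
    """Toy model of an Airflow pool: return the task ids allowed to
    start now, highest priority_weight first, capped at `slots`."""
    by_priority = sorted(queued, key=lambda t: t["priority_weight"], reverse=True)
    return [t["task_id"] for t in by_priority[:slots]]

# Hypothetical queue of tasks contending for the same pool.
queued = [
    {"task_id": "extract", "priority_weight": 1},
    {"task_id": "load_critical", "priority_weight": 10},
    {"task_id": "transform", "priority_weight": 5},
]

# With a 2-slot pool, only the two highest-priority tasks start;
# "extract" waits until a slot frees up.
running = admit_tasks(queued, slots=2)
print(running)  # → ['load_critical', 'transform']
```

In real Airflow you would create the pool in the UI or CLI and set `pool="my_pool"` and `priority_weight=...` on the operators.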

5 Steps to Build Efficient Data Pipelines with Apache Airflow | by Vachan Anand | Jun, 2022

Uncovering best practices to optimise big data pipelines. Airflow is an open-source workflow orchestration tool. Although used extensively to build data pipelines, Airflow can manage quite a wide variety of workflows. Simply put, if we were to build a scalable system to perform a set of tasks in an orderly fashion that requires interacting with different components, we could manage such a workflow with Airflow. We use DAGs (directed acyclic graphs) to perform such…
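"A set of tasks in an orderly fashion" is exactly a topological ordering of a DAG. A small sketch using the standard library's `graphlib` (not Airflow itself; task names are illustrative) shows how declared dependencies dictate execution order:

```python
from graphlib import TopologicalSorter

# Each key runs only after every task in its dependency set.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# A topological order: every task appears after its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # → ['extract', 'transform', 'load', 'notify']
```

Airflow's scheduler does the same resolution, but per-run and with retries, pools, and parallelism layered on top.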

Using Airflow Decorators to Author DAGs

Authoring Apache Airflow DAGs and tasks with Python decorators. Introduction: The most common way of writing pipelines in Airflow is to use the DAG context manager to automatically assign new operators to that DAG. As of Airflow 2, you can also use decorators to author Airflow DAGs and tasks. In today's tutorial, we will introduce these decorators and showcase how to use them to write cleaner code. Additionally, we will demonstrate how the same DAG would have…
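Airflow's `@task` decorator needs a running Airflow installation, but the underlying Python mechanics can be shown with a toy registry decorator. This is purely illustrative and is not Airflow's API; it only mimics the idea of turning plain functions into named tasks:

```python
import functools

# Toy stand-in for Airflow's @task: wrap a function and register it
# under its name, the way TaskFlow turns functions into operators.
TASKS = {}

def task(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    TASKS[func.__name__] = wrapper
    return wrapper

@task
def extract():
    return [1, 2, 3]

@task
def transform(values):
    return [v * 2 for v in values]

# In Airflow, calling decorated tasks inside a @dag-decorated function
# wires dependencies through their return values (XComs); here we
# simply call them directly.
result = transform(extract())
print(result)  # → [2, 4, 6]
```

The appeal of the decorator style is that the data flow between tasks is just function calls, with no explicit `>>` dependency operators.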

How To Fix Task received SIGTERM signal In Airflow

Fixing the SIGTERM signal in Apache Airflow tasks. Introduction: While recently migrating DAGs from Airflow 1 (v1.10.15) to Airflow 2 (v2.2.5), I spent a lot of time trying to figure out one error I was getting for some of the DAGs, an error that wasn't informative at all:

WARNING airflow.exceptions.AirflowException: Task received SIGTERM signal
INFO - Marking task as FAILED.

Even though I spent some time trying out possible solutions I found online, none of…
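The error itself means the task's worker process was sent SIGTERM. Independent of the article's eventual fix (which the teaser does not reveal), the generic POSIX pattern below shows what "receiving SIGTERM" means and how a process can run last-chance cleanup before dying; this is standard-library Python, not an Airflow-specific API:

```python
import os
import signal

cleanup_log = []

def handle_sigterm(signum, frame):
    # Last-chance cleanup: flush buffers, close DB connections, etc.
    cleanup_log.append("cleanup ran")

# Install the handler, then simulate the executor terminating us.
signal.signal(signal.SIGTERM, handle_sigterm)
os.kill(os.getpid(), signal.SIGTERM)

print(cleanup_log)
```

In Airflow's case the interesting question is *who* sent the signal (scheduler, executor, OS out-of-memory killer), which is what makes the message so hard to debug.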

How to Design Better DAGs in Apache Airflow | by Marvin Lanhenke | Jun, 2022

The two most important properties you need to know when designing a workflow. Last week, we learned how to quickly spin up a development environment for Apache Airflow. This is awesome! However, we have yet to learn how to design an efficient workflow. Simply having a great tool at our fingertips won't cut it alone, unfortunately. Although Apache Airflow does a pretty good job at doing most of the heavy lifting for us, we still need to ensure certain key properties…
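One property commonly named in this context is idempotency: re-running a task for the same logical date must not duplicate data. A minimal sketch of the overwrite-the-partition pattern (storage here is just a dict, and all names are mine, not the article's):

```python
# Idempotent load: each run *replaces* its date partition instead of
# appending, so retries and backfills leave exactly one copy.
warehouse = {}

def load_partition(ds, rows):
    """Write rows for logical date `ds`, overwriting any prior run."""
    warehouse[ds] = list(rows)

load_partition("2022-06-01", ["a", "b"])
load_partition("2022-06-01", ["a", "b"])  # retry / backfill re-run

print(warehouse["2022-06-01"])  # → ['a', 'b'], not duplicated
```

An append-based load would fail this test on the second call, which is exactly why appends make reruns unsafe.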

How To Setup HashiCorp Vault with Airflow

Integrating HashiCorp Vault with Apache Airflow. Introduction: By default, Apache Airflow reads connections and variables from the metadata database, which essentially stores everything visible on the corresponding tabs of the Airflow UI. Even though there's absolutely nothing particularly wrong with adding (or removing) connections and variables through the UI (and thus storing them in the metadata database, which also offers encryption at rest), it may sometimes be more manageable to…
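Pointing Airflow at Vault is done through the `[secrets]` section of `airflow.cfg` (or the matching environment variables), using the HashiCorp provider's backend class. A sketch along those lines — the mount point, paths, and URL below are placeholders, not values from the article:

```ini
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "variables_path": "variables", "mount_point": "airflow", "url": "http://127.0.0.1:8200"}
```

With this in place, a lookup for a connection id falls through to Vault at `airflow/connections/<conn_id>` before (or instead of) the metadata database.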