Prefect + AWS ECS Fargate + GitHub Actions Make Serverless Dataflows As Easy as .py | by Anna Geller | Sep, 2022
Prefect is a global coordination plane for any data stack. It allows you to design, build, schedule, and monitor your dataflows, covering the full spectrum from observability to orchestration. Here is an example of a simple Prefect flow:
With a single CLI command, you can build a scheduled deployment that can run on any infrastructure — a local process, a Docker container, a Kubernetes job, or a serverless task on ECS Fargate. In this post, we’ll focus on the last option. But before we can deploy flows, we need an agent that can poll for scheduled runs. We’ll cover that next.
To start a Prefect agent in a serverless container running on AWS, you need to create an ECS task definition. Then, you can use that task definition to start an ECS service which will ensure that the agent container is running reliably 24/7.
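Conceptually, this boils down to two AWS CLI calls. Here is a sketch (the cluster, service, subnet, and file names are placeholders; the CloudFormation templates in the demo repository handle all of this for you):

```shell
# Register a task definition whose container runs the agent,
# e.g. with the command: prefect agent start -q prod
aws ecs register-task-definition \
    --cli-input-json file://prefect-agent-task-definition.json

# Create a service that keeps exactly one agent container running 24/7
aws ecs create-service \
    --cluster prefect-cluster \
    --service-name prefect-agent \
    --task-definition prefect-agent \
    --desired-count 1 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-12345],assignPublicIp=ENABLED}"
```

The ECS service takes care of restarting the agent container if it ever crashes, which is what makes this setup reliable.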
To keep this tutorial easy to follow, we’ll skip the details of configuring IAM permissions for the task role and execution role and of packaging dependencies into an Amazon ECR image. All of that information is available in the infrastructure folder of the accompanying GitHub repository dataflow-ops. You can use the CloudFormation templates available there as-is or adjust them to your needs.
Create a new repository from the DataflowOps template
Use the repository template dataflow-ops to create your own repository:
Create an API Key in Prefect Cloud
To get started with Prefect, sign up for the free tier of Prefect Cloud. Once logged in, create a workspace and an API key.
Creating a workspace and an API key is an important prerequisite for the AWS ECS Fargate setup.
Install Prefect locally
The easiest way to get started with Prefect is to install it locally:
pip install prefect
Authenticate your terminal with Prefect Cloud & run your first flow
Run the command below to authenticate a local terminal session with your Prefect Cloud workspace:
prefect cloud login
You’ll be prompted to enter the previously created API key and select your workspace. Then, you can run the example flow as any other Python script:
python maintenance_flow.py
When you switch to your browser, you should now see that flow run in your Prefect UI:
You may be asking yourself: how is that possible? How does Prefect know that you ran this flow from your terminal? That’s the magic of the Prefect API! You don’t have to create any DAGs or learn any custom vocabulary to track workflow execution. Prefect flows are like turbocharged Python scripts. Whether you run them locally or from a serverless container on AWS, Prefect ensures that your execution metadata is captured for observability.
Retrieve the PREFECT_API_URL and PREFECT_API_KEY
Now that your terminal is authenticated, run:
prefect config view
You should get a similar output to:
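The output typically looks roughly like this (IDs redacted; the exact formatting varies by Prefect version):

```
PREFECT_PROFILE='default'
PREFECT_API_KEY='pnu_************' (from profile)
PREFECT_API_URL='https://api.prefect.cloud/api/accounts/<ACCOUNT_ID>/workspaces/<WORKSPACE_ID>' (from profile)
```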
Set the PREFECT_API_URL and PREFECT_API_KEY as repository secrets
Use the values of PREFECT_API_URL and PREFECT_API_KEY shown in the terminal output to configure repository secrets, as shown in the image below.
Add the AWS access keys of your IAM user as repository secrets: AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY.
Start the GitHub Actions workflow to deploy your Prefect agent
Now that your GitHub repository secrets point to your AWS and Prefect Cloud accounts, you are ready to trigger the GitHub Actions workflow:
Before running this workflow, you can configure the Prefect version, AWS region, CPU and memory allocation, and the name for your Prefect storage and infrastructure blocks.
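In the workflow file, those options are exposed as workflow_dispatch inputs. A hypothetical sketch of what that section looks like (the input names and defaults in the actual repository may differ):

```yaml
on:
  workflow_dispatch:
    inputs:
      prefect-version:
        description: Prefect version to install
        default: "2.*"
      aws-region:
        description: AWS region for the ECS cluster
        default: us-east-1
      cpu:
        description: CPU units for the agent container
        default: "512"
      memory:
        description: Memory (MiB) for the agent container
        default: "1024"
      block-name:
        description: Name for the S3 storage and ECSTask infrastructure blocks
        default: prod
```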
Inspect the output and download run artifacts
Once this workflow successfully completes, you should see a similar diagram and output:
You can download the ECS task definition used to deploy your ECS agent and the YAML manifests of the flow deployments, as shown in the highlighted section in the image above. These artifacts are helpful for auditability and troubleshooting.
Video walkthrough as a recap
The link below demonstrates the same workflow in a video format:
You can inspect the output of the agent’s CloudFormation stack to validate which resources were created.
Similarly, if you inspect the Prefect Cloud UI, you’ll find the same maintenance flow we mentioned previously. This flow is scheduled to run every 10 seconds to demonstrate scheduled workflows in Prefect. If you inspect the logs of any run of that flow, you’ll see that it’s running in a serverless container deployed to AWS ECS Fargate:
Validate deployments
You can validate deployments from the Prefect UI:
The image above shows that there are two kinds of deployments set up as part of this demo:
- Deployments named dataflowops-local are configured to run directly in the agent’s local process, i.e. in the same container as your Prefect agent.
- In contrast, deployments named dataflowops are configured to run as independent ECS tasks.
The second option brings higher latency (the serverless container must be provisioned first) but may work better at scale. As you reach a certain number of workflows, you could hit the limits of running everything in the same container. The ECSTask infrastructure block allows you to run each flow run in its own serverless container without occupying the resources of the agent process (which runs in its own ECS task).
Inspect and, optionally, modify block values
When visiting the blocks page, you can inspect all blocks that have been created as part of this demo. You should see S3, AWS Credentials, and ECS Task blocks. You can change the values such as CPU, memory, or environment variables directly from the UI without having to redeploy your code.
If you would like to create a new ECSTask block, you can do that:
a) from the UI, as shown in the image above,
b) from Python code, as shown in this code snippet.
To use the ECSTask block in your deployment, all you need to do is specify ecs-task/block_name as your infrastructure block (-ib). Here is an example:
prefect deployment build flows/healthcheck.py:healthcheck -n prod -q prod -a -sb s3/prod -ib ecs-task/prod
You can do that not only from CI/CD but also from your local terminal. The Prefect Deployments FAQ on Discourse provides many helpful resources on how to work with various deployment use cases.
To delete all AWS resources created as part of this demo, run the following workflow. It will delete both CloudFormation stacks.
This post covered how to get started with Prefect and ECS Fargate and how you can deploy the agent and flows as an automated GitHub Actions workflow. If you have questions about this setup, you can submit a GitHub issue directly on the dataflowops repository or ask a question via Prefect Discourse or Slack.
Thanks for reading!