
Prefect + AWS ECS Fargate + GitHub Actions Make Serverless Dataflows As Easy as .py

by Anna Geller, September 2022



Prefect is a global coordination plane for any data stack. It allows you to design, build, schedule, and monitor your dataflows, covering the full spectrum from observability to orchestration. Here is an example of a simple Prefect flow:
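Below is a minimal sketch of such a flow, assuming Prefect 2.x; the actual maintenance flow in the accompanying repository may differ, but the structure is the same:

# maintenance_flow.py — illustrative sketch of a simple Prefect 2 flow
from prefect import flow, task, get_run_logger

@task
def say_hello(name: str) -> None:
    logger = get_run_logger()
    logger.info("Hello, %s!", name)

@flow
def maintenance_flow(name: str = "world") -> None:
    say_hello(name)

if __name__ == "__main__":
    maintenance_flow()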

With a single CLI command, you can build a scheduled deployment that can run on any infrastructure — a local process, a Docker container, a Kubernetes job, or a serverless task on ECS Fargate. In this post, we’ll focus on the latter. But before we can deploy flows, we need an agent that can poll for scheduled runs. We’ll cover that next.

To start a Prefect agent in a serverless container on AWS, you need to create an ECS task definition. Then, you can use that task definition to start an ECS service, which keeps the agent container running reliably 24/7.
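As a rough illustration, a task definition for the agent might look like the sketch below; the image tag, role ARNs, queue name, and resource values are placeholders rather than the exact values used by the accompanying templates:

{
  "family": "prefect-agent",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "<EXECUTION_ROLE_ARN>",
  "taskRoleArn": "<TASK_ROLE_ARN>",
  "containerDefinitions": [
    {
      "name": "prefect-agent",
      "image": "prefecthq/prefect:2-python3.10",
      "command": ["prefect", "agent", "start", "-q", "prod"],
      "environment": [
        {"name": "PREFECT_API_URL", "value": "<YOUR_PREFECT_API_URL>"},
        {"name": "PREFECT_API_KEY", "value": "<YOUR_PREFECT_API_KEY>"}
      ]
    }
  ]
}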

To make this tutorial easy to follow, we’ll skip the details of configuring IAM permissions for the task role and execution role and of packaging dependencies into an Amazon ECR image. All that information is available in the infrastructure folder of the accompanying GitHub repository dataflow-ops. You can use the CloudFormation templates available there as-is or adjust them to your needs.

Create a new repository from the DataflowOps template

Use the repository template dataflow-ops to create your own repository:

Create a new repository from template — image by author

Create an API Key in Prefect Cloud

To get started with Prefect, sign up for the free tier of Prefect Cloud. Once logged in, create a workspace and an API key.

Creating an API key in Prefect Cloud — image by author

You’ll need both the workspace and the API key shortly to configure the AWS ECS Fargate setup.

Install Prefect locally

The easiest way to get started with Prefect is to install it locally:

pip install prefect

Authenticate your terminal with Prefect Cloud & run your first flow

Run the command below to authenticate a local terminal session with your Prefect Cloud workspace:

prefect cloud login

You’ll be prompted to enter the previously created API key and select your workspace. Then, you can run the example flow like any other Python script:

python maintenance_flow.py

When you switch to your browser, you should now see that flow run in your Prefect UI:

Prefect Cloud showing logs from a locally executed flow — image by author

You may be asking yourself: how is that possible? How does Prefect know that you ran this flow from your terminal? That’s the magic of the Prefect API! You don’t have to create any DAGs or learn any custom vocabulary to track workflow execution. Prefect flows are like turbocharged Python scripts. Whether you run them locally or from a serverless container on AWS, Prefect captures your execution metadata for observability.

Retrieve the PREFECT_API_URL and PREFECT_API_KEY

Now that your terminal is authenticated, run:

prefect config view

You should see output similar to the following:

Retrieve API URL and API key from the terminal output — image by author
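For reference, the relevant part of the output looks approximately like this (values abbreviated here):

PREFECT_PROFILE='default'
PREFECT_API_KEY='pnu_************' (from profile)
PREFECT_API_URL='https://api.prefect.cloud/api/accounts/<account-id>/workspaces/<workspace-id>' (from profile)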

Set the PREFECT_API_URL and PREFECT_API_KEY as repository secrets

Use the values of PREFECT_API_URL and PREFECT_API_KEY shown in the terminal output to configure repository secrets, as shown in the image below.

Configuring repository secrets — image by author

Also add the AWS access keys of your IAM user as repository secrets: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
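If you prefer the command line over the repository settings page, the GitHub CLI can set the same secrets; the values below are placeholders:

gh secret set PREFECT_API_URL --body "<your-prefect-api-url>"
gh secret set PREFECT_API_KEY --body "<your-prefect-api-key>"
gh secret set AWS_ACCESS_KEY_ID --body "<your-access-key-id>"
gh secret set AWS_SECRET_ACCESS_KEY --body "<your-secret-access-key>"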

Start the GitHub Actions workflow to deploy your Prefect agent

Now that your GitHub repository secrets point to your AWS and Prefect Cloud accounts, you are ready to trigger the GitHub Actions workflow:

Starting the GitHub Actions workflow deploying Prefect agent and flows to AWS ECS and S3 — image by author

Before running this workflow, you can configure the Prefect version, AWS region, CPU and memory allocation, and the name for your Prefect storage and infrastructure blocks.
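As a hypothetical sketch of how such options are typically exposed, a workflow_dispatch trigger with inputs might look like this (the actual input names and defaults in the dataflow-ops workflow may differ):

on:
  workflow_dispatch:
    inputs:
      prefect-version:
        description: Prefect version to install
        default: "2.*"
      aws-region:
        description: AWS region to deploy to
        default: us-east-1
      cpu:
        description: CPU units for the agent’s ECS task
        default: "512"
      memory:
        description: Memory (in MiB) for the agent’s ECS task
        default: "1024"
      block-name:
        description: Name for the S3 storage and ECSTask infrastructure blocks
        default: prod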

Inspect the output and download run artifacts

Once this workflow completes successfully, you should see a diagram and output similar to this:

Workflow summary — image by the author

You can download the ECS task definition used to deploy your ECS agent and the YAML manifests of the flow deployments, as shown in the highlighted section in the image above. These artifacts are helpful for auditability and troubleshooting.

Video walkthrough as a recap

The link below demonstrates the same workflow in video format:

You can inspect the output of the agent’s CloudFormation stack to validate which resources were created.

Similarly, if you inspect the Prefect Cloud UI, you’ll find the same maintenance flow we mentioned previously. This flow is scheduled to run every 10 seconds to demonstrate how scheduled workflows behave in Prefect. If you inspect the logs of any run of that flow, you’ll see that it’s running in a serverless container deployed to AWS ECS Fargate:

Flow run logs — image by the author

Validate deployments

You can validate deployments from the Prefect UI:

Deployment page in the Prefect UI — image by the author

The image above shows that there are two kinds of deployments set up as part of this demo:

  1. The deployments with the name dataflowops-local are configured to run directly in the agent’s local process, i.e. in the same container as your Prefect agent.
  2. In contrast, deployments with the name dataflowops are configured to run as independent ECS tasks.

The second option incurs higher latency (a serverless container must be provisioned for each run) but may work better at scale. Once you reach a certain number of workflows, you could hit the limits of running everything in the same container. The ECSTask infrastructure block lets each flow run execute in its own serverless container without consuming the resources of the agent process (which runs in its own ECS task).

Inspect and, optionally, modify block values

When visiting the Blocks page, you can inspect all blocks created as part of this demo. You should see S3, AWS Credentials, and ECSTask blocks. You can change values such as CPU, memory, or environment variables directly from the UI without having to redeploy your code.

Modifying ECSTask block values from the Prefect UI — image by author

If you would like to create a new ECSTask block, you can do that:
a) from the UI, as shown in the image above,
b) from Python code, as shown in the snippet below.
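Here is a minimal sketch of option b), assuming the prefect-aws collection is installed (pip install prefect-aws); the block name, image, and resource values are placeholders:

# Illustrative sketch: register an ECSTask infrastructure block from Python.
from prefect_aws.ecs import ECSTask

ecs = ECSTask(
    image="prefecthq/prefect:2-python3.10",  # container image for flow runs
    cpu=512,
    memory=1024,
    stream_output=True,  # stream container logs back to Prefect
)
ecs.save("prod", overwrite=True)  # referenced later as ecs-task/prod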

To use the ECSTask block in your deployment, specify ecs-task/block_name as your infrastructure block via the -ib flag. The example below builds a deployment named prod (-n) on the prod work queue (-q), applies it immediately (-a), and points to an S3 storage block (-sb) alongside the ECSTask infrastructure block (-ib):

prefect deployment build flows/healthcheck.py:healthcheck -n prod -q prod -a -sb s3/prod -ib ecs-task/prod

You can do that not only from CI/CD but even from your local terminal. The Prefect Deployments FAQ on Discourse provides many helpful resources on how to work with various deployment use cases.
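Once the deployment is applied, you can also trigger an ad-hoc run from the same terminal, using the flow and deployment names from the example above:

prefect deployment run healthcheck/prod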

To delete all AWS resources created as part of this demo, run the following workflow. It will delete both CloudFormation stacks.

Deletion of CloudFormation resources incl. ECR repository and ECS cluster — image by author

This post covered how to get started with Prefect and ECS Fargate, and how to deploy the agent and flows through an automated GitHub Actions workflow. If you have questions about this setup, you can open a GitHub issue directly on the dataflow-ops repository or ask via Prefect Discourse or Slack.

Thanks for reading!

