
How to Properly Deploy ML Models as Flask APIs on Amazon ECS | by Nikola Kuzmic | Mar, 2023



Photo by Carissa Weiser on Unsplash

With the wild success of ChatGPT, it is becoming apparent just how much AI technology will impact our lives. However, unless ML models are made available for everyone to use and deployed properly to handle high user demand, they will fail to create any positive impact. That is why it is so important not only to develop AI solutions, but also to know how to deploy them properly. This skillset will also make you vastly more valuable in the job market and open career opportunities in the lucrative field of ML Engineering.

In this post we’re going to deploy an XGBoost model as a Flask API using the Gunicorn application server on Amazon Elastic Container Service. The model will recommend a Dachshund or a German Shepherd puppy based on how big someone’s home is.

Image by Author, Sources: [1–2]

👉 Game Plan

  1. Train an XGBoost model
  2. Build a simple Gunicorn-Flask API to make recommendations
  3. Build a Docker Image for the API
  4. Deploy the Docker Container on Amazon ECS

Entire Source Code Github Repo: link🧑‍💻

flask-on-ecs - repo structure
.
├── Dockerfile
├── README.md
├── myapp.py
├── requirements.txt
└── train_xgb.ipynb

Deploying ML Models on Cloud

It is often the case that we need to deploy our locally trained ML models to production for everyone on the internet to use. This requires first wrapping the ML model in an API and then Dockerizing it. AWS offers a specialized service called Elastic Container Service (ECS), which removes the headache of managing compute environments such as EC2 instances and lets us deploy Docker containers using a serverless compute engine called Fargate.

Note about Web vs Application Servers

In the traditional world of web development, it is common practice to have a Web Server, such as NGINX, handle large volumes of client traffic and interact with the backend applications (APIs) which serve dynamic content. A Web Server can be thought of as a waiter in a restaurant: the waiter receives and processes orders from customers, just as a web server receives and processes requests from web clients (such as web browsers). The waiter then passes the order to the kitchen and delivers the finished meal to the customer; in the same way, a web server forwards the request to the backend application and sends the response back to the web client. The setup would typically look as follows:

Image by Author

Web Servers are great at distributing client requests across multiple backend applications, improving performance, security, and scalability. However, since we'll be deploying the Flask API on AWS, cloud-native Load Balancers can handle traffic routing to the backend APIs and also enable us to enforce SSL encryption, so including NGINX would be somewhat redundant. A Gunicorn application server alone is sufficient for the majority of ML model deployments, provided you intend to use an AWS Load Balancer. Ok but…

What is WSGI & Gunicorn?

WSGI (Web Server Gateway Interface) is simply a convention, a set of rules a web server must follow when communicating with a Python web application. Gunicorn (Green Unicorn) is one such application server: it follows the WSGI rules and handles client requests, passing them on to the Flask application. Flask itself ships with Werkzeug’s built-in WSGI development server, which is fine for initial development, but a production API typically needs to handle multiple requests at a time, hence why we need Gunicorn.
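To make the convention concrete, here is a bare-bones WSGI application, the minimal callable interface that Gunicorn expects, with no Flask involved (module and function names are illustrative):

```python
# wsgi_demo.py - the WSGI contract: a callable taking (environ, start_response)
def app(environ, start_response):
    # environ is a dict describing the request (path, method, headers, ...)
    body = b"hello from a plain WSGI app\n"
    # start_response receives the status line and a list of header tuples
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    # the return value must be an iterable of bytes
    return [body]
```

Gunicorn would serve this with `gunicorn wsgi_demo:app`; Flask apps satisfy the same interface under the hood, which is why Gunicorn can serve them directly.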

Typical Cloud Architecture for Real-time APIs

Running APIs in the cloud is greatly enhanced by Application Load Balancers (ALBs), as they can typically serve the purpose of NGINX and route traffic to our backend applications. This tutorial will focus only on deploying the Flask API on ECS; we can cover ALBs in a future post.

Image by Author

Alright, enough background, let’s build & deploy some APIs!

👉 Step 1: Train an XGBoost model

Train an XGBoost model to predict either a Dachshund (Wiener Dog) or a German Shepherd based on house area and save the model as a pickle file.

To run it inside VS Code, let’s create a separate Python 3.8 environment:

conda create --name py38demo python=3.8 

conda activate py38demo

pip install ipykernel pandas flask gunicorn numpy xgboost scikit-learn

Then restart VS Code and, in the Jupyter Notebook, select ‘py38demo‘ as the kernel.

Train & pickle the XGBoost model:

As you can see, we were able to train the model, test it on 300 and 600 ft² homes, and save the XGBoost model as a pickle (.pkl) file.

👉 Step 2: Build a simple Gunicorn-Flask API

Let’s build a very simple Flask API which serves our XGBoost model predictions. We have a simple helper function which translates 0/1 model predictions into ‘wiener dog’/‘german shepherd’ outputs:

To run the API, in terminal:

python myapp.py

In a separate terminal, test it out by sending a POST request:

curl -X POST http://0.0.0.0:80/recms -H 'Content-Type: application/json' -d '{"area":"350"}'

How I ran it locally on my Mac:

Our API is working great, but you can see we get a warning that this is a development server:

Let’s stop our API and use the production-grade Gunicorn server instead:

gunicorn --workers 4 --bind 0.0.0.0:80 myapp:flask_app_obj

Going back to our VS Code:

Now we’re ready to Dockerize the API! 📦

👉 Step 3: Build a Docker Image for the API

Below is a Dockerfile which uses a Python 3.8 base image. We need version 3.8 since that is the version we used locally to train the XGBoost model.

Note: since I am building the image on a Mac, I need to specify

--platform linux/amd64

for it to be compatible with the ECS Fargate Linux environment.
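The Dockerfile itself isn’t reproduced above; a sketch consistent with the repo structure might look like this (the model.pkl copy assumes the notebook saved the pickle into the repo root):

```dockerfile
# Sketch of a Dockerfile for the API (file names are assumptions)
FROM python:3.8-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the API code and the pickled model
COPY myapp.py model.pkl ./

EXPOSE 80

# Launch the Gunicorn application server
CMD ["gunicorn", "--workers", "4", "--bind", "0.0.0.0:80", "myapp:flask_app_obj"]
```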

Here’s how we build & run the image.

Note: we bind our host’s (i.e. laptop’s) port 80 to the Docker container’s port 80:

docker build --platform linux/amd64 -t flaskapi .
docker run -dp 80:80 flaskapi

Let’s quickly test it again:

Now that we know our API is working inside a Docker Container, it’s time to push it to AWS! 🌤️

👉 Step 4: Deploy the Docker Container on Amazon ECS

This section may look complicated at first, but it is actually quite straightforward if we break the process into 6 steps.

Image by Author

i) Push the Docker image to ECR

Let’s create an ECR repo called demo where we can push the Docker image.

Then we can use the Push Commands provided by the ECR:

# authenticate
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com

# tag the image
docker tag <Your-local-docker-image-name>:latest <Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com/<Your-ECR-repo-name>:latest
# push the image to ECR
docker push <Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com/<Your-ECR-repo-name>:latest

Assumption: you have configured the AWS CLI on your local machine and set up an IAM user with the right permissions to interact with ECR. You can find more info at this link.

After running the above 3 commands, we can see our image is there on ECR 🎉

Copy the Image URI and save it somewhere, as we’ll need it in the next couple of steps.

ii) Create an IAM Execution Role

We need to create an Execution Role so that the ECS task which runs the container has access to pull images from ECR. We’ll name it: simpleRole

iii) Create a Security Group

A Security Group is needed to allow anyone on the internet to send requests to our API. In the real world you may want to restrict this to a specific set of IPs, but we’ll open it to everyone and call it: simpleSG

iv) Create an ECS Cluster

This step is straightforward and only takes a couple of seconds. We’ll call it: flaskCluster

While our cluster is being provisioned, let’s create a Task Definition.

v) Create a Task Definition

A Task Definition, as the name implies, is a set of instructions specifying which image to run, which port to open, and how much virtual CPU and memory to allocate. We’ll call it: demoTask
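For reference, a Fargate task definition equivalent to these console settings might look roughly like the JSON below (the role ARN, CPU/memory values, and image URI are placeholders, not values from this tutorial):

```json
{
  "family": "demoTask",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::<Your-aws-acc-no>:role/simpleRole",
  "containerDefinitions": [
    {
      "name": "flaskapi",
      "image": "<Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com/demo:latest",
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }],
      "essential": true
    }
  ]
}
```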

vi) Run the Task

Let’s run our demoTask on flaskCluster, with the simpleSG from step iii).

Time to test out our deployed API from the Public IP address! 🥁

curl -X POST http://<PUBLIC-IP>:80/recms -H 'Content-Type: application/json' -d '{"area":"200"}'

It’s working! 🥳

As you can see we are able to get a perfect puppy recommendation by sending a POST request to the Public IP provided by ECS. 🔥

Thanks for reading, hope you found this useful for getting started with Flask, Gunicorn, Docker and ECS!


