
How to Properly Deploy ML Models as Flask APIs on Amazon ECS | by Nikola Kuzmic | Mar, 2023



Photo by Carissa Weiser on Unsplash

With the wild success of ChatGPT, it is becoming apparent just how much AI technology will impact our lives. However, unless ML models are made available for everyone to use and deployed properly to handle high user demand, they will fail to create any positive impact. That is why it is so important not only to develop AI solutions, but also to know how to deploy them properly. This skillset will also make you vastly more valuable in the job market and open career opportunities in the lucrative field of ML Engineering.

In this post we’re going to deploy an XGBoost model as a Flask API using the Gunicorn application server on Amazon Elastic Container Service. The model will recommend a Dachshund or a German Shepherd puppy based on how big someone’s home is.

Image by Author, Sources: [1–2]

👉 Game Plan

  1. Train an XGBoost model
  2. Build a simple Gunicorn-Flask API to make recommendations
  3. Build a Docker Image for the API
  4. Deploy the Docker Container on Amazon ECS

Entire Source Code Github Repo: link🧑‍💻

flask-on-ecs - repo structure
.
├── Dockerfile
├── README.md
├── myapp.py
├── requirements.txt
└── train_xgb.ipynb

Deploying ML Models on Cloud

It is often the case that we need to deploy our locally trained ML models to production for everyone on the internet to use. This requires first wrapping the ML model in an API and then Dockerizing it. AWS offers a specialized service called Elastic Container Service (ECS), which removes the headache of managing compute environments such as EC2 instances and lets us deploy Docker containers using a serverless compute engine called Fargate.

Note about Web vs Application Servers

In the traditional world of web development, it is common practice to have a Web Server, such as NGINX, handle large volumes of client traffic and interact with the backend applications (APIs) which serve dynamic content. A Web Server can be thought of as a waiter in a restaurant: the waiter receives and processes orders from customers, just as a web server receives and processes requests from web clients (such as web browsers). The waiter then passes the order to the kitchen and delivers the finished meal to the customer; in the same way, a web server forwards the request to the backend application and sends the response back to the web client. The setup would typically look as follows:

Image by Author

Web Servers are great at distributing client requests across multiple backend applications, improving performance, security, and scalability. However, since we'll be deploying the Flask API on AWS, cloud-native Load Balancers can handle traffic routing to the backend APIs and also enable us to enforce SSL encryption, so including NGINX would be somewhat redundant. A Gunicorn application server alone is sufficient for the majority of ML model deployments, provided you intend to use an AWS Load Balancer. Ok but…

What is WSGI & Gunicorn?

WSGI (Web Server Gateway Interface) is simply a convention, a set of rules a web server must follow when communicating with a Python web application. Gunicorn (Green Unicorn) is one such application server: it follows the WSGI rules and handles client requests, passing them on to the Flask application. Flask itself ships with Werkzeug’s built-in WSGI development server, which is fine for initial development, but a production API typically needs to handle multiple requests at a time, hence why we need Gunicorn.
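To make the convention concrete, here is a bare-bones WSGI application, the minimal callable interface that Gunicorn expects, with no Flask involved (module and function names are illustrative):

```python
# wsgi_demo.py - the WSGI contract: a callable taking (environ, start_response)
def app(environ, start_response):
    # environ is a dict describing the request (path, method, headers, ...)
    body = b"hello from a plain WSGI app\n"
    # start_response receives the status line and a list of header tuples
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    # the return value must be an iterable of bytes
    return [body]
```

Gunicorn would serve this with `gunicorn wsgi_demo:app`; Flask apps satisfy the same interface under the hood, which is why Gunicorn can serve them directly.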

Typical Cloud Architecture for Real-time APIs

Running APIs in the cloud is greatly enhanced by Application Load Balancers (ALBs), as they can typically serve the purpose of NGINX and route traffic to our backend applications. This tutorial will focus only on deploying the Flask API on ECS; we can cover ALBs in a future post.

Image by Author

Alright, enough background, let’s build & deploy some APIs!

👉 Step 1: Train an XGBoost model

Train an XGBoost model to predict either a Dachshund (Wiener Dog) or a German Shepherd based on house area and save the model as a pickle file.

To run it inside VS Code, let’s create a separate Python 3.8 environment:

conda create --name py38demo python=3.8 

conda activate py38demo

pip install ipykernel pandas flask gunicorn numpy xgboost scikit-learn

Then restart VS Code and, in the Jupyter Notebook, select ‘py38demo‘ as the kernel.

Train & pickle the XGBoost model:

As you can see, we were able to train the model, test it on 300 and 600 ft² homes, and save the XGBoost model as a pickle (.pkl) file.

👉 Step 2: Build a simple Gunicorn-Flask API

Let’s build a very simple Flask API which serves our XGBoost model predictions. We have a simple helper function which translates 0/1 model predictions into ‘wiener dog’/‘german shepherd’ outputs:

To run the API, in terminal:

python myapp.py

In a separate terminal, test it out by sending a POST request:

curl -X POST http://0.0.0.0:80/recms -H 'Content-Type: application/json' -d '{"area":"350"}'

How I ran it locally on my Mac:

Our API is working great, but you can see we get a warning that this is a development server:

Let’s stop our API and use the production-grade Gunicorn server instead:

gunicorn --workers 4 --bind 0.0.0.0:80 myapp:flask_app_obj

Going back to our VS Code:

Now we’re ready to Dockerize the API! 📦

👉 Step 3: Build a Docker Image for the API

Below is a Dockerfile which uses a Python 3.8 base image. We need version 3.8 since that is the version we used locally to train the XGBoost model.

Note: since I am building the image on a Mac, I need to specify

--platform linux/amd64

for it to be compatible with the ECS Fargate Linux environment.
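The Dockerfile itself isn’t reproduced above; a sketch consistent with the repo structure might look like this (the model.pkl copy assumes the notebook saved the pickle into the repo root):

```dockerfile
# Sketch of a Dockerfile for the API (file names are assumptions)
FROM python:3.8-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the API code and the pickled model
COPY myapp.py model.pkl ./

EXPOSE 80

# Launch the Gunicorn application server
CMD ["gunicorn", "--workers", "4", "--bind", "0.0.0.0:80", "myapp:flask_app_obj"]
```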

Here’s how we build & run the image.

Note: we bind our host’s (i.e. laptop’s) port 80 to the Docker container’s port 80:

docker build --platform linux/amd64 -t flaskapi .
docker run -dp 80:80 flaskapi

Let’s quickly test it again:

Now that we know our API is working inside a Docker Container, it’s time to push it to AWS! 🌤️

👉 Step 4: Deploy the Docker Container on Amazon ECS

This section may look complicated at first, but it is actually quite straightforward if we break the process into 6 steps.

Image by Author

i) Push the Docker image to ECR

Let’s create an ECR repo called demo where we can push the Docker image.

Then we can use the Push Commands provided by the ECR:

# authenticate
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com

# tag the image
docker tag <Your-local-docker-image-name>:latest <Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com/<Your-ECR-repo-name>:latest
# push the image to ECR
docker push <Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com/<Your-ECR-repo-name>:latest

Assumption: you have configured the AWS CLI on your local machine and set up an IAM user with the right permissions to interact with ECR. You can find more info at this link.

After running the above 3 commands, we can see our image is there on ECR 🎉

Copy the Image URI and save it somewhere, as we’ll need it in the next couple of steps.

ii) Create an IAM Execution Role

We need to create an Execution Role so that the ECS task which runs the container has access to pull images from ECR. We’ll name it: simpleRole

iii) Create a Security Group

A Security Group is needed to allow anyone on the internet to send requests to our API. In the real world you may want to restrict this to a specific set of IPs, but we’ll open it to everyone and call it: simpleSG

iv) Create an ECS Cluster

This step is straightforward and only takes a couple of seconds. We’ll call it: flaskCluster

While our cluster is being provisioned, let’s create a Task Definition.

v) Create a Task Definition

A Task Definition, as the name implies, is a set of instructions specifying which image to run, which port to open, and how much virtual CPU and memory to allocate. We’ll call it: demoTask
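For reference, a Fargate task definition equivalent to these console settings might look roughly like the JSON below (the role ARN, CPU/memory values, and image URI are placeholders, not values from this tutorial):

```json
{
  "family": "demoTask",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::<Your-aws-acc-no>:role/simpleRole",
  "containerDefinitions": [
    {
      "name": "flaskapi",
      "image": "<Your-aws-acc-no>.dkr.ecr.us-east-1.amazonaws.com/demo:latest",
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }],
      "essential": true
    }
  ]
}
```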

vi) Run the Task

Let’s run our demoTask on flaskCluster, with the simpleSG from step iii).

Time to test out our deployed API from the Public IP address! 🥁

curl -X POST http://<PUBLIC-IP>:80/recms -H 'Content-Type: application/json' -d '{"area":"200"}'

It’s working! 🥳

As you can see we are able to get a perfect puppy recommendation by sending a POST request to the Public IP provided by ECS. 🔥

Thanks for reading, hope you found this useful for getting started with Flask, Gunicorn, Docker and ECS!


