Deploying SageMaker Endpoints With Terraform

By Ram Vegiraju | March 2023


Infrastructure as Code With Terraform

Image from Unsplash by Krishna Pandey

Infrastructure as Code (IaC) is an essential concept for optimizing your resources and taking your infrastructure to production. IaC is an age-old DevOps/software practice with a key benefit: resources are maintained centrally via code, which in turn speeds up the collaboration required to take your architecture to production.

This software best practice, like many others, also applies to your Machine Learning tooling and infrastructure. In today's article we'll look at how to use an IaC tool known as Terraform to deploy a pre-trained SKLearn model on a SageMaker Endpoint for inference. We will build a reusable template that you can adjust as you update your resources and hardware. With Terraform we can move from standalone notebooks and individual Python files scattered everywhere to capturing all of our necessary resources in one template file.

Another option for Infrastructure as Code with SageMaker is CloudFormation. You can reference this article if that's the preferred tool for your use-case. Note that Terraform is cloud-provider agnostic and spans different cloud providers, whereas CloudFormation is specific to AWS services.

NOTE: For those of you new to AWS, make sure you create an account at the following link if you want to follow along, and have the AWS CLI installed to work with the example. This article also assumes basic knowledge of Terraform; take a look at this guide if you need a starting point, and reference the following instructions for installation. The article also assumes an intermediate understanding of SageMaker Deployment; I would suggest following this article for a deeper look at Deployment/Inference, as we will be using the same model here and mapping it over to Terraform.

Setup

As stated earlier we won’t really be focusing on the theory of model training and building. We’re going to quickly train a sample SKLearn model on the built-in Boston Housing Dataset that the package provides.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import datasets
import joblib

#Load data
boston = datasets.load_boston()
df = pd.DataFrame(boston.data, columns = boston.feature_names)
df['MEDV'] = boston.target

#Split Model
X = df.drop(['MEDV'], axis = 1)
y = df['MEDV']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .2, random_state = 42)

#Model Creation
lm = LinearRegression()
lm.fit(X_train,y_train)

#Serialize the trained model locally
with open('model.joblib', 'wb') as f:
    joblib.dump(lm, f)

#Reload the model to sanity-check local inference
with open('model.joblib', 'rb') as f:
    predictor = joblib.load(f)

print("Testing following input: ")
print(X_test[0:1])
sampInput = [[0.09178, 0.0, 4.05, 0.0, 0.51, 6.416, 84.1, 2.6463, 5.0, 296.0, 16.6, 395.5, 9.04]]
print(type(sampInput))
print(predictor.predict(sampInput))

Here we quickly validate that the local model performs inference as expected. The script also emits the serialized model artifact that we will provide to SageMaker for deployment. Next we create a custom inference script that serves as the entry point for SageMaker Endpoints and handles pre/post processing.

import joblib
import os
import json

"""
Deserialize fitted model
"""
def model_fn(model_dir):
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    return model

"""
input_fn
    request_body: The body of the request sent to the model.
    request_content_type: (string) specifies the format/variable type of the request
"""
def input_fn(request_body, request_content_type):
    if request_content_type == 'application/json':
        request_body = json.loads(request_body)
        inpVar = request_body['Input']
        return inpVar
    else:
        raise ValueError("This model only supports application/json input")

"""
predict_fn
    input_data: returned array from input_fn above
    model: (sklearn model) returned model loaded from model_fn above
"""
def predict_fn(input_data, model):
    return model.predict(input_data)

"""
output_fn
    prediction: the returned value from predict_fn above
    content_type: the content type the endpoint expects to be returned. Ex: JSON, string
"""
def output_fn(prediction, content_type):
    res = int(prediction[0])
    respJSON = {'Output': res}
    return respJSON

Next we wrap both the inference script and the model artifact into the tarball format that SageMaker expects. We then upload this model tarball to an S3 bucket, as that's the main storage option for the artifacts that SageMaker works with.

import boto3
import sagemaker
import subprocess

#Setup
client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")
boto_session = boto3.session.Session()
s3 = boto_session.resource('s3')
region = boto_session.region_name
print(region)
sagemaker_session = sagemaker.Session()
role = "Replace with your SageMaker IAM Role"

#Build tar file with model data + inference code
bashCommand = "tar -cvpzf model.tar.gz model.joblib inference.py"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()

#Bucket for model artifacts
default_bucket = sagemaker_session.default_bucket()
print(default_bucket)

#Upload tar.gz to bucket
model_artifacts = f"s3://{default_bucket}/model.tar.gz"
response = s3.meta.client.upload_file('model.tar.gz', default_bucket, 'model.tar.gz')

Terraform Variables

Within our template file (.tf) we first want to define Terraform Variables. Input Variables let you pass in values much like arguments to the functions/methods you define: any value that you don't want to hardcode, but do want to give a default, can be expressed as a variable. The variables we'll define for a Real-Time SageMaker Endpoint are listed below.

  • SageMaker IAM Role ARN: The Role associated with the SageMaker service; attach all policies necessary for the actions you will take with the service. Note that you can also define and reference a Role within Terraform itself (see the sketch after this list).
  • Container: The AWS Deep Learning Container, or a custom container you have built, that hosts your model.
  • Model Data: The pre-trained model artifacts that we uploaded to S3; this can also be the trained artifacts emitted by a SageMaker Training Job.
  • Instance Type: The hardware behind your real-time endpoint. You can also turn the number of instances into a variable if you would like.
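
As a quick aside on that first bullet, a minimal sketch of defining the execution role directly in Terraform might look like the following; the role name and the AmazonSageMakerFullAccess policy attachment are assumptions for illustration, not part of the original template:

# Sketch (assumption): defining the SageMaker execution role in Terraform
# instead of passing in an existing ARN
resource "aws_iam_role" "sagemaker_execution_role" {
  name = "sagemaker-endpoint-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "sagemaker.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "sagemaker_full_access" {
  role       = aws_iam_role.sagemaker_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
}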

For each variable you can define: the type, the default value, and a description.

variable "sm-iam-role" {
type = string
default = "Add your SageMaker IAM Role ARN here"
description = "The IAM Role for SageMaker Endpoint Deployment"
}

variable "container-image" {
type = string
default = "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3"
description = "The container you are utilizing for your SageMaker Model"
}

variable "model-data" {
type = string
default = "s3://sagemaker-us-east-1-474422712127/model.tar.gz"
description = "The pre-trained model data/artifacts, replace this with your training job."
}

variable "instance-type" {
type = string
default = "ml.m5.xlarge"
description = "The instance behind the SageMaker Real-Time Endpoint"
}

While we don’t cover it fully in depth in this article, you can also define variables for different hosting options within SageMaker. For example, within Serverless Inference you can define Memory Size and Concurrency as two variables that you want to set.

variable "memory-size" {
type = number
default = 4096
description = "Memory size behind your Serverless Endpoint"
}

variable "concurrency" {
type = number
default = 2
description = "Concurrent requests for Serverless Endpoint"
}
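
To show how those serverless variables would plug in, here is a rough sketch of a serverless endpoint configuration based on the AWS provider's aws_sagemaker_endpoint_configuration resource; it is not used in the rest of this real-time walkthrough, and the resource and configuration names are assumptions:

# Sketch (not part of the real-time example): a Serverless Inference endpoint configuration
resource "aws_sagemaker_endpoint_configuration" "sagemaker_serverless_endpoint_configuration" {
  name = "sagemaker-serverless-endpoint-configuration-sklearn"

  production_variants {
    model_name   = aws_sagemaker_model.sagemaker_model.name
    variant_name = "AllTraffic"

    serverless_config {
      max_concurrency   = var.concurrency
      memory_size_in_mb = var.memory-size
    }
  }
}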

Terraform Resources & Deployment

The most essential Terraform building block is a Resource. Within a Resource block you define an infrastructure object. For our use-case we have three SageMaker building blocks: a SageMaker Model, a SageMaker Endpoint Configuration, and a SageMaker Endpoint. These are linked in a chain and ultimately produce our desired endpoint.
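
One piece not shown in the snippets below is the provider setup that the template needs at the top of the file. A minimal sketch might look like the following; the us-east-1 region is an assumption, so set it to whichever region your artifacts live in:

# Minimal provider setup for the template (region is an assumption)
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}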

We can follow the Terraform documentation for a SageMaker Model to get started. First we define the resource block itself, which has two labels: the resource type (aws_sagemaker_model) and a local name of your choosing, which is what you use to reference the resource later in the template. Another key part to notice is how we reference a variable's value using the Terraform keyword var.

# SageMaker Model Object
resource "aws_sagemaker_model" "sagemaker_model" {
  name               = "sagemaker-model-sklearn"
  execution_role_arn = var.sm-iam-role

Next, within the SageMaker Model, we specify the container and model data we prepared earlier by referencing the corresponding variables.

  primary_container {
    image          = var.container-image
    mode           = "SingleModel"
    model_data_url = var.model-data
    environment = {
      "SAGEMAKER_PROGRAM"          = "inference.py"
      "SAGEMAKER_SUBMIT_DIRECTORY" = var.model-data
    }
  }

Optionally, you can also attach tags that you define for the specific object; the closing brace below ends the model resource.

  tags = {
    Name = "sagemaker-model-terraform"
  }
}

We apply a similar format for our Endpoint Configuration; this is where we essentially define our hardware.

# Create SageMaker endpoint configuration
resource "aws_sagemaker_endpoint_configuration" "sagemaker_endpoint_configuration" {
  name = "sagemaker-endpoint-configuration-sklearn"

  production_variants {
    initial_instance_count = 1
    instance_type          = var.instance-type
    model_name             = aws_sagemaker_model.sagemaker_model.name
    variant_name           = "AllTraffic"
  }

  tags = {
    Name = "sagemaker-endpoint-configuration-terraform"
  }
}

We then reference this object in our endpoint creation.

# Create SageMaker Real-Time Endpoint
resource "aws_sagemaker_endpoint" "sagemaker_endpoint" {
  name                 = "sagemaker-endpoint-sklearn"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.sagemaker_endpoint_configuration.name

  tags = {
    Name = "sagemaker-endpoint-terraform"
  }
}
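
Optionally (this is not in the original template), an output block can surface the endpoint name once the apply finishes, which is handy when you invoke the endpoint later:

# Sketch: expose the endpoint name after terraform apply
output "endpoint_name" {
  value = aws_sagemaker_endpoint.sagemaker_endpoint.name
}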

Before we can deploy the template to provision our resources, make sure you have the AWS CLI configured with the following command.

aws configure

We can then initialize our Terraform project with the following command.

terraform init

For deployment, we can then run another Terraform CLI command (you can also run terraform plan first to preview the changes before applying them).

terraform apply
Resource Creation (Screenshot by Author)

While the endpoint is creating, you can also track its status in the SageMaker Console.

Endpoint Creation SM Console (Screenshot by Author)
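
Once the endpoint shows as InService, a quick sanity check is to invoke it with the same JSON shape that the inference script's input_fn expects. Here is a minimal sketch with boto3, assuming the endpoint name defined in the Terraform resource above:

import json
import boto3

# Runtime client for invoking hosted SageMaker endpoints
runtime = boto3.client("sagemaker-runtime")

# Same sample row used for the local test earlier; input_fn expects an "Input" key
payload = {
    "Input": [[0.09178, 0.0, 4.05, 0.0, 0.51, 6.416, 84.1, 2.6463,
               5.0, 296.0, 16.6, 395.5, 9.04]]
}

response = runtime.invoke_endpoint(
    EndpointName="sagemaker-endpoint-sklearn",  # name set in aws_sagemaker_endpoint
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(response["Body"].read().decode("utf-8"))

When you are done experimenting, terraform destroy tears the model, endpoint configuration, and endpoint back down so you are not billed for an idle instance.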

Additional Resources & Conclusion

The entire code for the example can be found in the repository above. I hope this article was a good introduction to Terraform in general as well as usage with SageMaker Inference. Infrastructure as Code is an essential practice that cannot be ignored in the world of MLOps when scaling to production.

