
An End to End Web App to detect anomalies from ECG signals with Streamlit | by Eugenia Anello | Nov, 2022



This tutorial focuses on building a web application with MLflow, SageMaker, and Streamlit

Photo by Michael Fenton on Unsplash

This article is a continuation of the story How to deploy your ML model using DagsHub+MLflow+AWS Lambda. In the previous story, I showed how to train and deploy a model to detect irregular heart rhythms from ECG signals. The benchmark dataset was the ECG5000 dataset, which contains 5000 heartbeats randomly selected from a patient who had heart failure. It’s been frequently used in research papers and tutorials.

This time I am going to use another real-world dataset, which is much noisier and, consequently, more challenging. The data contains fields like timestamp, patient ID, and heart rate. In addition to these features, there is a manually annotated label that tells you whether the heart rhythm is anomalous or not.

In this tutorial, we are going to build a web application that detects heart anomalies. Before creating the app, we are going to do some exploratory analysis, build and compare different machine learning models. Let’s get started!

Illustration by Author.

We are going to focus again on detecting anomalies from ECG signals. Above are the ECG signals of a patient, where the x markers represent the anomalies. From this example, you can see that this is not only an anomaly detection problem but also a peak detection one: we need to identify peaks and establish whether those peaks are anomalous.

To respect the anomaly detection formulation, the training set contains only normal ECG signals, while the test set contains both normal and anomalous signals. There is also an important consideration to make: the anomalies constitute the minority class and correspond to peaks. Indeed, the test set contains less than 1% anomalies.

For these reasons, the heart rate alone is not enough to solve this problem. In addition to the heart rate, we need to create two new features. First, we build a variable that calculates the difference between the current value of the heart rate and the previous value. Another crucial feature is the peak label, which takes a value of 1 if there is a peak and 0 otherwise.
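A minimal sketch of this feature engineering with pandas, assuming a `heart_rate` column and a simple threshold-based notion of a peak (the column names, threshold, and peak criterion are illustrative; the actual peak-detection logic may differ):

```python
import pandas as pd

def add_features(df: pd.DataFrame, hr_col: str = "heart_rate",
                 threshold: float = 10.0) -> pd.DataFrame:
    """Add the two engineered features described above.

    - hr_diff: difference between the current heart-rate value and the previous one
    - is_peak: 1 if the sample is a local maximum whose jump from the previous
      value exceeds `threshold`, else 0 (a simple stand-in for peak detection)
    """
    out = df.copy()
    out["hr_diff"] = out[hr_col].diff().fillna(0.0)
    prev, nxt = out[hr_col].shift(1), out[hr_col].shift(-1)
    local_max = (out[hr_col] > prev) & (out[hr_col] >= nxt)
    out["is_peak"] = (local_max & (out["hr_diff"] > threshold)).astype(int)
    return out

# Toy example: the 120 bpm sample is flagged as a peak.
hr = pd.DataFrame({"heart_rate": [72, 74, 73, 120, 75, 74]})
print(add_features(hr))
```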

Part 1: Model training and MLflow tracking

Like in the previous tutorial, we are going to use MLflow, an awesome open-source platform to track experiments, package machine learning models, and deploy them in production. It’s used together with DagsHub, which lets you find all the resulting experiments in your repository and version your data and code efficiently. In addition to these features, you can visualize an interactive graph of the entire pipeline on the DagsHub repository.

This time, I considered two models to detect heart anomalies: an autoencoder and an Isolation Forest. This choice is due to the fact that the task is very challenging and the anomalies are very few compared to the normal observations. The Isolation Forest performed better than the autoencoder because its core assumptions fit the problem: anomalies represent the minority class and have short average path lengths in the isolation trees.

Moreover, the Isolation Forest requires setting, before training, a hyperparameter that corresponds to the proportion of anomalies in the dataset, known as contamination. It achieved the best performance with contamination rates smaller than 1%, such as 0.4% and 0.5%. It shouldn’t be a surprise that the autoencoder finds this task more problematic, since it doesn’t make these assumptions.

In the script, we train one of the two available models and log hyperparameters and metrics. We are also interested in logging the model as an artifact and registering the model after it’s trained. These two operations can be merged using mlflow.sklearn.log_model(sk_model, artifact_path, registered_model_name=...). If you don’t want to register the model yet, simply omit the registered_model_name parameter.

You can find the full code of train.py here.

We can run the code and try different combinations of hyperparameters, such as model_name (Isolation Forest or autoencoder), the contamination rate when using the Isolation Forest, and the number of epochs and batch size when switching to the autoencoder.

python src/train.py

We can access the results of the experiments in the Experiments tab of the DagsHub repository. From the evaluation of the models on the test set, you can notice that the autoencoder obtains very low precision and f1-score values, while recall is high at 88%. This means the number of false positives is high: the autoencoder flags even normal patterns as anomalous.

Unlike the autoencoder, the Isolation Forest reached better scores: 63% precision and 85% recall, leading to an f1-score of 73%. Even if there are still some false positives, the results exceeded expectations, considering how challenging the problem is.

To have a better comprehension of the evaluation measures obtained, let’s also visualize the true values versus the predictions. This is the plot achieved with Autoencoder on the ECG signals of a patient:

This is compared to the same plot obtained with Isolation Forest:

The green points represent the predicted labels, while the red crosses represent the ground truth. It seems clear that the autoencoder flags many observations as anomalous, while the Isolation Forest singles out only the most anomalous samples.

Part 2: Deploy MLflow model with Amazon SageMaker

Since the Isolation Forest achieved the best performance, we are going to focus only on this algorithm. The deployment can be split into three steps:

  • Update Stage of MLflow Model
  • Create an AWS account and Set up an IAM Role
  • Deploy the MLflow model to a Sagemaker Endpoint

1. Update Stage of MLflow Model

From the DagsHub page, it’s possible to access the MLflow server user interface. We just have to click the Remote button at the top right and select “Go to MLflow UI”. Then, we can open the Models tab from the menu, which leads to the Registered Models page. Finally, we click the latest version of the registered model and set the Stage parameter from None to Staging.

Instead of doing this operation manually, you can also use a Python script directly:

Now that we have updated the stage of the model, we are ready to move to the next step!

2. Create an AWS Account and Set up an IAM role

Before deploying the model, there are some requirements that need to be respected:

  • Register an account in AWS
  • Go to IAM → Users → Add Users. Choose the name of the user and select Access Key — Programmatic Access as AWS Credential Type.
  • In case you don’t have a SageMaker group, click Create Group. Choose the name of the group and add two policies: AmazonSageMakerFullAccess and AmazonEC2ContainerRegistryFullAccess.
  • Then, go to Roles → Create Role. Select AWS service and SageMaker. The SageMaker policy will be automatically attached to the role.
Screenshot by Author
  • Set up the AWS CLI. Run aws configure in the terminal, which will ask you for the AWS Access Key ID, AWS Secret Access Key, Default region name, and Default output format. For additional information, check here. This is an example shown in the AWS documentation:
$ aws configure 
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json

I want to highlight that these configured credentials are essential for the next step.

3. Deploy MLflow Model to a Sagemaker Endpoint

The first step consists of building the mlflow-pyfunc image and pushing it to Amazon ECR using the MLflow CLI:

mlflow sagemaker build-and-push-container

You can find the resulting images both in your AWS account and in Docker Desktop.

Then, we can use a Python script to deploy the MLflow model to SageMaker using mlflow.sagemaker.deploy. You can find additional information in the MLflow documentation.
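A hedged sketch of that script, using the mlflow.sagemaker.deploy API from MLflow 1.x (the version current when this tutorial was written); every argument value below is a placeholder:

```python
def deploy_model(app_name: str, model_uri: str, image_url: str, role_arn: str,
                 region: str = "us-west-2") -> None:
    """Sketch of deploying the staged model with MLflow 1.x's SageMaker API.

    model_uri can point at the registry stage, e.g.
    "models:/ecg-anomaly-detector/Staging" (hypothetical name). In MLflow 2.x
    this API moved to mlflow.deployments.get_deploy_client("sagemaker").
    """
    import mlflow.sagemaker  # imported here; only needed when actually deploying

    mlflow.sagemaker.deploy(
        app_name=app_name,            # becomes the SageMaker endpoint name
        model_uri=model_uri,
        image_url=image_url,          # ECR image from build-and-push-container
        execution_role_arn=role_arn,  # the IAM role created in the previous step
        region_name=region,
        mode="create",                # fail if the endpoint already exists
    )
```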

This will create a SageMaker endpoint.

You should obtain this final output. If you get errors, check that you specified the image_url, model_uri, and role ARN correctly.

We can quickly check whether the endpoint is working. The ECG signals of a patient are selected and passed to the deployed model, which classifies each observation as normal or anomalous.

It’s important to note that the Isolation Forest returns -1 when an observation is anomalous and 1 otherwise. To compare it with the ground truth, we need to map the values to 1 for anomalous and 0 for normal. If you run the script, you should obtain an output like this:
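A sketch of this check, assuming boto3 and the pandas-split JSON format expected by the MLflow scoring server of that era (the endpoint name and payload details are placeholders):

```python
import json

def query_endpoint(app_name: str, payload_json: str, region: str = "us-west-2"):
    """Send a pandas-split JSON payload to the SageMaker endpoint (names assumed)."""
    import boto3  # only needed when actually calling AWS

    client = boto3.session.Session().client("sagemaker-runtime", region)
    response = client.invoke_endpoint(
        EndpointName=app_name,
        Body=payload_json,
        ContentType="application/json; format=pandas-split",
    )
    return json.loads(response["Body"].read().decode("ascii"))

def to_binary_labels(predictions):
    """Map Isolation Forest output (-1 = anomalous, 1 = normal) to 1/0 labels."""
    return [1 if p == -1 else 0 for p in predictions]

print(to_binary_labels([1, -1, 1, 1, -1]))  # → [0, 1, 0, 0, 1]
```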

Part 3: Create a Web Application

Let’s finally build a web application to detect anomalies from ECG signals! We are going to use Streamlit, a free, open-source framework that lets you build applications in a few lines of Python:

  • We set up the link to the API service obtained with AWS Lambda, in which the model has been deployed.
  • The main title of the web app is shown using st.markdown.
  • A left sidebar is created using st.sidebar, with a file uploader for the CSV file used to evaluate the performance of the model.
  • The button “Check Anomalies!” needs to be clicked to display the results of the deployed model.

If the file is uploaded and the button is clicked, a scatterplot representing the ECG signals of a patient will appear. Like before, it compares the true values with the predictions. In addition to these features, you can also change the patient ID from a sidebar widget.

After you push all changes to GitHub, we can deploy the app using Streamlit. It’s very straightforward; if you want more info, watch this YouTube video. The link to my deployed app is here.

Final thoughts:

Congratulations! You reached the end of this project focused on detecting anomalies from ECG signals. Switching from one tool to another can be overwhelming at first, but reaching results is satisfying! In particular, a web application can be a cute and intuitive way to share your work with other people. Thanks for reading. Have a nice day!

Check out the code in my DagsHub repository:


