How To Stratify Data in Machine Learning Projects to Significantly Improve Model Performance | by Graham Harrison | Jun, 2022

By Jessie Hobb On Jun 3, 2022

How and when to stratify data in machine learning projects to ensure that predictions are accurate and meaningful using just 7 lines of Python code

Background

I recently worked on a real-world machine learning project whose initial predictions were rejected by the domain experts, who could not accept that the future would turn out the way the model was predicting.

The cause of the problem was that the aspect of the business represented by the data was changing over time, i.e. the future was not going to turn out like the past.

When the data shifts like this, a machine learning model may not be accurate enough to be meaningful. Eventually a solution was found by stratifying the data, which prompted me to write this article so that other data scientists who encounter a similar problem can use it.
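To give a feel for what stratifying means in general, here is a minimal sketch using only the Python standard library. It is not the code from the project: the function name `stratified_split`, the `segment` field and the 80/20 toy data are all assumptions made for illustration. The idea it shows is simply that each group (stratum) keeps roughly the same share in every subset of the data.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.25, seed=0):
    """Split rows into (train, test) so each stratum keeps a similar share in both."""
    rng = random.Random(seed)

    # Group the rows by their stratum label.
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    train, test = [], []
    for label in sorted(strata):
        members = strata[label][:]
        rng.shuffle(members)
        # Take the same fraction from every stratum.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical data: 80 rows in segment "A" and 20 rows in segment "B".
rows = [{"segment": "A", "value": i} for i in range(80)]
rows += [{"segment": "B", "value": i} for i in range(20)]

train, test = stratified_split(rows, key=lambda r: r["segment"])
```

With this toy data, segment "B" makes up 20% of both the training and the test rows, which a plain random split would not guarantee.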

Getting Started

The first thing we need is some data. The real-world data used in the project cannot be shared so I have created a fictitious dataset from scratch using the Faker library. This means that there are no license restrictions on the data and it may be used or re-used for learning and development purposes.

Let’s start by importing the libraries we need and then reading in the fictitious data sets …

Background

Getting Started

Understanding the Problem

Implementing the Solution (The First 5 Lines of Code)

Understanding the Solution

Conclusion

Thank you for reading!
