Linear Regression in Data Science | by Mike Wolfe | May, 2022

By Jessie Hobb On Jun 2, 2022

A mathematical technique for Machine Learning

With graduation season just around the corner, I’ve had a few family members ask how often they would use the information they learned over the years. One cousin, in particular, was not a huge fan of math. However, he built his own gaming PC and otherwise enjoys learning about computer hardware. At one point it was mentioned that computers will do all the necessary math, so why memorize formulas? While I saw his perspective to a degree, math is a building block for learning more about computers, and being able to verify results never hurts a programmer. That train of thought brought me back to Algebra 1, where I remember learning about graphs and modeling methods such as Linear Regression. I never thought then that I would be using it well after college. But my interest in Data Science has brought me full circle with mathematics as far as Predictive Analysis and related data. So today, I decided to take a look back at Linear Regression and how it’s used in Data Science.

If you recall back to math class, Regression is used in statistics to describe and estimate the relationships between variables. The point of Linear Regression is to find and draw the best-fitting line that represents the relationship between two or more variables of interest. This line can help you find a correlation in the data and make predictions.
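To make the best-fitting line a little more concrete, here is a minimal sketch in plain Python. It assumes the usual least-squares definition of "best fit" (the line that minimizes the squared vertical distances to the points). The `fit_line` helper and the hours/scores numbers are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how x and y vary together, divided by how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # use the line to predict an unseen value

print(f"score = {slope:.2f} * hours + {intercept:.2f}")   # score = 4.90 * hours + 47.30
print(f"predicted score for 6 hours: {predicted:.1f}")    # predicted score for 6 hours: 76.7
```

The slope tells you how much the output changes for each extra unit of input, and plugging a new input into the line gives the prediction.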

Finding trends in data and making predictions about future data is what Data Science does in both Machine Learning and Statistical Modeling. With that little bit of a recap done, let’s get back to the Data Science conversation.

While we’re talking about Linear Regression, it’s a good time to mention the type of Machine Learning it falls under. The two main types of Machine Learning methods are Supervised and Unsupervised Learning.

With Supervised Learning, a model is trained using labeled data, where each input comes with a known output. The labels guide the model toward mapping the input variables to the outputs you want. Examples are Regression and Classification.

In Unsupervised Learning, the model is not given any labeled data, so it has to find the patterns and structure within the data to decide how to map it. Examples are Clustering and Association.

Linear Regression is a Supervised Learning technique. It is quick to train and easy to understand. Now that we have briefly talked about the Learning types, let’s get back to how Linear Regression is used in Data Science.

When looking at Regression types, there are several different approaches you could take. You can use Simple Regression (one input variable) or Multiple Regression (two or more input variables). In both, you can choose to use Linear or Non-Linear Regression. Typically, Linear Regression is both easier to use and easier to interpret. Because the model is a straight line, it is easier to understand and easier to predict with.
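As a rough sketch of the difference between Simple and Multiple Regression, the snippet below fits both with NumPy's `np.linalg.lstsq`. The data is made up and deliberately noise-free so the recovered coefficients are easy to check. The variable names are my own and not from any particular library.

```python
import numpy as np

# Made-up illustrative data: y depends on two inputs, x1 and x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 3.0 + 2.0 * x1 + 0.5 * x2  # exact linear relationship, no noise

# Simple Regression: one input, plus a column of ones for the intercept
A_simple = np.column_stack([np.ones_like(x1), x1])
coef_simple, *_ = np.linalg.lstsq(A_simple, y, rcond=None)

# Multiple Regression: same idea, one extra column per extra input
A_multi = np.column_stack([np.ones_like(x1), x1, x2])
coef_multi, *_ = np.linalg.lstsq(A_multi, y, rcond=None)

print("simple   [intercept, slope]:", coef_simple)
print("multiple [intercept, b1, b2]:", coef_multi)
```

With both inputs included, the multiple fit recovers the intercept 3.0 and the coefficients 2.0 and 0.5 used to build `y`. The simple fit only sees `x1`, so its line is just the best straight-line approximation it can manage with that one input.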

Linear Regression tries to “fit” data to a model that has a linear shape. The fit comes with a certain degree of error, which is made up of the data points that do not fall on the line. If those errors are large, or the points clearly do not follow a straight line, you should opt for Non-Linear Regression instead. There are a few assumptions your data would need to meet to work well with Linear Regression.

Linear Regression Assumptions

The errors (or residuals) of the best-fit Linear line follow the Normal Distribution
The data has no significant outliers
Observations should be independent of each other, meaning no dependencies
The variables should be measured at a continuous level
The data shows homoscedasticity, a statistical concept meaning the variance of the residuals stays similar all along the best-fit line
There is a linear relationship between the variables, which you can quickly check with a scatter plot
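A couple of these assumptions can be eyeballed from the residuals. The sketch below is only a rough sanity check, not a formal statistical test. It fits a line, computes the residuals, and compares their spread on the lower and upper halves of the x range as a crude stand-in for checking homoscedasticity. The helper names and the data are hypothetical.

```python
from statistics import pvariance


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single input variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed value minus the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up illustrative data, already sorted by x
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

res = residuals(xs, ys)

# Crude homoscedasticity check: is the residual spread similar in both halves?
half = len(res) // 2
low_var = pvariance(res[:half])
high_var = pvariance(res[half:])
ratio = max(low_var, high_var) / min(low_var, high_var)

print("residuals:", [round(r, 3) for r in res])
print("variance ratio between halves:", round(ratio, 2))
```

A ratio close to 1 suggests the spread is about the same along the line, while a much larger ratio hints that the variance grows or shrinks with x. Plotting the residuals against x is usually the more informative check.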

As a brief note, here are different programs and environments you can use to perform Linear Regression:

MATLAB Linear Regression
Linear Regression Python
R Linear Regression
Sklearn Linear Regression
Excel Linear Regression

To cover all bases, and because sometimes I think it’s easier to learn this way, let’s also look at a couple of use cases.

One example is risk analysis for an insurance company. In this case, Linear Regression could be used to help predict or estimate the cost of claims. This could help the business to make important decisions on what risks to take.

The obvious examples refer to education, money, and other avenues such as years of experience or age. However, an example I thought was interesting deals with sports analysis. You could compare the number of games won with the average number of points your team scores. You could reverse that as well, and compare the number of games won with the number of points your opponent scores. That way, once you find that correlation, you can predict the outcome of games, or how many might be won in a season.
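Here is a small sketch of that sports idea. The season numbers are invented for illustration only and are not real team data. It measures how strongly points scored and wins move together with the Pearson correlation, then uses the fitted line to estimate wins for a new scoring average.

```python
from math import sqrt

# Hypothetical seasons: average points scored per game, and games won
avg_points = [98, 101, 104, 107, 110, 113]
wins = [30, 35, 41, 44, 50, 55]

n = len(avg_points)
mean_p = sum(avg_points) / n
mean_w = sum(wins) / n

# Sums of squares and cross-products around the means
spw = sum((p - mean_p) * (w - mean_w) for p, w in zip(avg_points, wins))
spp = sum((p - mean_p) ** 2 for p in avg_points)
sww = sum((w - mean_w) ** 2 for w in wins)

# Pearson correlation: close to +1 means wins rise steadily with scoring
r = spw / sqrt(spp * sww)

# Least-squares line: wins = slope * avg_points + intercept
slope = spw / spp
intercept = mean_w - slope * mean_p

# Estimate wins for a season where the team averages 108 points per game
predicted_wins = slope * 108 + intercept

print(f"correlation: {r:.3f}")
print(f"estimated wins at 108 points per game: {predicted_wins:.1f}")
```

Swapping in the opponent's points per game would work the same way, and you would expect the slope to come out negative instead.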

Today we talked about Linear Regression and how it is used in Data Science. First, we recapped what Linear Regression is. Next, we talked about the Machine Learning Method that Linear Regression uses, and also why Linear Regression is used. After, we looked at how to determine if Linear Regression is appropriate for modeling, including the assumptions your data must match. Briefly, we also checked on a few programs and environments you can use to perform Linear Regression. Finally, we looked at a couple of use cases.

Even though Linear Regression was a mathematical concept I learned many years ago, Machine Learning brings the technique back to help a machine predict trends and future data. Hopefully, Linear Regression as used in Data Science makes a little more sense, and you found this interesting to read. Until next time, cheers!
