The Hidden Linearity in Polynomial Regression
By Angela Shi, March 2023
From two perspectives to gain a better understanding of the linear models
In this article, we will discuss this viewpoint:
Polynomial regression is a linear regression.
At first glance, this statement may seem absurd. Polynomial regression is known as non-linear regression, whereas linear regression is, well, linear, so how can these two models be considered the same?
Like many things in life, the same thing can be seen in different ways, and two people can reach apparently different conclusions. What we tend to neglect is that the answer itself is not always the most important part; the way, the methodology, or the framework used to reach the conclusion often matters more. I recently published an article about the debate over whether logistic regression is a regressor or a classifier, and there I mentioned two perspectives: statistics vs. machine learning. In this article, we will again use these two perspectives to understand polynomial regression.
Some may say that polynomial regression is just a theoretical model that is hard to put into practice, since raising one feature to high powers is rarely meaningful. With this article, you will discover that the principle is more widely used than you might think, and that the degree of the polynomial should not be the only hyperparameter.
Let’s first have a look at the definitions of linear regression and polynomial regression. Now, I know that you probably already know the definitions very well, but when you read through these definitions, try to find something unusual about the inputs for both models.
1.1 Defining Linear Regression
Linear regression typically refers to Ordinary Least Squares (OLS) regression, and it involves minimizing the sum of the squared differences between the predicted values of the output and its actual values.
The equation of linear regression can be expressed as y = wX + b. Since X is a matrix representing multiple variables, we can expand the equation as:
y = w1x1 + w2x2 + … + wpxp + b
where:
- y is the dependent variable or output
- x1, x2, …, xp are the independent variables or input
- b is the intercept
- w1, w2, …, wp are the coefficients for the independent variables
1.2 Defining Polynomial Regression
Polynomial regression models the relationship between the independent variable x and dependent variable y as an nth-degree polynomial. The result is a curve that best fits the data points, rather than a straight line.
The equation for polynomial regression can be written as:
y = b0 + b1x + b2x² + … + bnxⁿ
where:
- again, y is the dependent variable,
- x is the independent variable,
- b0, b1, b2, …, bn are the coefficients, and
- n is the degree of the polynomial.
So, polynomial regression allows for a nonlinear relationship between x and y to be modeled.
1.3 One Important Aspect: the Number of Features
Now, do you notice a difference between the previous definitions? I looked up these definitions across various sources such as Wikipedia and university courses, and this difference is always present.
- linear regression is usually defined using multiple variables
- polynomial regression typically has only one feature variable.
In my opinion, the main reason is that polynomial regression is usually seen from the statisticians' perspective. That is why numpy's fitting function for polynomial regression, polyfit, only accepts a single feature. As for scikit-learn, we will see later.
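As a quick illustration (the toy data here is only an assumption for the sketch), polyfit takes a single 1-D feature array and the degree of the polynomial:

```python
import numpy as np

# Toy data: one feature with a noisy quadratic relationship to y
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 2 + x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

# polyfit expects 1-D arrays: one feature only, raised to powers 0..deg
coeffs = np.polyfit(x, y, deg=2)   # coefficients, highest degree first
y_fit = np.polyval(coeffs, x)      # evaluate the fitted polynomial on x
print(coeffs)                      # roughly [-0.5, 1.0, 2.0]
```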
Anyway, in today's machine learning applications, a model with only one feature is simply not enough for real-world problems.
However, it is of course possible to feed multiple features into polynomial regression. The model is then called multivariate polynomial regression: each feature is raised to various powers, and interaction terms between features are added.
If there are p features, the generalized equation for polynomial regression of degree n becomes (written in LaTeX so that you can better read it):
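It is a sum over every monomial of the p features with total degree at most n, each with its own coefficient (the all-zero exponent term plays the role of the intercept b0):

y = \sum_{\substack{d_1,\dots,d_p \ge 0 \\ d_1 + \dots + d_p \le n}} b_{d_1 \dots d_p} \, x_1^{d_1} x_2^{d_2} \cdots x_p^{d_p}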
You don't see this form often, maybe because it is complex and not practical to use. However, the idea is very important, since it is key to modeling more complex relationships between the variables. Later in the article, we will see how to express this idea in a more elegant way.
To simplify the explanation, we will use polynomial regression with only one feature. However, you can easily imagine that the analysis can be generalized to polynomial regression with multiple features.
So in this section, we will refer to the feature variable as x.
2.1 What the Linearity Refers To
Is polynomial regression a linear model or not? The answer depends on which variables the linearity refers to.
- Polynomial regression is non-linear regarding the variable x.
- But polynomial regression is linear if we consider the variables x, x², x³, etc. as the feature variables. To simplify the notation, we can call these features x_1, x_2, x_3, etc., and then fit a linear regression to them. So in the end, it is a multivariate linear regression!
To make this clearer, we can break polynomial regression into two separate steps:
- Feature engineering: in this step, we create the polynomial features, independently of the model
- Linear regression: the linear regression model itself then takes these features and finds a coefficient for each one
I find this way of breaking complex models down into small pieces very helpful for understanding them. That is why scikit-learn has no estimator called PolynomialRegression; instead, the work is done in two steps:
- PolynomialFeatures in the module preprocessing
- LinearRegression in the module linear_model … oh wait, is this the only estimator we can use? You will discover more in the following sections. A minimal sketch of these two steps chained together follows this list.
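Here is a minimal sketch of the two steps combined in a pipeline (the toy quadratic data is an assumption for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data: one feature with a quadratic relationship to y
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 2 + X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=100)

# Step 1: build the features x, x²; Step 2: fit a plain linear regression on them
poly_reg = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_reg.fit(X, y)
print(poly_reg.named_steps["linearregression"].coef_)       # roughly [1.0, -0.5]
print(poly_reg.named_steps["linearregression"].intercept_)  # roughly 2.0
```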
So, in short, there are two perspectives:
- Polynomial regression is ONE model, and it takes x as input. It is then a non-linear model with respect to x.
- Polynomial regression is NOT a standalone model, but it is built by transforming x into polynomial features first, and then a linear regression is applied.
The latter view unlocks some very interesting possibilities, as we will see later in this article. But first, let's create a plot to convince ourselves of this linearity. Because seeing is believing.
2.2 Visualizing Polynomial Regression: From Curves to Planes
Since polynomial regression is a linear regression, we first have to know how to visualize linear regressions. I wrote an article on visualizing linear regression with different numbers and types of variables. Looking at those examples of visualization, which one would you adapt for the case of polynomial regression?
To visualize polynomial regression, we will only add a quadratic term, giving a linear regression with two continuous variables (x, x²), so we can use a 3D plot. We will not be able to visualize more variables, but you can imagine the extension.
Let's consider a simple set of data points (x, y) that follows the equation y = 2 + x - 0.5x². We would usually create the following plot to represent x and y:
The red line represents the model, and the blue dots the training dataset.
Now, let’s imagine that we first plot some values of x and x²:
Then we create a 3D plot by adding the value of y for each pair (x, x²). The linear regression model using x and x² as inputs is then a plane, which can also be represented in the 3D plot as shown below.
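To check this numerically, here is a small sketch (same toy equation, data assumed) that fits an ordinary linear regression on the two columns x and x², i.e. a plane in the (x, x², y) space:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 2 + x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

# Two explicit feature columns: x and x², the horizontal axes of the 3D plot
X_plane = np.column_stack([x, x**2])
plane = LinearRegression().fit(X_plane, y)
print(plane.intercept_, plane.coef_)  # roughly 2.0 and [1.0, -0.5]
```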
And we can make a gif out of several such images by changing the viewing angle. If you want the code for creating this gif, among other useful code, please support me on Ko-fi with the following link: https://ko-fi.com/s/4cc6555852.
3.1 Practical Application of Polynomial Regression
Fellow data scientists, be honest: do you really build polynomial regressions for real-world applications? When do you actually encounter polynomial regression? Try to remember… yes, that's right, when a teacher explains overfitting for regression problems. Here is an example below.
From a statistician's viewpoint, with numpy's polyfit, we already said that only one feature is allowed, so it is not realistic to use it to build real-world models.
3.2 Machine Learning Perspective with Scaling and Regularization
From a machine learning perspective, creating a polynomial regression is more than raising one feature to various powers.
First, for the overall construction of polynomial regression, we already mentioned that it is done in two steps: PolynomialFeatures for the preprocessing and LinearRegression for the model itself.
We also have to mention some specific techniques, such as scaling. When features are raised to very high powers, the resulting values become so large that scikit-learn runs into numerical problems. So it is better to perform scaling, not for theoretical reasons, but for practical ones.
I wrote an article to talk about this point: Polynomial Regression with Scikit learn: What You Should Know.
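As a sketch of the practical fix (the choice of StandardScaler and degree 10 here is an assumption), the scaler is simply inserted between the two steps:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

# x raised to high powers produces huge values; standardizing the expanded
# features keeps the design matrix numerically well-behaved.
scaled_poly_reg = make_pipeline(
    PolynomialFeatures(degree=10, include_bias=False),
    StandardScaler(),
    LinearRegression(),
)
# scaled_poly_reg.fit(X, y)  # X, y as in the earlier sketches
```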
One crucial aspect of polynomial regression is the degree of the polynomial, usually considered the hyperparameter of the model. However, we should not forget that the degree is not the only possible hyperparameter. In fact, it is easier to reason about this when you see polynomial regression as two steps:
- Polynomial features: we can customize the features, and even manually add interactions between certain variables. We can then scale the features or apply other techniques such as QuantileTransformer or KBinsDiscretizer.
- Model: for the model part, we can also choose an estimator such as Ridge, Lasso, or even SVR instead of LinearRegression.
Here is an example illustrating the impact of the hyperparameter alpha on the model with ridge regression; a minimal sketch that tunes it together with the degree follows.
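The sketch below (parameter values are arbitrary assumptions) treats both the polynomial degree and the Ridge alpha as hyperparameters to search over:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# The degree is not the only hyperparameter: alpha controls the regularization
param_grid = {
    "poly__degree": [2, 3, 5, 10],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0],
}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X, y); search.best_params_ then gives the selected degree and alpha
```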
Now, you may be happy to have just learned how to create some new and more effective models. But they are still not practical to use. Maybe some of you can see that this is something we already do, just with a different approach… yes, kernels!
3.3 Polynomial Kernels and More
The usual form of polynomial regression is not practical to implement, but in theory it is one effective way to create a non-linear model based on mathematical functions. The other one is building neural networks.
One reason is that when creating polynomial features, the number of features can become huge, and fitting a linear regression on them can be time-consuming. That is why SVR and SVM come into play: their loss functions (epsilon-insensitive for SVR, hinge for SVM classification) allow many data points to be "dropped" by keeping only the so-called support vectors.
Yes, the idea of a kernel function is often associated with SVM or SVR, but we should not forget that it is a separate piece of theory; that is why scikit-learn also provides KernelRidge. And in theory, we could also have a KernelLasso or KernelElasticNet, or, for classification tasks, a kernelized logistic regression.
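For instance, a minimal KernelRidge sketch with a polynomial kernel (parameter values are assumptions), which behaves like ridge regression on degree-3 polynomial features without ever building those features explicitly:

```python
from sklearn.kernel_ridge import KernelRidge

# The polynomial kernel (gamma * <x, x'> + coef0)^degree implicitly computes the
# inner products of the polynomial feature expansions, so the expanded features
# are never materialized.
krr = KernelRidge(kernel="poly", degree=3, coef0=1, alpha=1.0)
# krr.fit(X, y); krr.predict(X_new)
```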
In conclusion, we have explored polynomial regression and its relationship with linear regression. While it is usually considered ONE model with one feature from a statistical perspective, it can also be viewed as a linear regression preceded by a feature-engineering step, namely creating polynomial features, which should then be scaled. Moreover, we can apply other models such as ridge, lasso, or SVR to these polynomial features, which can improve performance.
Finally, polynomial regression also serves as a prime example of feature-mapping techniques for mathematical function-based models. This leads to kernel functions, which allow modeling non-linear relationships between the features and the target variable.