Nearest Neighbors Regressors — A Visual Guide | by Angela Shi | Mar, 2023

Visual Understanding of the Models and the Impacts of Hyperparameters

K Nearest Neighbors, or KNN, is one of the simplest models in machine learning. In fact, to some extent, there is no model at all: to predict a new observation, it uses the entire training dataset to find the “nearest neighbors” according to a distance (usually the Euclidean distance). Then, in the case of a regression task, the predicted value is the average of the target values of these neighbors.

Since we use the notion of distance, only numerical features should be used. You can always transform categorical features with one-hot encoding or label encoding, and the distance calculation will still work, but the resulting distance is not very meaningful.

It is also worth noting that the value of the target variable is not used to find the neighbors.

In this article, we will use some simple datasets to visualize how the KNN Regressor works and how the hyperparameter k impacts the predictions. We will also discuss the impact of feature scaling and explore a lesser-known variant of nearest neighbors: radius nearest neighbors. Finally, we will finish with a discussion of more customized versions of distance.

We will use a simple dataset with non-linear behavior since we know that KNN can handle it.

Nearest Neighbors Regressors dataset — image by author

For those who read my article about the Decision Tree Regressor visualization, you may notice that it is the same data. We will do a quick comparison with the Decision Tree Regressor model.

We can create a KNeighborsRegressor model with KNeighborsRegressor(n_neighbors=3), then “fit” it with model.fit(X, y).
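For reference, here is a minimal sketch of this setup. The exact dataset from the article is not reproduced in the text, so a synthetic non-linear dataset is assumed; only the KNeighborsRegressor part matches the description above.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Assumed synthetic dataset: one feature with a non-linear, noisy target
rng = np.random.RandomState(0)
X = np.sort(20 * rng.rand(80, 1), axis=0)      # single feature in [0, 20]
y = np.sin(X).ravel() + 0.1 * rng.randn(80)    # sine-like target with noise

model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)   # "fitting" essentially stores X and y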

To keep the fitting process identical across all models, the model is still “fitted” with the classic fit method. But for KNeighborsRegressor, fitting is essentially just storing the dataset X and y, and nothing else. Yes, that is the most rapid fitting ever! And the model is also the biggest ever!

Now, we can test the “model” on one observation. In the following code, we use a single point. The classic predict method computes the prediction, and the kneighbors method returns the neighbors.
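As a sketch, continuing the assumed setup above (the query point x = 10 corresponds to one of the plots shown next):

# One single observation
x_new = np.array([[10.0]])

# predict: average of the target values of the 3 nearest neighbors
y_pred = model.predict(x_new)

# kneighbors: distances and indices of those neighbors in the training set
distances, indices = model.kneighbors(x_new)

print(y_pred)
print(X[indices[0]], y[indices[0]])   # the neighbors and their targets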

Then we can plot the neighbors. I will show you the plots for x = 10 and x = 20. Please feel free to do more tests.

Nearest Neighbors Regressors with kneighbors — image by author

Now, we can also use a sequence of x values to get all the predictions.
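Still under the same assumptions, the prediction curve can be drawn by predicting over a dense sequence of x values:

import matplotlib.pyplot as plt

X_seq = np.linspace(0, 20, 500).reshape(-1, 1)   # sequence of x values
y_seq = model.predict(X_seq)                     # one prediction per point

plt.scatter(X, y, s=15, label="training data")
plt.plot(X_seq, y_seq, color="red", label="KNN prediction (k = 3)")
plt.legend()
plt.show()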

Here is the resulting plot from the previous code. For each point on the red segment, the y value is the average of the target values of the k nearest neighbors (here k = 3).

Nearest Neighbors Regressors with predictions — image by author

Let’s now create the model predictions for different values of k.
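The specific values of k used for the figure are not listed in the text, so the sketch below simply loops over a few illustrative values, reusing the sequence of x values from above:

for k in (1, 3, 9, 27):   # illustrative values of k
    model_k = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    plt.plot(X_seq, model_k.predict(X_seq), label=f"k = {k}")

plt.scatter(X, y, s=15, color="grey")
plt.legend()
plt.show()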

Nearest Neighbors Regressors with different values of k — image by author

We can also compare with the Decision Tree Regressor model.

Nearest Neighbors Regressors vs. Decision Tree Regressors — image by author

We can notice that the boundaries between prediction levels are always clean-cut for the decision tree regressor, whereas they are more nuanced for k nearest neighbors.

We will use the following dataset, with two continuous features, to create a KNN model. For the testing dataset, we will use meshgrid to generate a grid.
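The two-feature dataset is not reproduced in the text either, so the following sketch assumes synthetic data; the meshgrid part is the step described above:

# Assumed two-feature dataset
rng = np.random.RandomState(0)
X2 = rng.uniform(0, 10, size=(200, 2))
y2 = np.sin(X2[:, 0]) + np.cos(X2[:, 1]) + 0.1 * rng.randn(200)

knn2 = KNeighborsRegressor(n_neighbors=5).fit(X2, y2)

# Grid of test points built with meshgrid
xx, yy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.c_[xx.ravel(), yy.ravel()]
zz = knn2.predict(grid).reshape(xx.shape)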

Then we can use plotly to create interactive 3D plots. In the image below, we can see the 3D plot with different values of k.
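One possible way to build such an interactive 3D plot with plotly, continuing the sketch above (a single value of k is shown here):

import plotly.graph_objects as go

fig = go.Figure(data=[
    go.Surface(x=xx, y=yy, z=zz, opacity=0.8),        # predicted surface
    go.Scatter3d(x=X2[:, 0], y=X2[:, 1], z=y2,        # training points
                 mode="markers", marker=dict(size=3, color="black")),
])
fig.show()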

Nearest Neighbors Regressors with two features — image by author

Here again, we can compare them with the Decision Tree Regressor model. We can SEE and FEEL the difference in the behavior of these two models.

Nearest Neighbors Regressors vs. Decision Tree Regressors — image by author

Contrary to Decision Trees, the scaling of the features has a direct impact on the model.

For example, we can apply the following transformation in the two-feature case. It is worth noting that with a single continuous feature, scaling has no impact since the relative distances are unchanged.
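As an illustration (the actual transformation used for the figure is not specified, so the scaling factor here is arbitrary), rescaling one feature changes the distances and therefore the neighbors:

# Multiply the first feature by an arbitrary factor of 100
X2_scaled = X2.copy()
X2_scaled[:, 0] *= 100

knn_scaled = KNeighborsRegressor(n_neighbors=5).fit(X2_scaled, y2)

# Compare the neighbors of the same query point before and after rescaling
query = np.array([[5.0, 5.0]])
print(knn2.kneighbors(query, return_distance=False))
print(knn_scaled.kneighbors(query * [100.0, 1.0], return_distance=False))
# The two sets of indices generally differ: distances are now dominated by
# the first feature, so the neighbors are mostly points with a similar
# first-feature value, whatever the second feature is.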

Nearest Neighbors Regressors with different feature scales — image by author

We can visually conclude that the two series of models are very different. We could calculate the usual performance metrics to compare them, but my approach here is really to demonstrate visually how the model behaves differently. Can you feel it? The distances change because the scales of the features change, and in the end, the neighbors change.

Some may say that we should use standardization or min-max scaling. But as the image above shows, either case could have been a dataset obtained with standardization (or min-max scaling), and you cannot tell in advance whether standardization will help the model achieve better performance.

In fact, to account for the relative importance of each feature in the distance calculation, we would have to give different weights to different features. But this would make the tuning process too complex.

Just imagine that in a linear regression model, the key is to find a coefficient for each feature. In the distance calculation of kNN, all features are considered with the same importance. Intuitively, we can FEEL that this kNN model cannot be that performant!

In the scikit-learn neighbors module, there is a lesser-known model called RadiusNeighborsRegressor. As its name suggests, instead of taking a fixed number of neighbors (as in k nearest neighbors), we use a circle of fixed radius around the new observation to find its neighbors.
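A minimal sketch of its usage, reusing the assumed one-feature dataset from the beginning (the radius value is arbitrary):

from sklearn.neighbors import RadiusNeighborsRegressor

# Same API as KNeighborsRegressor, but the neighbors are all the training
# points within a fixed radius of the query point
radius_model = RadiusNeighborsRegressor(radius=1.5)
radius_model.fit(X, y)

print(radius_model.predict(np.array([[10.0]])))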

Now, when could the Radius Neighbors model be more interesting? Let’s take the example of a dataset with an outlier. We can see how the two models behave differently. Since this outlier is “far” away, in the case of kNN the number of neighbors is fixed, so the neighbors are also far away from each other. But for radius neighbors, the effect of the outlier is more pronounced.

Nearest Neighbors Regressors KNN vs. Radius NN — image by author

Here, please note that when I use the term “outlier”, it does not necessarily mean that we should get rid of it. I just want to show the case where some data points are far away from the others.

The impact is then also significant for the non-outlier points, because a single radius cannot satisfy both cases.

You may notice something strange in the image above when the radius is too small: for some points, there are no neighbors at all. Then a huge negative number is assigned. The truth is, in this case, there is no solution. But I still think this is behavior that should be corrected in scikit-learn: raising an error would be better than returning this value by default.

Before finishing with Radius Neighbors, when can it be really interesting? Imagine a case where you have lots of data in one area of a city, and in another area nearby you don’t have much data, but you know that you could have collected more. Then radius neighbors can be more relevant.

Similarly, if you have a value for one area (which will be assigned to all addresses of this area), then we can use the neighbors to smooth the value. Here again, radius neighbors would be more relevant. Below is an illustration of value smoothing using a nearest neighbors model.

Geographical Neighbors Smoothing — image by author

There is one last interesting discussion around the notion of distance; you may already have thought about it when looking at the previous map. The notion of distance can be very specific: if you calculate the Euclidean distance with longitude and latitude, this distance may not correctly reflect the geographic neighborhood (which is the distance you probably want to use).

You may surely already know this, but SEEING is always better. In the image below, the red circle is the “true circle” around the central location, formed by the locations at an equal geographical distance from it. The blue “circle” is obtained by computing a Euclidean distance on latitude and longitude. Around the equator, the two circles are almost the same, but they become quite different for locations far from the equator. So next time you have latitude and longitude in your dataset and you use Nearest Neighbors models, you have to think about it.

True circle on Earth vs. “Circle” with lat lon — image by author

Now, you can imagine that in other cases, a more customized distance may be necessary, so that this simple model can become more performant.
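For latitude/longitude data, for instance, scikit-learn’s ball tree supports the haversine metric, which measures great-circle distance. Here is a sketch with hypothetical coordinates and values; note that the haversine metric expects (latitude, longitude) in radians:

# Hypothetical (latitude, longitude) points in degrees, and hypothetical targets
coords_deg = np.array([[48.85, 2.35],
                       [48.86, 2.34],
                       [45.76, 4.84]])
values = np.array([10.0, 12.0, 20.0])

geo_model = KNeighborsRegressor(n_neighbors=2,
                                metric="haversine",
                                algorithm="ball_tree")
geo_model.fit(np.radians(coords_deg), values)   # haversine works in radians

print(geo_model.predict(np.radians([[48.80, 2.30]])))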

It is also possible to weight the neighbors via the weights argument (a short example follows the documentation excerpt below). Here is the description of this argument from the official documentation:

weights : {‘uniform’, ‘distance’}, callable or None, default=’uniform’

Weight function used in prediction. Possible values:

– ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

– ‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

– [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.
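For example, a distance-weighted version of the earlier model could look like this (same assumed one-feature dataset as before):

# Closer neighbors count more in the weighted average
model_w = KNeighborsRegressor(n_neighbors=3, weights="distance")
model_w.fit(X, y)

print(model_w.predict(np.array([[10.0]])))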

However, the distance design can become so complex that it is easier to adopt another approach such as Decision Trees or mathematical function-based models.

I am writing a series of similar articles to demonstrate how visualization helps us to gain a better understanding of how machine learning models work without maths. Please follow me with the link below and get full access to my articles: https://medium.com/@angela.shi/membership

So in this article, we demonstrated that Nearest Neighbors “models” are quite easy to understand through visualization on simple datasets, because the notion of neighbors is intuitive and straightforward.

The choice of the hyperparameter k (or the radius) has a significant impact on the performance of Nearest Neighbors models. If k (or the radius) is too small, the model may overfit the noise in the data, while if it is too large, the model may underfit and fail to capture the underlying patterns in the data.

The scale of the data also matters when using Nearest Neighbors models, as the algorithm is sensitive to the scale of the input features.

