
Nearest-neighbor missing visuals revealed | by Pranay Dave | May, 2022



How to analyze and interpret KNN results with cutting-edge visuals

KNN Visuals (image by author)

The unsupervised K-Nearest Neighbour (KNN) algorithm is perhaps the most straightforward machine learning algorithm. However, a simple algorithm does not mean that analyzing its results is equally simple. In my research, I found few documented approaches to analyzing the output of the KNN algorithm. In this article, I will show you how to analyze and understand the results of the unsupervised KNN algorithm.

I will be using a dataset on cars; a sample is shown here. The data contains the make of the car, technical characteristics such as fuel type, length, width, and number of doors, as well as the price of the car. The dataset has about 25 fields, of which about 15 are numeric.

Cars sample data (image by author).

The data is split into two parts — training and scoring. The training dataset is used to fit the KNN model, which is then used to find the nearest neighbors for each record in the scoring dataset.
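This train-and-score workflow can be sketched with scikit-learn's `NearestNeighbors`. The synthetic arrays, the scaling step, and the choice of 5 neighbors are illustrative assumptions, not details from the article:

```python
# Hypothetical sketch: fit an unsupervised KNN model on numeric features,
# then query neighbors for scoring records. Data here is synthetic.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(size=(150, 15))   # stand-in for ~15 numeric car fields
score = rng.normal(size=(20, 15))    # records to find neighbors for

# Scaling puts fields like length and price on comparable footing.
scaler = StandardScaler().fit(train)
knn = NearestNeighbors(n_neighbors=5).fit(scaler.transform(train))
distances, indices = knn.kneighbors(scaler.transform(score))

print(indices.shape)  # one row of 5 training-row indices per score record
```

Each row of `indices` identifies the training records closest to one scoring record, which is the raw material for all the visuals below.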

Here are the visuals that will help you understand the KNN results.

One of the most elegant ways to visualize nearest neighbors is with a network diagram. Each record in the scoring dataset becomes a central node linked to its nearest neighbors.

network diagram (image by author)
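The hub-and-spoke structure behind such a diagram can be built with `networkx`. The node labels, the 3-neighbor setting, and the synthetic data are my own assumptions for the sketch:

```python
# Hedged sketch: turn KNN output into a network, with each score record
# as a hub node linked to its nearest training neighbors.
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
train = rng.normal(size=(50, 4))
score = rng.normal(size=(3, 4))

knn = NearestNeighbors(n_neighbors=3).fit(train)
_, indices = knn.kneighbors(score)

G = nx.Graph()
for i, neighbors in enumerate(indices):
    center = f"score_{i}"                 # each score record is a hub node
    for j in neighbors:
        G.add_edge(center, f"train_{j}")  # link hub to each nearest neighbor

# nx.draw(G, with_labels=True)  # would render the hub-and-spoke diagram
print(G.number_of_nodes(), G.number_of_edges())
```

A plotting layer such as matplotlib or Plotly can then render `G`, with Plotly-style figures also providing the hover tooltips described next.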

In addition, one can add a hover tooltip to see the details behind each node. This gives a good understanding of the nearest neighbors for a particular record in the scoring data.

hover tooltip (image by author)

As network diagrams are based on graph analytics, you can also analyze how neighbors are connected to each other. This helps in finding communities of neighbors as well as isolated neighbors.

Graph analytics on nearest neighbor output (image by author)
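Once the neighbor links live in a graph, standard graph analytics apply directly. A minimal illustration, using hand-made edges purely as an assumption: two score records that share a neighbor merge into one community, while a third stays isolated.

```python
# Illustrative sketch: connected components reveal neighbor communities.
import networkx as nx

G = nx.Graph()
# score_0 and score_1 share train_2, so they form one community;
# score_2 has its own neighbor and stays isolated.
G.add_edges_from([
    ("score_0", "train_1"), ("score_0", "train_2"),
    ("score_1", "train_2"), ("score_1", "train_3"),
    ("score_2", "train_9"),
])

components = list(nx.connected_components(G))
print(len(components))  # 2: one merged community, one isolated pair
```

More elaborate community-detection algorithms in `networkx` (e.g. modularity-based methods) would serve the same purpose on a larger neighbor graph.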

Combining the nearest neighbor algorithm with graph analytics is a powerful tool to understand overall results.

In real life, one can have good neighbors or bad neighbors! Similarly, KNN can identify the nearest neighbors; however, that does not mean the nearest neighbors are always similar or compatible.

We can verify this “neighborhood compatibility” using PCA and a spotlighting technique, as shown below. We apply PCA to all training and scoring data to reduce it to two dimensions, and plot the reduced data as a scatter plot. We can then use the spotlight technique to highlight the nearest neighbors of a particular record in the scoring dataset. For more information on the spotlighting technique, please see my article here
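A rough sketch of this idea, on synthetic data: project everything to 2D with PCA for display, but find neighbors in the original feature space, then pull out the 2D coordinates of one record's neighbors. The spread of those points is one simple proxy for how compatible the neighborhood is.

```python
# Sketch of the PCA-and-spotlight idea (synthetic data, illustrative only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
train = rng.normal(size=(100, 15))
score = rng.normal(size=(10, 15))

# PCA is fit on all data so train and score share one 2D projection.
pca = PCA(n_components=2).fit(np.vstack([train, score]))
train_2d = pca.transform(train)

# Neighbors are found in the original 15-dimensional space.
knn = NearestNeighbors(n_neighbors=5).fit(train)
_, idx = knn.kneighbors(score[2:3])   # neighbors of score record 2
spotlight = train_2d[idx[0]]          # their 2D coordinates to highlight

# A tight cluster of spotlighted points suggests a compatible neighborhood.
spread = np.linalg.norm(spotlight - spotlight.mean(axis=0), axis=1).mean()
print(spotlight.shape, float(spread) > 0)
```

In a real plot, all points would be drawn faintly and only `spotlight` drawn in full color, which is what the figures below show.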

Nearest neighbor analysis for score record 2 (image by author)

Shown above are all the nearest neighbors for scoring record 2. You will observe that all the points are relatively close to each other. This means the nearest neighbors are relatively compatible, as they have more or less identical features. Further inspection shows that most of the cars are Nissans, which supports this observation.

Now let us do the same analysis for scoring record 6 as shown below.

Nearest neighbor analysis for score record 6 (image by author)

You will observe that the neighbors are situated relatively far from each other. This means the nearest neighbors are not very compatible with each other. Observing the cars behind the dots, we can see a mix of Audi, Volvo, and Volkswagen. So even though the dots are classified as nearest neighbors, the cars differ from each other.

In summary

  • Network diagrams and graph analytics are an excellent way to visualize the results of the unsupervised KNN algorithm
  • Using PCA and the spotlight technique, you can analyze the compatibility of the nearest neighbors

You can visit my website to perform KNN analyses as well as other analytics with no coding: https://experiencedatascience.com

Here is a step-by-step tutorial and demo on my YouTube channel. You will be able to customize the demo to your data with zero coding.

YouTube video link (image by author)

Please subscribe to stay informed whenever I release a new story.

You can also join Medium with my referral link. Thank you.

Data source citation

The data is from https://archive.ics.uci.edu/ml/datasets/automobile.

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

