
Nearest-neighbor missing visuals revealed | by Pranay Dave | May, 2022



How to analyze and interpret KNN results with cutting-edge visuals

KNN Visuals (image by author)

The unsupervised K-Nearest Neighbour (KNN) algorithm is perhaps the most straightforward machine learning algorithm. However, a simple algorithm does not mean that analyzing its results is equally simple. In my research, I found few documented approaches to analyzing the output of the KNN algorithm. In this article, I will show you how to analyze and understand the results of the unsupervised KNN algorithm.

I will be using a dataset on cars; a sample is shown here. The data contains the make of the car, technical characteristics such as fuel type, length, width, and number of doors, as well as the price of the car. The dataset has about 25 fields, of which about 15 are numeric.

Cars sample data (image by author).

The data is split into two parts — training and scoring. The training dataset is used to fit the KNN model, which is then used to find the nearest neighbors for each record in the scoring dataset.
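This train-and-score workflow can be sketched with scikit-learn's `NearestNeighbors`. The synthetic arrays, the scaling step, and the choice of 5 neighbors are illustrative assumptions, not details from the article:

```python
# Hypothetical sketch: fit an unsupervised KNN model on numeric features,
# then query neighbors for scoring records. Data here is synthetic.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(size=(150, 15))   # stand-in for ~15 numeric car fields
score = rng.normal(size=(20, 15))    # records to find neighbors for

# Scaling puts fields like length and price on comparable footing.
scaler = StandardScaler().fit(train)
knn = NearestNeighbors(n_neighbors=5).fit(scaler.transform(train))
distances, indices = knn.kneighbors(scaler.transform(score))

print(indices.shape)  # one row of 5 training-row indices per score record
```

Each row of `indices` identifies the training records closest to one scoring record, which is the raw material for all the visuals below.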

Here are the visuals that will help you understand the KNN results.

One of the most elegant ways to visualize nearest neighbors is with a network diagram. Each record in the scoring dataset becomes a central node linked to its nearest neighbors.

network diagram (image by author)
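The hub-and-spoke structure behind such a diagram can be built with `networkx`. The node labels, the 3-neighbor setting, and the synthetic data are my own assumptions for the sketch:

```python
# Hedged sketch: turn KNN output into a network, with each score record
# as a hub node linked to its nearest training neighbors.
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
train = rng.normal(size=(50, 4))
score = rng.normal(size=(3, 4))

knn = NearestNeighbors(n_neighbors=3).fit(train)
_, indices = knn.kneighbors(score)

G = nx.Graph()
for i, neighbors in enumerate(indices):
    center = f"score_{i}"                 # each score record is a hub node
    for j in neighbors:
        G.add_edge(center, f"train_{j}")  # link hub to each nearest neighbor

# nx.draw(G, with_labels=True)  # would render the hub-and-spoke diagram
print(G.number_of_nodes(), G.number_of_edges())
```

A plotting layer such as matplotlib or Plotly can then render `G`, with Plotly-style figures also providing the hover tooltips described next.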

In addition, one can add a hover tooltip to see the details behind each node. This gives a good understanding of the nearest neighbors for a particular record in the scoring data.

hover tooltip (image by author)

As network diagrams are based on graph analytics, you can also analyze how neighbors are connected to each other. This helps in finding communities of neighbors as well as isolated neighbors.

Graph analytics on nearest neighbor output (image by author)
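Once the neighbor links live in a graph, standard graph analytics apply directly. A minimal illustration, using hand-made edges purely as an assumption: two score records that share a neighbor merge into one community, while a third stays isolated.

```python
# Illustrative sketch: connected components reveal neighbor communities.
import networkx as nx

G = nx.Graph()
# score_0 and score_1 share train_2, so they form one community;
# score_2 has its own neighbor and stays isolated.
G.add_edges_from([
    ("score_0", "train_1"), ("score_0", "train_2"),
    ("score_1", "train_2"), ("score_1", "train_3"),
    ("score_2", "train_9"),
])

components = list(nx.connected_components(G))
print(len(components))  # 2: one merged community, one isolated pair
```

More elaborate community-detection algorithms in `networkx` (e.g. modularity-based methods) would serve the same purpose on a larger neighbor graph.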

Combining the nearest neighbor algorithm with graph analytics is a powerful tool to understand overall results.

In real life, one can have good neighbors or bad neighbors! Similarly, KNN can identify the nearest neighbors; however, that does not mean the nearest neighbors are always similar or compatible.

We can verify this “neighborhood compatibility” using PCA and a spotlighting technique, as shown below. We apply PCA to all training and scoring data to reduce it to two dimensions, and plot the reduced data as a scatter plot. We can then use the spotlight technique to highlight the nearest neighbors of a particular record in the scoring dataset. For more information on the spotlighting technique, please see my article here
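A rough sketch of this idea, on synthetic data: project everything to 2D with PCA for display, but find neighbors in the original feature space, then pull out the 2D coordinates of one record's neighbors. The spread of those points is one simple proxy for how compatible the neighborhood is.

```python
# Sketch of the PCA-and-spotlight idea (synthetic data, illustrative only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
train = rng.normal(size=(100, 15))
score = rng.normal(size=(10, 15))

# PCA is fit on all data so train and score share one 2D projection.
pca = PCA(n_components=2).fit(np.vstack([train, score]))
train_2d = pca.transform(train)

# Neighbors are found in the original 15-dimensional space.
knn = NearestNeighbors(n_neighbors=5).fit(train)
_, idx = knn.kneighbors(score[2:3])   # neighbors of score record 2
spotlight = train_2d[idx[0]]          # their 2D coordinates to highlight

# A tight cluster of spotlighted points suggests a compatible neighborhood.
spread = np.linalg.norm(spotlight - spotlight.mean(axis=0), axis=1).mean()
print(spotlight.shape, float(spread) > 0)
```

In a real plot, all points would be drawn faintly and only `spotlight` drawn in full color, which is what the figures below show.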

Nearest neighbor analysis for score record 2 (image by author)

Shown above are all the nearest neighbors for scoring record 2. You will observe that all the points are relatively close to each other. This means the nearest neighbors are relatively compatible, as they have more or less identical features. Further inspection shows that most of the cars are Nissans, which supports this observation.

Now let us do the same analysis for scoring record 6 as shown below.

Nearest neighbor analysis for score record 6 (image by author)

You will observe that the neighbors are situated relatively far from each other. This means the nearest neighbors are not very compatible with each other. Observing the cars behind the dots, we can see a mix of Audi, Volvo, and Volkswagen. So even though the dots are classified as nearest neighbors, the cars differ from each other.

In summary

  • Network diagrams and graph analytics are an excellent way to visualize the results of the unsupervised KNN algorithm
  • Using PCA and the spotlight technique, you can analyze the compatibility of the nearest neighbors

You can visit my website to perform KNN analyses as well as other analytics with no coding: https://experiencedatascience.com

Here is a step-by-step tutorial and demo on my YouTube channel. You will be able to customize the demo to your data with zero coding.

YouTube video link (image by author)

Please subscribe to stay informed whenever I release a new story.

You can also join Medium with my referral link. Thank you.

Data source citation

The data is from https://archive.ics.uci.edu/ml/datasets/automobile.

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

