6 Dimensionality Reduction Techniques | by Kevin Berlemont, PhD | May, 2022

How and when to use them

Photo by Rodion Kutsaiev from Pexels

In the age of big data, data scientists work with datasets that have more and more features. This leads to a well-known effect: the curse of dimensionality. Past a certain point, adding features degrades the performance of the model, because the density of the data points decreases as the dimensionality increases (without adding any samples). One of the main consequences is that it becomes extremely easy for a model to overfit.

To overcome the overfitting, training-time, and storage issues caused by high dimensionality, a popular approach is to apply a dimensionality reduction technique to the original dataset.

In this post I will describe six dimensionality reduction methods that you should know when doing a data science project. I will show an application of each method to the well-known MNIST dataset. Finally, I will compare them and detail when to use which method in the last part.

The first method, Principal Component Analysis (PCA), is one of the most well-known feature extraction algorithms. PCA is an unsupervised, linear transformation that produces new features by finding the directions of maximum variance in the data. The algorithm is as follows:

  • Construct the covariance matrix of the data
  • Apply the eigendecomposition to this matrix and sort the eigenvalues in decreasing order
  • Transform the data by projecting it onto the top-k eigenvectors
PCA applied to MNIST dataset

PCA has several benefits, such as being non-iterative and thus fast. Moreover, it reduces overfitting well. However, it is limited to linear projections and therefore does not handle non-linear data well. This is why none of the digits form clearly separated clusters in the PCA projection.
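As a minimal sketch (not the exact code behind the figure above), PCA can be applied with scikit-learn along these lines, using the small load_digits data as a lightweight stand-in for the full MNIST images (which could instead be fetched with fetch_openml('mnist_784')):

```python
# Minimal sketch: project the digits onto the top-2 principal components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Standardize so no single pixel dominates the covariance matrix
X_scaled = StandardScaler().fit_transform(X)

# Eigendecomposition of the covariance matrix, keeping the top-2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                    # (1797, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```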

Like the previous one, Linear Discriminant Analysis (LDA) is a linear feature extraction algorithm, but this time a supervised one. It creates a new feature space onto which the data is projected with the goal of maximizing class separability. It does so by building two distance matrices: between-class and within-class. The first one measures the distance between the means of the classes. The second one measures the distance between each class mean and the data points within that class.

The algorithm is the following:

  • Build the two distance matrices
  • Compute the eigenvalues of the combination of these matrices
  • Rank the eigenvectors and build the top-k matrix
  • Project the data onto the new subspace
LDA applied to MNIST dataset

Selecting the top-k eigenvectors makes LDA similar to PCA. However, one major drawback of this algorithm is that in the case of binary classification, only one feature is available after LDA, regardless of the number of original features.
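A minimal sketch with scikit-learn, again using load_digits as a stand-in for MNIST; note that, being supervised, LDA needs the labels y:

```python
# Minimal sketch: supervised projection with Linear Discriminant Analysis.
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)

# With 10 digit classes, LDA can return at most 10 - 1 = 9 components
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)  # labels are required, unlike PCA

print(X_lda.shape)  # (1797, 2)
```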

Before presenting some non-linear algorithms, I want to highlight Independent Component Analysis (ICA). This is a linear, unsupervised feature extraction algorithm that generates new features that are statistically independent. It does so by reducing the second- and higher-order dependencies in a given dataset.

The algorithm is thus:

  • Decompose the data into a mixing matrix A and a matrix of independent sources S
  • Select the top-k independent components
  • Build the new features by projecting the data onto those k components
ICA applied to MNIST dataset

In contrast to the two previous algorithms, ICA searches for features that are statistically independent. It does so by looking for non-Gaussian components. This algorithm is commonly used for the blind source separation problem. One constraint is that it cannot separate Gaussian sources.
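A minimal sketch with scikit-learn's FastICA implementation (load_digits again stands in for MNIST):

```python
# Minimal sketch: extract statistically independent components with FastICA.
from sklearn.datasets import load_digits
from sklearn.decomposition import FastICA

X, y = load_digits(return_X_y=True)

# Estimate 2 independent components; the data is modeled as X = S @ A.T
ica = FastICA(n_components=2, random_state=0)
X_ica = ica.fit_transform(X)   # S: the independent sources

print(X_ica.shape)        # (1797, 2)
print(ica.mixing_.shape)  # A: the estimated mixing matrix, (64, 2)
```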

The next three methods are non-linear dimensionality reduction techniques. The first one, Multidimensional Scaling (MDS), is a non-linear, unsupervised method that focuses on the relationships, such as similarities or dissimilarities, between data points in a multi-dimensional space. MDS places similar points close together and dissimilar points far apart. It does so by finding an embedding whose distance matrix matches the distance matrix of the original features as closely as possible.

The algorithm is:

  • Compute the dissimilarity matrix of the data
  • Compute a kernel matrix K by double-centering this dissimilarity matrix
  • Apply the eigendecomposition to K
  • Obtain the new feature space by selecting the top-k eigenvectors
MDS applied to MNIST dataset

Be careful with this method as it is computationally expensive.
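A minimal sketch with scikit-learn; note that sklearn.manifold.MDS minimizes the stress iteratively (SMACOF) rather than using the eigendecomposition described above, and only a subset of the digits is used here because of the cost:

```python
# Minimal sketch: metric MDS embedding that tries to preserve pairwise distances.
from sklearn.datasets import load_digits
from sklearn.manifold import MDS

X, y = load_digits(return_X_y=True)
X_small = X[:500]  # MDS scales badly with the number of samples, keep it small

mds = MDS(n_components=2, random_state=0)
X_mds = mds.fit_transform(X_small)

print(X_mds.shape)  # (500, 2)
print(mds.stress_)  # mismatch between original and embedded distances (lower is better)
```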

The next method, Isomap, stems from the fact that classical scaling methods do not capture the possible non-linear patterns in a dataset. This non-linear, unsupervised method solves the issue by preserving the pairwise geodesic distances between data points in the lower-dimensional space. It computes the shortest paths between all pairs of points in a neighborhood graph to approximate the geodesic distances between them.

The steps are:

  • Construct the neighborhood graph of the data
  • Compute the geodesic distance matrix
  • Apply MDS to this geodesic matrix
ISOMAP applied to MNIST dataset

As we can observe from the application of the method to the MNIST dataset, some of the digits are very well separated, but the task is harder for similar digits such as 2 and 7. Isomap is one of the earliest manifold learning algorithms, but because it relies on the neighborhood graph it can sometimes build wrong connections between points. A non-convex manifold will also be difficult to handle with this algorithm.
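A minimal sketch with scikit-learn's Isomap (load_digits stands in for MNIST; the neighborhood size is the main knob to tune):

```python
# Minimal sketch: Isomap embedding built from a k-nearest-neighbor graph.
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, y = load_digits(return_X_y=True)

# Too few neighbors can disconnect the graph; too many can create the
# wrong "shortcut" connections mentioned above
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)

print(X_iso.shape)                 # (1797, 2)
print(iso.reconstruction_error())  # how well geodesic distances are preserved
```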

The last algorithm I will present, t-SNE, is a manifold-based dimensionality reduction technique. It is a non-linear, unsupervised method that is also handy for visualization, as it retains the data's local structure. The algorithm first converts the Euclidean distances between data points into conditional probabilities that represent similarities. t-SNE then minimizes the difference between the probabilities computed in the high-dimensional and low-dimensional spaces (using the Kullback-Leibler divergence).

The summarized algorithm is:

  • Apply Stochastic Neighbor Embedding to obtain the conditional probabilities
  • Map the high dimensional space to the low dimensional one by minimizing the distance between the probabilities
t-SNE applied to MNIST dataset

By preserving the local structure, this algorithm is particularly well suited to data visualization in a low-dimensional space, as the MNIST plot above attests: most of the digit classes are well separated.
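A minimal sketch with scikit-learn's TSNE (load_digits stands in for MNIST; perplexity is the main hyperparameter and roughly sets the size of the neighborhood considered around each point):

```python
# Minimal sketch: 2-D t-SNE embedding for visualization.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)

print(tsne.kl_divergence_)  # final Kullback-Leibler divergence

# Color the embedding by digit class to check how well the classes separate
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.show()
```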

These six dimensionality reduction techniques illustrate the variety of methods available for this purpose. Each of them has its strengths and weaknesses, and some, such as t-SNE, are mostly used for data visualization. All of them are available in Scikit-Learn, which provides an easy way to integrate them into a machine learning pipeline.

With the summary below, I recap when and how to use these different feature extraction algorithms. We do have a set of tools to reduce the dimensionality of our data, but each of them has to be used in the right context in order to be truly efficient.
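  • PCA: linear, unsupervised; fast and non-iterative, but limited to linear structure
  • LDA: linear, supervised; maximizes class separability, returns at most (number of classes − 1) components
  • ICA: linear, unsupervised; extracts statistically independent, non-Gaussian components (blind source separation)
  • MDS: non-linear, unsupervised; preserves pairwise dissimilarities, but computationally expensive
  • Isomap: non-linear, unsupervised; preserves geodesic distances via a neighborhood graph, which can introduce wrong connections
  • t-SNE: non-linear, unsupervised; preserves local structure, best suited for visualization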

