Eigen Intuitions: Understanding Eigenvectors and Eigenvalues | by Peter Barrett Bryan | Jun, 2022
An intuitive basis for understanding all things “eigen”
We often want to transform our data to reduce the number of features while preserving as much variance (i.e., the differences among our samples) as we can. Often, you’ll hear folks refer to principal component analysis (PCA) and singular value decomposition (SVD), but we can’t appreciate how these methods work without first understanding what eigenvectors and eigenvalues are.
“Eigenvector” is a pretty weird word. As with many weird words (think kindergarten), we can blame the Germans. The most useful translation I’ve heard comes from the Coursera course “Mathematics for Machine Learning Specialization”: “eigen” means “characteristic.”
An “eigenvector” is a vector that “characterizes” a linear transform.
Let’s take a look at a couple of vectors under a few linear transforms like scaling, rotation, and shear. Most of our vectors are shifted around. Some, though, point in the same direction before and after a transform.
Let’s take a look at a simple horizontal scaling! We can achieve this with the linear transform matrix [[2, 0], [0, 1]].
If we plot three unit-length vectors — one at 0°, one at 45°, and one at 90° — and visualize what happens after applying our transform matrix, we see that some vectors remain pointed in the same directions (0° and 90°) as before while others do not (45°). There’s something interesting about the two vectors that remain pointed in the same direction before and after. Under the linear transform, these vectors are just scaled by a scalar term. Our unit vector at 90° is unchanged, i.e., scaled by 1, while our unit vector at 0° is doubled. These vectors are our eigenvectors!
The eigenvectors of a linear transform are those vectors that remain pointed in the same directions. For these vectors, the effect of the transform matrix is just scalar multiplication. For each eigenvector, the eigenvalue is the scalar that the vector is scaled by under the transform.
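We can check this numerically without any animation. The short NumPy sketch below (plain NumPy, not the article’s manim code) applies the scaling matrix to the three unit vectors and tests whether each one still points the same way afterward:

```python
import numpy as np

# Horizontal scaling transform from the example
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# Unit vectors at 0, 45, and 90 degrees
for deg in (0, 45, 90):
    theta = np.radians(deg)
    v = np.array([np.cos(theta), np.sin(theta)])
    w = A @ v  # the vector after the transform

    # Direction is unchanged when w is a positive scalar multiple
    # of v, i.e., the cosine of the angle between them is 1.
    cos_angle = (w @ v) / (np.linalg.norm(w) * np.linalg.norm(v))
    print(deg, np.isclose(cos_angle, 1.0))
```

The 0° and 45° cases print `True` and `False` respectively: only the axis-aligned vectors survive the transform with their directions intact.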
Plotting a bunch of vectors and waiting for an animation to render isn’t a terribly efficient approach. Luckily, all we need to do is formalize the intuitions we’ve already built. Some of the equations look a little intimidating, but they aren’t so bad once we understand where they come from.
While the mathematics extend to matrices of arbitrary dimensionality, we are going to stick to a 2×2 matrix for this demonstration.
Let’s consider a linear transform matrix A. As we saw, the eigenvectors for a matrix are the vectors that are just scaled. These scalars are eigenvalues, and we’ll call them λ. We’ll call our eigenvectors x.
All together now… the eigenvectors x of our matrix A are scaled by our eigenvalues λ under the transform.
We can formalize this with the top equation in Figure 2.
Let’s move some terms around! Subtracting off λx and factoring out x gives us a nice zero-valued equality. To subtract a scalar (λ) off of the matrix A, we need to multiply it by an identity matrix of the same dimensionality as A.
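In symbols, the rearrangement described above (the equations shown in Figure 2) reads:

```latex
A\mathbf{x} = \lambda \mathbf{x}
\;\Longrightarrow\;
A\mathbf{x} - \lambda I \mathbf{x} = \mathbf{0}
\;\Longrightarrow\;
(A - \lambda I)\,\mathbf{x} = \mathbf{0}
```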
Let’s solve for our eigenvectors and eigenvalues! We aren’t interested in the “trivial” solution to these equations where the x vector is zero-valued. Instead, we want (A − λI) to send some nonzero vector x to zero, which can only happen when (A − λI) is singular. We can check for this by requiring that its determinant (Figure 3) is zero-valued!
Let’s expand our matrix A now, so that we can see each of the values in the matrix (Figure 4).
Piecing things together, we get equalities shown in Figure 5.
Using our definition of the determinant from Figure 3, we can substitute in our values.
Finally, multiplying out terms, we recover a form called the “characteristic polynomial.”
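For a general 2×2 matrix A with entries a, b, c, d, the whole chain collapses into one line. Expanding the determinant and multiplying out gives the characteristic polynomial:

```latex
\det(A - \lambda I)
= \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix}
= (a - \lambda)(d - \lambda) - bc
= \lambda^2 - (a + d)\lambda + (ad - bc) = 0
```

Note that the coefficients are just the trace (a + d) and the determinant (ad − bc) of A, which makes the polynomial easy to write down for any 2×2 matrix.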
That’s as far as we can go in the abstract! Now, let’s apply this “characteristic polynomial” and solve for our eigenvalues (λ).
Plugging the values from our matrix into our characteristic polynomial…
We get our eigenvalues (λ). Now, we can substitute our eigenvalues back in to solve for our eigenvectors.
This is an odd result… at λ = 2, (A − λI)x = [0, -x₂] = 0. Our x₁ seems to have disappeared. What does this mean?
It means that for λ = 2, as long as x₂ is zero, x₁ can equal anything.
- [5, 0], [1, 0], and [-3, 0] for instance
Similarly, for λ = 1, as long as x₁ is zero, x₂ can equal anything.
- [0, 2], [0, -1], [0, 8] for instance
We express this invariance by substituting in a placeholder variable t for the terms that can take on any value.
This is exactly what our visual intuitions showed us! All the horizontal vectors of our space are eigenvectors and they are scaled by the eigenvalue 2. All the vertical vectors of our space are eigenvectors and they are scaled by the eigenvalue 1.
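As a sanity check, we can recover the same answer numerically. The sketch below (again plain NumPy, not the article’s manim code) solves the characteristic polynomial using its trace and determinant coefficients, then cross-checks against NumPy’s built-in eigendecomposition:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# Roots of the characteristic polynomial:
# lambda^2 - (a + d) lambda + (ad - bc) = 0
trace, det = np.trace(A), np.linalg.det(A)
roots = np.roots([1.0, -trace, det])
print(sorted(roots))  # [1.0, 2.0]

# Cross-check against NumPy's eigendecomposition
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # eigenvalues 2 and 1
print(eigvecs)  # columns point along the x- and y-axes
```

The eigenvectors come back as the axis-aligned columns [1, 0] and [0, 1] — the direction of every horizontal and vertical vector, just as the plots showed.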
I found a lot of value in plotting things for myself. If you want to try, check out the source below!
It is important to give credit where it is most definitely due! While the code in the article is mine, the package used for visualization (manim) is certainly not! The visualization library and the method of explanation are shamelessly stolen from 3blue1brown.
I mentioned it earlier in the article, but I love this Coursera series: Mathematics for Machine Learning Specialization!