
A Primer on Linear Algebra: Part 2 | by Rob Taylor | Apr, 2023



Photo by Viktor Forgacs on Unsplash

Introduction

In my previous post, I introduced some of the operations and concepts that are fundamental to linear algebra. This included vectors and matrices, as well as the transpose, dot product, and matrix multiplication operators. In this post, I’ll introduce some additional concepts that complement those discussed previously. If you haven’t already seen my primer on linear algebra, you can check it out here.

Linear Independence

Before we can define linear independence, we first need to define linear dependence. Simply put, a sequence of vectors is linearly dependent if at least one of them can be written as a linear combination of the others. Specifically, suppose we have a sequence of n vectors v₁, v₂, ⋯, vₙ that comprise the columns of a matrix V. Linear dependence holds if and only if there exist n scalars a₁, a₂, ⋯, aₙ such that:

a₁v₁ + a₂v₂ + ⋯ + aₙvₙ = 0

where 0 denotes the zero vector and at least one of the aᵢ is not equal to zero.

This requirement is important because, without it, you could simply set every aᵢ to zero and the equation would hold trivially. Linear independence is then defined as the opposite case; that is, the case where the sequence of vectors is not linearly dependent. This implies that the following condition holds:

a₁v₁ + a₂v₂ + ⋯ + aₙvₙ = 0 ⟹ a₁ = a₂ = ⋯ = aₙ = 0

and therefore requires that all scalars are zero. Under these conditions, no vector in the sequence can be represented as a linear combination of any of the remaining vectors.

For example, suppose we have two vectors v₁ and v₂, each of which lies in ℝ². For linear independence to hold, we require that the only coefficients a₁ and a₂ satisfying:

a₁v₁ + a₂v₂ = 0

are a₁ = a₂ = 0; in other words, neither vector is a scalar multiple of the other.
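
To make this concrete, here is a minimal sketch in Python (my own illustration, with hypothetical vector values, not from the original post): the columns of a matrix are linearly independent exactly when the matrix's rank equals the number of columns.

```python
import numpy as np

# Two vectors in R^2 (hypothetical example values)
v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, 1.0])

# Stack the vectors as the columns of a matrix V
V = np.column_stack([v1, v2])

# Columns are linearly independent when the rank equals the number of columns
print(np.linalg.matrix_rank(V) == V.shape[1])  # True: neither is a multiple of the other

# A dependent pair for contrast: the second column is just 2 * v1
W = np.column_stack([v1, 2 * v1])
print(np.linalg.matrix_rank(W) == W.shape[1])  # False: the columns are dependent
```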

Determinant

The determinant is a scalar value that is a function of the elements of a square matrix. If the matrix is small, the determinant is fairly straightforward to compute by hand. For example, let A be a 2 × 2 matrix with entries aᵢⱼ (row i, column j); in this case, the determinant is simply:

det(A) = a₁₁a₂₂ − a₁₂a₂₁
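
As a quick numerical check (my own sketch, with hypothetical entries), the hand computation agrees with NumPy's built-in determinant:

```python
import numpy as np

# A 2 x 2 matrix with hypothetical entries
A = np.array([[3.0, 1.0],
              [4.0, 2.0]])

# Hand computation: a11 * a22 - a12 * a21 = 3*2 - 1*4 = 2
det_by_hand = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]

# NumPy's determinant agrees (up to floating-point error)
print(det_by_hand, np.linalg.det(A))
```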

We can also compute the determinant for a 3 × 3 matrix, though this time the process is a little more involved. I won’t delve into details here, but the solution for this case is:

det(A) = a₁₁a₂₂a₃₃ − a₁₁a₂₃a₃₂ − a₁₂a₂₁a₃₃ + a₁₂a₂₃a₃₁ + a₁₃a₂₁a₃₂ − a₁₃a₂₂a₃₁

This expression is an instance of the Leibniz formula for the determinant, which generalizes to higher dimensions. Again, I won’t dive into the details here, but the general formula is:

det(A) = Σ_{σ ∈ Sₙ} sgn(σ) · a₁,σ(1) · a₂,σ(2) ⋯ aₙ,σ(n)

where Sₙ is the set of all permutations of the indices 1, 2, ⋯, n, σ denotes one such permutation (a function that reorders the indices), and sgn(σ) is its sign, equal to +1 for even permutations and −1 for odd ones.
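
For illustration, here is a minimal sketch (not from the original post) that transcribes the Leibniz formula directly: it sums over all n! permutations, computes each permutation's sign by counting inversions, and multiplies the corresponding entries. The helper name leibniz_det and the matrix values are my own assumptions.

```python
import itertools
import numpy as np

def leibniz_det(A: np.ndarray) -> float:
    """Determinant via the Leibniz formula: a sum over all permutations."""
    n = A.shape[0]
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        # Sign of the permutation: (-1) raised to the number of inversions
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if sigma[i] > sigma[j])
        sign = -1.0 if inversions % 2 else 1.0
        # Product a[1, sigma(1)] * a[2, sigma(2)] * ... * a[n, sigma(n)]
        total += sign * np.prod([A[i, sigma[i]] for i in range(n)])
    return total

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [0.0, 1.0, 4.0]])
print(leibniz_det(A), np.linalg.det(A))  # the two values agree
```

Summing over all n! permutations quickly becomes infeasible as n grows; practical routines such as np.linalg.det factorize the matrix instead.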

While the formula for the determinant is not particularly intuitive, the information it provides is. The determinant is inherently geometric: it tells us how regions of space are scaled by the linear transformation the matrix represents. Thinking again about a simple 2 × 2 matrix, the absolute value of the determinant is the area of a parallelogram, namely the image of the unit square under the transformation given by the matrix.

This works in higher dimensions, too, though now the determinant corresponds to a volume rather than an area. For example, the determinant of a 3 × 3 matrix gives the volume of a parallelepiped, whereas the determinant of an n × n matrix gives the hypervolume of an n-dimensional parallelepiped (the image of the unit n-cube).
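
A small sketch (my own, with hypothetical numbers) makes the area interpretation tangible: the unit square has area 1, and the absolute value of the determinant gives the area of its image.

```python
import numpy as np

# Columns are the images of the unit square's edge vectors e1 and e2
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# The unit square (area 1) maps to a parallelogram whose area is |det(T)|
print(abs(np.linalg.det(T)))  # 6.0: areas are scaled by a factor of 6
```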

Rank

The rank of a matrix is the maximal number of linearly independent columns; more formally, it is the dimension of the vector space spanned by its columns. Typically, we want matrices to have full rank because this condition implies there is no redundancy among the column vectors. A matrix whose rank falls short of the maximum possible for its shape is referred to as rank-deficient.

To illustrate, consider a square n × n matrix A. If all columns of this matrix are linearly independent, then the matrix is said to have full column rank, which will be equal to n. Now, because the matrix is square, we could also consider whether its rows are linearly independent. If so, then the matrix also has full row rank, which will also equal n. Because these are equivalent, a square matrix is considered to have full rank if all rows and columns are linearly independent, which is denoted rank(A) = n.

In fact, a square matrix has full rank if and only if its determinant is non-zero. Therefore, we can use the determinant to test for linear independence in square matrices.
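
As an illustration (my own sketch, hypothetical values), a square matrix with a dependent column has a determinant of (numerically) zero and a rank below n:

```python
import numpy as np

# The third column is the sum of the first two, so the columns are dependent
A = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0],
              [0.0, 4.0, 4.0]])

print(np.linalg.det(A))          # ~0: the matrix is singular
print(np.linalg.matrix_rank(A))  # 2, which is less than n = 3, so A is rank-deficient
```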

But what if the matrix is not square? In this case, full rank is defined a little differently. Suppose we have a non-square matrix B with m rows and n columns; then full rank is defined as the highest row or column rank possible given the shape of the matrix, which equals whichever dimension is smaller, so that rank(B) ≤ min(m, n).

For example, if B has more rows than columns (i.e., m > n), then full rank requires that B has full column rank, and so rank(B) = n. Conversely, if the number of rows is less than the number of columns (i.e., m < n), then B must have full row rank, and so rank(B) = m. This is because, in a non-square matrix, whichever set of vectors (rows or columns) is larger contains more vectors than the dimension of the space they live in, so that set must be linearly dependent.
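
Here is a short sketch (hypothetical values, my own example) showing how this plays out for a tall matrix and its transpose:

```python
import numpy as np

# A tall matrix (m = 4 rows, n = 2 columns): full rank means rank = n = 2
B_tall = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0],
                   [2.0, 3.0]])
print(np.linalg.matrix_rank(B_tall))  # 2: full column rank

# A wide matrix (m = 2, n = 4): full rank means rank = m = 2
B_wide = B_tall.T
print(np.linalg.matrix_rank(B_wide))  # 2: full row rank
```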

Matrix Inversion

Definitionally, an n × n square matrix A is considered invertible if there exists another n × n matrix B such that:

AB = BA = Iₙ

This states that invertibility holds if the matrix product of A and B is the identity matrix Iₙ. If this is indeed true, then B is uniquely determined by A, and we say that B is the multiplicative inverse of A, which we write as A⁻¹. Matrix inversion is then the task of finding a matrix B that satisfies the invertibility condition; I won’t get into the numerical methods used to compute inverses here, however.

Note that a matrix can be inverted only if it has full rank, which implies that the columns of A are linearly independent. Any matrix that cannot be inverted is said to be degenerate, or singular.
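
To round things off, here is a minimal sketch (my own, with hypothetical matrices) of inverting a full-rank matrix with NumPy, and of what happens when the matrix is singular:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = np.linalg.inv(A)

# Multiplying A by its inverse recovers the identity (up to floating-point error)
print(np.allclose(A @ A_inv, np.eye(2)))  # True

# A rank-deficient (singular) matrix has no inverse; NumPy raises an error
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # the second row is twice the first
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("Singular matrix:", err)
```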

Final Remarks

This post takes a lighter touch on some essential concepts in linear algebra. As with any topic, you can delve much deeper into the details, so this piece is not comprehensive and only scratches the surface. That being said, the concepts discussed here are essential when building mathematical models, so they are important for data scientists to be aware of. In a later post, we’ll see how these concepts, along with those introduced in my earlier primer, are applied when building linear regression models. So stay tuned!

