
Image Quantization with K-Means: A simple hands-on tutorial for image compression | by Paul Gavrikov | Oct 2022



A simple hands-on tutorial for image compression via quantization with python, scikit-learn, numpy, PIL, and matplotlib

Quantization refers to a technique where we express a range of values by a single quantum value. For images, this means that we can compress an entire color range into one specific color. This technique is lossy i.e. we intentionally lose information in favor of lower memory consumption. In this tutorial, I will show you how to implement color quantization yourself with very few lines of code. We are going to use Python with scikit-learn, numpy, PIL, and matplotlib.

Let’s start by downloading the beautiful image of the “Old Man of Storr” taken by Pascal van Soest, which we will work with (if you are on Windows or do not have access to wget, simply download the image and save it as image.jpg):

wget -O image.jpg "https://unsplash.com/photos/ZI9X8Kz3ccw/download?ixid=MnwxMjA3fDB8MXxhbGx8MjN8fHx8fHwyfHwxNjY1MzE2ODE1&force=true"
Original (resized) image. Photo by Pascal van Soest on Unsplash.

Next, we can load the image, resize it for better performance, and view it as a numpy array:

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
img = Image.open("image.jpg").resize((960, 600))
x = np.asarray(img)

The image is encoded in a height * width * channels array (here: 600 * 960 * 3, since numpy stores the rows first). Typically, color images are stored as RGB and therefore have 3 color channels (red, green, and blue). You can imagine this as a large 2D array where each entry contains 3 values. Every value represents the intensity of the corresponding color channel, between 0 and 255 (2**8 - 1). In fact, this is already an 8-bit quantization in itself.
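
As a quick sanity check (a minimal sketch, assuming x was loaded as above), we can inspect the shape, data type, and value range of the array:

print(x.shape)           # (600, 960, 3): rows (height), columns (width), channels
print(x.dtype)           # uint8, i.e. integers in [0, 255]
print(x.min(), x.max())  # the actual intensity range of this particular image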

For global quantization, we discard information about each channel and simply treat all intensities in our array as one large vector. We can plot the resulting histogram easily with matplotlib:

plt.figure(figsize=(16, 6))
plt.hist(x.ravel(), bins=np.arange(256), density=True, linewidth=0)
plt.xlabel("Value")
plt.ylabel("Density")
Global distribution of color intensities.

For quantization, we want to replace these 256 values with a lower number, e.g. 8. To accomplish this, we could simply divide the space into 8 equally wide “bins” and map all values inside a bin to its mean value. But we can see that the intensities in our image are not uniformly distributed: there is a large peak slightly above zero and a larger accumulation of intensities around 160. If we divided the space equally, we would neglect this skewed distribution and under- or over-represent specific intensities. Instead, we would like narrower bins in areas of high density, for higher precision there, and wider bins in less dense areas, since we do not have many samples there anyway.
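
For reference, a naive equal-width binning could look roughly like this sketch (quantize_uniform is a hypothetical helper for comparison only, not used in the rest of the tutorial; it maps every value to the midpoint of its bin as a simple stand-in for the bin mean):

def quantize_uniform(x, k=8):
    bin_width = 256 / k
    bin_idx = x // bin_width  # which of the k equal-width bins each value falls into
    return np.uint8(bin_width * bin_idx + bin_width / 2)  # midpoint of that bin

K-Means, introduced next, replaces these fixed bin edges with data-driven centroids.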

We can accomplish this by using K-Means. This is an unsupervised clustering algorithm that is popular for finding k cluster centers (called centroids) in given data. You may have seen an application in multi-dimensional problems, but it also works for a 1D distribution such as in our problem. I am not going to introduce K-Means here — there are countless articles that explain it far better than I could, or alternatively, if you prefer a video I highly recommend watching Josh Starmer’s StatQuest.

For this article, we will use a slightly different version of K-Means called MiniBatchKMeans. Similar to mini-batch optimization in deep learning, the idea is not to compute clusters on all samples at once, but to greedily approximate the solution by computing clusters on smaller batches. This speeds up convergence by a lot!

Training MiniBatchKMeans is very easy thanks to scikit-learn:

from sklearn.cluster import MiniBatchKMeans

k = 8  # number of centroids to learn
k_means = MiniBatchKMeans(k, compute_labels=False)
k_means.fit(x.reshape(-1, 1))

Note that we pass x.reshape(-1, 1) to MiniBatchKMeans. This flattens our 3D matrix into a vector and adds a fake dimension of size 1, as the estimator only supports 2D-shaped arrays. Also, we tell the estimator to not compute labels for each batch via compute_labels=False, which otherwise significantly increases training time. After training, we want to map our color intensities to the closest centroid. The estimator has no function to do this directly, but we can predict the centroid label for each sample and then use this label to resolve the value of the centroid:

labels = k_means.predict(x.reshape(-1, 1))
q_x = k_means.cluster_centers_[labels]

We now have a quantized representation of our original image, but we still need to reshape the array back to the original image shape and convert the floats that scikit-learn works with back to integers:

q_img = np.uint8(q_x.reshape(x.shape))

Let’s put it all together into one function that returns the quantized image as a numpy array:

from sklearn.cluster import MiniBatchKMeans

def quantize_global(x, k):
    # fit k centroids on all intensities, treated as one long 1D vector
    k_means = MiniBatchKMeans(k, compute_labels=False)
    k_means.fit(x.reshape(-1, 1))
    # map every intensity to its closest centroid
    labels = k_means.predict(x.reshape(-1, 1))
    q_x = k_means.cluster_centers_[labels]
    # restore the original image shape and convert back to 8-bit integers
    q_img = np.uint8(q_x.reshape(x.shape))
    return q_img

Let’s see what happens to our intensity distribution after quantization with k=8:

quantized_img = quantize_global(x, 8)

plt.figure(figsize=(16, 6))
plt.hist(x.ravel(), bins=np.arange(256), density=True, linewidth=0, label="original")
plt.hist(quantized_img.ravel(), bins=np.arange(256), density=True, linewidth=0, label="quantized")
plt.xlabel("Value")
plt.ylabel("Density")
Global distribution of color intensities before and after quantization (global, k=8).

As we can see, our original distribution has been replaced by 8 values, just as we requested. Note how the centroids are spaced unequally, depending on the density of the original distribution.
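
If you are curious where exactly the centroids ended up, a quick sketch is to re-fit the estimator and print its sorted cluster centers (this re-runs the clustering purely for inspection):

k_means = MiniBatchKMeans(8, compute_labels=False).fit(x.reshape(-1, 1))
print(np.sort(k_means.cluster_centers_.ravel()))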

Finally, let’s try different values of k and see how this affects our results:

Image.fromarray(quantize_global(x, k))
Global K-Means quantization with k=1, 2, 4, 8, 16, and 32 (top to bottom, left to right).
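
One way to produce such a grid of results is a simple loop over the values of k (a sketch; the file names are arbitrary):

for k in [1, 2, 4, 8, 16, 32]:
    Image.fromarray(quantize_global(x, k)).save(f"quantized_k{k}.png")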

For k=1 we see just a gray image. This is not surprising, because we only have one color intensity left, and it will be somewhere in the middle of our color space (i.e. ~125), and (125, 125, 125) is gray in RGB. As we increase k, the resulting image represents the original more accurately, since we learn more intensities to describe it. Now, pay attention to the image for k=8 — the foreground looks very accurate, but the background is very scattered. There are two important takeaways from this: 1) quantization makes gradients (such as in the gray sky) look bad; 2) due to our K-Means approach we focused more on the foreground, which appears to have denser distributions of intensity.

You may be surprised to see more than k colors in each image (e.g. see the k=2 image), but the explanation is fairly simple: although we only learn to represent our image with k intensities, we still have 3 channels, which gives us up to k**3 combinations of colors that we can represent.
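
You can verify this by counting the distinct RGB triplets that remain after quantization (a quick check, shown here for the k=2 case):

q2 = quantize_global(x, 2)
unique_colors = np.unique(q2.reshape(-1, 3), axis=0)
print(len(unique_colors))  # at most 2**3 = 8 distinct colors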

But what happens to the image size? If we save our images to disk, we can already see a reduction in size, although the file format applies additional compression during saving that we do not control.

The image size on disk (as PNG) increases with the number of centroids (k).
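
If you want to reproduce such a measurement, a rough sketch is to save the quantized image as PNG and read back its file size (the temporary file name is arbitrary):

import os

for k in [2, 8, 32, 256]:
    Image.fromarray(quantize_global(x, k)).save("tmp.png")
    print(k, os.path.getsize("tmp.png"), "bytes")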

In science, it is good practice to benchmark your approach, so you may wonder how well we reconstruct the original image. Let’s test this by computing the absolute and squared error between the quantized and the original image with numpy, and plotting the errors as bar plots:

plt.figure(figsize=(8, 5))
# cast to int before subtracting to avoid uint8 wrap-around
plt.bar(range(9), [np.linalg.norm((x.astype(int) - quantize_global(x, 2**i).astype(int)).ravel(), ord=1) for i in range(9)])
plt.xlabel(r"Number of Centroids ($2^k$)")
plt.ylabel(r"Absolute Error")

plt.figure(figsize=(8, 5))
plt.bar(range(9), [np.linalg.norm((x.astype(int) - quantize_global(x, 2**i).astype(int)).ravel(), ord=2) for i in range(9)])
plt.xlabel(r"Number of Centroids ($2^k$)")
plt.ylabel(r"Squared Error")
The absolute (left) and squared (right) error of the quantized image at different values of k.

We can see that the absolute error decreases with the number of centroids and eventually approaches zero at k=256 (where we no longer compress anything). Still, there is some residual error, e.g. due to our naive casting to uint8 without rounding, which shows up as visible squared error. Interestingly, the squared error also seems to increase up until k=4. Keep in mind that these errors are a good way to capture the differences mathematically, but our eyes may be far less sensitive to them, so the numbers may not mean that much perceptually. Ask yourself: can you spot the difference between k=32 and the original image above?

Channel-wise quantization

Up until now, we have treated all intensities alike, independent of the channel they come from. However, if we plot the intensity distributions per channel, we can see some differences, in particular in the blue channel:

plt.figure(figsize=(16, 6))

plt.hist(x[:, :, 0].ravel(), color="red", bins=np.arange(256), density=True, linewidth=0, alpha=0.5)
plt.hist(x[:, :, 1].ravel(), color="green", bins=np.arange(256), density=True, linewidth=0, alpha=0.5)
plt.hist(x[:, :, 2].ravel(), color="blue", bins=np.arange(256), density=True, linewidth=0, alpha=0.5)
plt.xlabel("Value")
plt.ylabel("Density")
Channel-wise distribution of color intensities.

We can also easily adjust our code to compute the quantization per channel instead of globally:

def quantize_channels(x, k):
    quantized_x = x.copy()
    # quantize each of the 3 color channels independently
    for d in range(3):
        channel = x[:, :, d].copy()
        k_means = MiniBatchKMeans(k, compute_labels=False)
        k_means.fit(channel.reshape(-1, 1))
        labels = k_means.predict(channel.reshape(-1, 1))
        quantized_x[:, :, d] = np.uint8(k_means.cluster_centers_[labels]).reshape(channel.shape)
    return quantized_x

and then benchmark the error against global quantization, reusing the code from above:
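
A minimal sketch of that comparison (it simply applies the same error computation to both functions):

ks = [2**i for i in range(9)]
# cast to int before subtracting to avoid uint8 wrap-around
err_global  = [np.linalg.norm((x.astype(int) - quantize_global(x, k).astype(int)).ravel(), ord=1) for k in ks]
err_channel = [np.linalg.norm((x.astype(int) - quantize_channels(x, k).astype(int)).ravel(), ord=1) for k in ks]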

The absolute (left) and squared (right) error of the quantized image at different values of k with global- and channel-wise quantization.

However, at best, channel-wise quantization is marginally better than global quantization, and occasionally it is even worse. Given that it requires up to 3x more memory and that training is also 3x more expensive, global quantization seems like the much better approach.

I have tried to understand why channel-wise quantization performs so poorly, and the reason seems to be simply that we treat the color channels independently and their centroids do not differ that much:

quantized_img = quantize_channels(x, 8)

plt.figure(figsize=(16, 6))
plt.hist(quantized_img[:, :, 0].ravel(), bins=np.arange(256), density=True, linewidth=0, label="R", color="red")
plt.hist(quantized_img[:, :, 1].ravel(), bins=np.arange(256), density=True, linewidth=0, label="G", color="green")
plt.hist(quantized_img[:, :, 2].ravel(), bins=np.arange(256), density=True, linewidth=0, label="B", color="blue")
plt.xlabel("Value")
plt.ylabel("Density")
Color intensity distribution after channel-wise quantization with k=8.

A better approach might be to treat every pixel as a 3D RGB vector and apply clustering in that space. But I will leave that up to you! Given the code snippets above, you should be able to implement it easily.
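
If you want a head start on that exercise, one possible sketch (certainly not the only way to do it) is to cluster the flattened pixel vectors directly:

def quantize_colors(x, k):
    pixels = x.reshape(-1, 3)  # every pixel as one 3D RGB vector
    k_means = MiniBatchKMeans(k, compute_labels=False).fit(pixels)
    labels = k_means.predict(pixels)
    # each pixel is replaced by its closest RGB centroid
    return np.uint8(k_means.cluster_centers_[labels]).reshape(x.shape)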


