The Power of Linux Cgroups: How Containers Take Control of Their Resources



Photo by Joshua Hoehne on Unsplash

The last article examined how to use Linux Namespaces to create isolated environments within a single Linux system. This article is part of our effort to deeply understand how containers work by looking under the hood.

Namespaces are the first step of our journey. We saw how to create a PID namespace: a world where the processes running inside believe they are the only ones in existence. But how can you limit the amount of resources they consume? Enter Linux cgroups.

Linux control groups, or cgroups, are a powerful tool for managing and allocating resources in a Linux system. They allow administrators to limit the resources used by processes or groups of processes, ensuring that essential system services always have access to the resources they need to function properly.

But cgroups are not just useful for system administrators — they also provide a way for containers to take control of their own resources, enabling them to run more efficiently and reliably within a shared host environment.

In this article, we explore the benefits of using cgroups in the context of containers and show you how to get started with cgroups in your own environment. First, we will create a control group to limit the memory consumption of a process that runs in its context and then run a whole namespace under it. Let’s dive in!

Learning Rate is a newsletter for those who are curious about the world of MLOps. MLOps is a broad field that strives to bring ML models to production in an efficient and reproducible way. Containers play a crucial role in the pipeline. If you want to learn more about topics like this, subscribe here. You’ll hear from me on the first Saturday of every month with updates and thoughts on the latest MLOps news and articles!

Linux control groups, or cgroups, are a kernel feature that lets an administrator allocate resources such as CPU, memory, and I/O bandwidth to groups of processes.

Cgroups provide a way to control how much of the system’s resources a process or a group of processes can use. For example, an administrator could create a cgroup for a group of processes associated with a specific application (e.g., a web application running on a server) and then set limits on the amount of CPU and memory that those processes are allowed to use.
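Every process on a Linux system already belongs to a set of cgroups. As a quick sanity check (nothing to install yet), you can inspect the cgroup membership of your current shell through the proc filesystem:

cat /proc/self/cgroup

Each line maps a controller hierarchy (or, on cgroup v2, the single unified hierarchy) to the path of the group the process belongs to.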

Cgroups are helpful for various purposes, including improving system performance, isolating processes for security, and simplifying the management of multiple applications on a single system.

In the context of containers, cgroups allow us to limit the resources each container can consume, so that no single application can take over the whole server and each container gets the resources it needs to function properly.
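Container runtimes expose these limits directly. As a hedged illustration (nginx is just a placeholder image), Docker translates the following flags into cgroup settings for the container it starts:

docker run --memory=50m --cpus=0.5 nginx

Here, --memory caps the container’s memory usage and --cpus caps the share of CPU time it may consume.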

For example, setting the resources section of a Pod in Kubernetes is always a good idea: the requests help the Kubernetes scheduler decide which node can accommodate the Pod, so our application gets everything it needs to run correctly. If no node can satisfy them, the Pod will not be scheduled at all.
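To make this concrete, here is a minimal sketch of such a resources section (the Pod name and image are made up for illustration). Applied with kubectl, the requests guide scheduling, while the limits are ultimately enforced through cgroups on the node:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: resources-demo   # hypothetical Pod, used only for illustration
spec:
  containers:
  - name: app
    image: nginx         # placeholder image
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
EOF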

How can we use cgroups in practice? Let’s see this by creating two simple examples: first, we will create a simple application and run it under a specific cgroup, then run a whole namespace in a similar context.

Our journey in the world of cgroups starts here. First, we will create a simple application and run it in the context of a memory-limiting cgroup. Then, we will create a Linux namespace in the same context and run the same application inside this namespace.

Memhog

To make this example work, we need to control the amount of memory the application uses. To this end, we will use memhog, a small utility that ships with the Debian numactl package.

memhog is a simple tool that allocates however much memory we tell it to, for testing purposes (see its manpage for more details).

The first step is to install memhog:

sudo apt update && sudo apt install numactl

Test that you have installed memhog properly by asking it to allocate 100 megabytes of memory:

memhog 100M

The output should be something like this:

..........

If you get this output, then you’re good to go! Let’s create a bash script that runs this every two seconds; our application, then, is a service that asks for 100 megabytes every two seconds. Create a new file, call it memhogtest.sh, and put the following content inside:

#!/bin/bash
while true; do memhog 100M; sleep 2; done

Finally, make the file executable:

sudo chmod +x memhogtest.sh

To create, manage, and monitor cgroups, we need another package called cgroup-tools. So, first, you need to install this:

sudo apt install cgroup-tools

Now that we have this in our toolbox, the process goes as follows:

  1. Create a new cgroup using the package we just installed
  2. Set a limit for the resource we want to control for this specific cgroup
  3. Run the application or namespace under this cgroup

Thus, let us first create the cgroup. To achieve this, use the following command:

sudo cgcreate -g memory:memhog-limiter

This command creates a new cgroup (cgcreate) in the memory controller hierarchy and names it memhog-limiter. Under the hood, it created a new directory, /sys/fs/cgroup/memory/memhog-limiter/, whose contents you can view with ls:

ls -la /sys/fs/cgroup/memory/memhog-limiter/
drwxr-xr-x 2 root root 0 Jan 9 05:56 .
dr-xr-xr-x 8 root root 0 Jan 4 10:31 ..
-rw-r--r-- 1 root root 0 Jan 9 05:56 cgroup.clone_children
--w--w--w- 1 root root 0 Jan 9 05:56 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan 9 05:56 cgroup.procs
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.failcnt
--w------- 1 root root 0 Jan 9 05:56 memory.force_empty
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.failcnt
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.tcp.failcnt
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 root root 0 Jan 9 05:56 memory.kmem.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Jan 9 05:56 memory.numa_stat
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.oom_control
---------- 1 root root 0 Jan 9 05:56 memory.pressure_level
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Jan 9 05:56 memory.stat
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.swappiness
-r--r--r-- 1 root root 0 Jan 9 05:56 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Jan 9 05:56 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Jan 9 05:56 notify_on_release
-rw-r--r-- 1 root root 0 Jan 9 05:56 tasks

(Depending on your system, the location or structure of this directory may vary)

Now that we have created the cgroup, let’s set our limits. We’ll set a limit of 50 megabytes, which means that no process running in this context can exceed it; likewise, for a group of processes, their combined usage cannot exceed this limit.

To set the memory limit, run the following command:

sudo cgset -r memory.limit_in_bytes=50M memhog-limiter

This command sets a memory limit of 50 megabytes for the cgroup memhog-limiter. If you cat the corresponding file from the directory structure we saw earlier, you’ll see precisely this value (in bytes):

cat /sys/fs/cgroup/memory/memhog-limiter/memory.limit_in_bytes 
52428800
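Note that cgset is a thin convenience wrapper. Assuming the same cgroup v1 layout as above, you could achieve the same result by writing the value straight into that file:

echo 52428800 | sudo tee /sys/fs/cgroup/memory/memhog-limiter/memory.limit_in_bytes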

We’re ready to run our application and namespace in this context.

Our state now is the following: we have created a service that allocates 100 megabytes of memory every two seconds. We have also created a cgroup that limits the memory a process or a group of processes can consume to 50 megabytes. What do you expect to happen if we try to run our service in this context?

Without further ado, to execute a service in this context, run the command below:

sudo cgexec -g memory:memhog-limiter ./memhogtest.sh

The result is what we expected: the Linux kernel’s OOM killer terminates memhog every time it tries to allocate more memory than the cgroup allows:

....../memhogtest.sh: line 2: 174662 Killed                  memhog 100M
....../memhogtest.sh: line 2: 174668 Killed memhog 100M
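If you want extra evidence that the cgroup is responsible, the memory.failcnt file we listed earlier counts how many times the group has hit its limit; it should keep increasing while the script is running (again assuming the cgroup v1 path from before):

cat /sys/fs/cgroup/memory/memhog-limiter/memory.failcnt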

That’s great; this is precisely what we’d like to see. But now, how can we create a namespace in this context? This is similar to what containers do, so if we achieve this, we’ll be on our way to creating a container-like environment without Docker, which is the goal of this series.

Creating a namespace in this context is pretty simple. The following command may look familiar to you:

sudo cgexec -g memory:memhog-limiter unshare -fp --mount-proc

We used the first part before to run a service in the context of our memhog-limiter cgroup (sudo cgexec -g memory:memhog-limiter), and we saw the second part in the article on namespaces (unshare -fp --mount-proc): it is the command we used to create a new PID namespace.

Thus, if we bring everything together, this command creates a new PID namespace in the context of our cgroup. To verify that you are in a new namespace, run the following command:

# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 08:00 pts/0    00:00:00 -bash
root          12       1  0 08:00 pts/0    00:00:00 ps -ef

As you can see, only bash is running in your new namespace, with PID 1. So every process you start from this shell will run inside the namespace and, at the same time, in the context of our cgroup. Let’s verify this:

# ./memhogtest.sh 
....../memhogtest.sh: line 2: 14 Killed memhog 100M
....../memhogtest.sh: line 2: 16 Killed memhog 100M
....../memhogtest.sh: line 2: 18 Killed memhog 100M

This is great! We got the same output as before. If you want to play around, you could reduce the memory that memhog tries to allocate and make it work. In any case, congratulations! You’re one step closer to creating your own Linux containers without Docker!
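For instance, still inside the namespace shell, asking memhog for an amount below the 50-megabyte limit (20M here is an arbitrary value) should print the familiar dots instead of getting killed:

# memhog 20M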

In summary, Linux cgroups are a powerful tool for managing and allocating resources in a Linux system. They allow administrators to limit the amount of resources that processes or groups of processes can use, ensuring that essential system services always have access to the resources they need to function properly.

In the context of containers, cgroups provide a way for containers to take control of their own resources, enabling them to run more efficiently and reliably within a shared host environment.

This story examined how you can use cgroups in practice and brought us one step closer to our final goal: creating a container-like environment in Linux without Docker. Next stop? Overlay filesystems!

My name is Dimitris Poulopoulos, and I’m a machine learning engineer working for Arrikto. I have designed and implemented AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA.

If you are interested in reading more posts about Machine Learning, Deep Learning, Data Science, and DataOps, follow me on Medium, LinkedIn, or @james2pl on Twitter.

Opinions expressed are solely my own and do not express the views or opinions of my employer.



