Unveiling Docker's Secrets: Exploring Namespaces, Cgroups, and Union Filesystems
INTRODUCTION
Imagine a world where applications run seamlessly across environments—development, testing, and production—without the familiar headache of "it works on my machine." Docker made this dream a reality, revolutionizing how software is built, shipped, and deployed.
In today’s blog, we’ll dive deep into the mechanics of how Docker orchestrates and manages several containers seamlessly. Have you ever wondered how Docker allows multiple containers to run side by side without stepping on each other’s toes? Or how it ensures that a single container doesn’t hog all the system’s resources? Or even how it magically layers file systems to create lightweight, efficient container images?
The answers lie in three fundamental building blocks of Docker’s architecture: Namespaces, Control Groups (cgroups), and Union Filesystems. By the end of this blog, you’ll have a clear understanding of how Docker brings the promise of containerization to life. Whether you're a developer curious about Docker's inner workings or a systems enthusiast seeking to deepen your technical knowledge, this blog is for you.
So, let’s dive in and uncover how Docker transforms your system into a powerhouse of isolated, resource-efficient, and lightning-fast containers!
NAMESPACES
A namespace is a key feature of the Linux kernel that creates isolated environments, restricting what processes can see or access. This is essential for technologies like Docker, which use namespaces to ensure secure and isolated execution of applications. It's like putting processes in their own "private room" where they can only see and interact with their own resources, even though they are sharing the same physical computer with other processes.
Example: Suppose you have a host system where the init process (the first process the Linux kernel starts after it has initialized itself) runs with PID 1 and manages processes at the system level. On this host you start two Docker containers: Container A and Container B.
When you start Container A, it gets its own PID namespace. Within this namespace:
The first process inside Container A (let’s say a web server) is assigned PID 1. Additional processes started in Container A get PIDs like 2, 3, and so on, but these PIDs are visible only within Container A.
Similarly, when you start Container B, it gets its own PID namespace. Within this namespace:
The first process inside Container B (let’s say a database server) is also assigned PID 1. Additional processes started in Container B get PIDs like 2, 3, and so on, but these PIDs are visible only within Container B.
From the host system’s perspective, all container processes are visible, but under different PIDs. For example:
The web server in Container A may appear as PID 201 on the host. The database server in Container B may appear as PID 301 on the host.
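The bookkeeping in this example can be sketched as a toy model in Python. This is not how the kernel implements PID namespaces; the class names (Host, PidNamespace) and the starting host PID of 201 are assumptions made up for illustration, and the host PIDs it prints are simply consecutive numbers.

```python
# Toy model of PID namespaces: only the numbering idea, not real kernel code.
import itertools


class Host:
    """Hands out host-wide PIDs (the numbers the host would see)."""

    def __init__(self, first_pid=201):
        # 201 is an arbitrary starting point chosen to echo the example above.
        self._next_pid = itertools.count(first_pid)

    def allocate_pid(self):
        return next(self._next_pid)


class PidNamespace:
    """One container's private PID numbering, always starting at 1."""

    def __init__(self, name, host):
        self.name = name
        self.host = host
        self._next_pid = itertools.count(1)
        self.host_pid_of = {}  # container PID -> host PID

    def spawn(self, command):
        container_pid = next(self._next_pid)
        host_pid = self.host.allocate_pid()
        self.host_pid_of[container_pid] = host_pid
        print(f"{self.name}: {command} is PID {container_pid} inside, "
              f"PID {host_pid} on the host")
        return container_pid, host_pid


host = Host()
container_a = PidNamespace("Container A", host)
container_b = PidNamespace("Container B", host)

web = container_a.spawn("web server")
db = container_b.spawn("database server")
worker = container_a.spawn("worker")
```

Both containers hand out PID 1 to their first process, while the host sees every process under a distinct number.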
Now that the concept of a namespace is clear, let’s look at the different types of namespaces:
PID (Process ID) Namespace: Isolates the process ID number space, allowing containers to have their own independent set of process IDs. This prevents processes inside a container from being aware of processes outside the container.
NET (Network) Namespace: Isolates network resources such as interfaces, IP addresses, routing tables, and port numbers. Each container gets its own virtual network stack, ensuring network traffic and configurations are isolated.
USER Namespace: Isolates user and group ID numbers, so a container has its own set of user and group IDs that are mapped to different IDs on the host. For example, a process can run as root (UID 0) inside the container while being mapped to an unprivileged user on the host, which limits the damage a privilege escalation attack can do. Note that Docker does not enable user namespace remapping by default; it has to be turned on in the daemon configuration.
MNT Namespace: The MNT (Mount) namespace isolates the set of filesystem mount points, allowing each container to have its own independent view of the filesystem hierarchy. This means that containers can mount and unmount filesystems without affecting the host or other containers.
UTS Namespace: The UTS (UNIX Time-Sharing) namespace allows each container to have its own hostname and domain name, providing isolation for system identification attributes.
IPC Namespace: The IPC (Inter-Process Communication) namespace isolates the communication mechanisms that processes use to exchange data, such as shared memory segments, semaphores, and message queues.
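On Linux, the namespaces a process belongs to are exposed as symbolic links under /proc/<pid>/ns. The sketch below lists them for the current process; the helper name current_namespaces is made up for this post, and it simply returns an empty result on systems without /proc. Running it on the host and again inside a container should show different identifiers for the isolated namespace types.

```python
import os


def current_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, /proc not mounted, or not allowed to inspect this process.
        return namespaces
    for entry in entries:
        try:
            # Each entry is a symlink whose target looks like "pid:[<number>]".
            namespaces[entry] = os.readlink(os.path.join(ns_dir, entry))
        except OSError:
            continue  # Skip entries we cannot read.
    return namespaces


for ns_type, identifier in current_namespaces().items():
    print(f"{ns_type:<20} {identifier}")
```

Two processes that print the same identifier for a given type share that namespace.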
CGROUPS (CONTROL GROUPS)
Control Groups (cgroups) are a Linux kernel feature that allows you to allocate, limit, and monitor the resources (such as CPU, memory, and disk I/O) used by a group of processes.
Control groups allocate a specific amount of a resource (e.g., CPU time or memory) to a group of processes. For example, a container running a database can be limited to 2 GB of memory. Cgroups prevent the processes in the group from exceeding what they have been allocated.
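A process can inspect the limit that applies to it by reading the cgroup filesystem. The following is a minimal sketch that assumes cgroup v2 mounted at /sys/fs/cgroup (the usual location, but still an assumption); the function names and the sample path in the comment are made up for illustration. On cgroup v1 or on a non-Linux system it just reports that no limit was found.

```python
import os


def parse_cgroup_v2_path(proc_cgroup_text):
    """Return the cgroup v2 path from /proc/<pid>/cgroup content, or None."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        # cgroup v2 uses a single entry such as "0::/example.slice/demo".
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def read_memory_limit(cgroup_root="/sys/fs/cgroup"):
    """Return the memory limit in bytes, or None if unlimited or unknown."""
    try:
        with open("/proc/self/cgroup") as f:
            path = parse_cgroup_v2_path(f.read())
        if path is None:
            return None
        limit_file = os.path.join(cgroup_root, path.lstrip("/"), "memory.max")
        with open(limit_file) as f:
            raw = f.read().strip()
        # The file holds either the word "max" (no limit) or a byte count.
        return None if raw == "max" else int(raw)
    except (OSError, ValueError):
        return None


limit = read_memory_limit()
if limit is None:
    print("No memory limit found (unlimited, cgroup v1, or not on Linux).")
else:
    print(f"Memory limit: {limit / 1024**3:.2f} GiB")
```

Inside a container capped at 2 GB, the byte count read this way would be expected to reflect that cap.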
UNION FILE SYSTEM
Docker uses a union file system to manage container images and layers efficiently. It relies on OverlayFS, a union mount filesystem implementation for Linux, through the overlay2 storage driver. Conceptually, OverlayFS works with three layers (OverlayFS itself calls them lowerdir, merged, and upperdir, respectively):
Base Layer
Overlay Layer
Diff Layer
Let’s look at each layer in detail.
Base Layer
This is where the base files for your file system are stored. From the overlay’s point of view, this layer is read-only. If you pull an Ubuntu image, that image is your Base Layer.
Overlay Layer
The Overlay Layer is where the user operates. It initially offers a view of the Base Layer and lets the user interact with files and even “write” to them. When you write to this layer, the changes are stored in the next layer, the Diff Layer. Once changes exist, the Overlay Layer offers a union view of the Base and Diff Layers, with the Diff Layer’s files superseding the Base Layer’s.
In Docker terms, you can think of this layer as the filesystem you see whenever you run a container.
Diff Layer
Any changes made in the Overlay Layer are automatically stored in this layer.
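The way the three layers relate can be mimicked with Python's collections.ChainMap, which looks keys up in a list of dictionaries in order and sends all writes to the first one. This is only a rough analogy, not how OverlayFS is implemented; the file paths and contents are invented for the example.

```python
from collections import ChainMap

# Read-only image content (the Base Layer).
base_layer = {
    "/etc/os-release": "Ubuntu",
    "/etc/hostname": "base-host",
}
# Changes recorded so far (the Diff Layer); one file already differs.
diff_layer = {"/etc/hostname": "my-container"}

# The Overlay Layer: lookups check diff_layer first, then base_layer,
# and every write lands in diff_layer.
overlay_view = ChainMap(diff_layer, base_layer)

overlay_view["/tmp/notes.txt"] = "hello"  # create a new file through the view

print(overlay_view["/etc/hostname"])    # the Diff Layer's version wins
print(overlay_view["/etc/os-release"])  # falls through to the Base Layer
print(sorted(diff_layer))               # the new file was stored in the Diff Layer
print(sorted(base_layer))               # the Base Layer is unchanged
```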
So right now you’re probably thinking: what if I change something that already exists in the Base Layer? Worry not, some smart person thought about this a while ago!
Whenever you write to a file that already exists in the Base Layer, OverlayFS first copies the file up to the Diff Layer and then applies your modification to that copy, leaving the Base Layer’s file untouched. This is known as a copy-on-write operation, and it is probably the most important part of making a union file system work correctly.
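The copy-on-write step can be imitated with two ordinary directories standing in for the Base and Diff Layers. The sketch below is a simplified illustration under stated assumptions, not OverlayFS itself: the helpers resolve and append_text are hypothetical, the file name motd is made up, and only flat file names (no subdirectories) are handled.

```python
import os
import shutil
import tempfile


def resolve(name, base_dir, diff_dir):
    """Return the path the overlay view would serve: Diff first, then Base."""
    for layer in (diff_dir, base_dir):
        candidate = os.path.join(layer, name)
        if os.path.exists(candidate):
            return candidate
    return None


def append_text(name, text, base_dir, diff_dir):
    """Append to a file through the overlay view, copying it up if needed."""
    target = os.path.join(diff_dir, name)
    base_copy = os.path.join(base_dir, name)
    if not os.path.exists(target) and os.path.exists(base_copy):
        shutil.copy2(base_copy, target)  # the "copy" in copy-on-write
    with open(target, "a") as f:
        f.write(text)  # the write only ever touches the Diff Layer


workdir = tempfile.mkdtemp()
base_dir = os.path.join(workdir, "base")
diff_dir = os.path.join(workdir, "diff")
os.mkdir(base_dir)
os.mkdir(diff_dir)

with open(os.path.join(base_dir, "motd"), "w") as f:
    f.write("welcome\n")

append_text("motd", "edited in the container\n", base_dir, diff_dir)

with open(resolve("motd", base_dir, diff_dir)) as f:
    merged_content = f.read()
with open(os.path.join(base_dir, "motd")) as f:
    base_content = f.read()

print(repr(merged_content))  # what the container sees
print(repr(base_content))    # what the image still contains
shutil.rmtree(workdir)
```

Because the original file is never modified, many containers can share the same Base Layer while each keeps its own changes.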