Docker: Layered Architecture
The Basics of Docker's Layered Filesystem
Docker builds images and containers on a layered filesystem, implemented with a union filesystem (UnionFS). Layering lets Docker distribute images efficiently and keep storage use low, and it is fundamental to how Docker works.
Here's a brief overview of Docker's filesystem:
Image Layers: Docker images are composed of multiple layers, each representing a specific filesystem snapshot. These layers are stacked on top of each other, with each layer adding or modifying files and directories. Layers are immutable, meaning they cannot be changed once created.
Copy-on-Write (CoW): When you create a container from an image, Docker adds a thin writable layer on top of the read-only image layers. This writable layer, also called the container layer, lets the container modify its filesystem without affecting the underlying image layers. Files that are only read are served directly from the image layers and stay shared with every other container based on the same image. For example, suppose a running container modifies a file that was baked into the image at build time. That file lives in a read-only image layer, so the first time the container writes to it, Docker copies the file up into the container layer and applies the change to the copy. The image layer keeps the original version, hence the name copy-on-write: a copy is made only when a write occurs.
Efficient Storage: Docker's use of layered filesystems enables efficient storage and distribution of images. Because layers are shared between images with the same content, disk space is conserved by storing only the differences between layers. This reduces the amount of data that needs to be transferred when pulling and pushing images from registries like Docker Hub.
OverlayFS and Other Union File Systems: Docker relies on a union filesystem implementation chosen according to the host system's capabilities and configuration. OverlayFS is the default on most modern Linux distributions, while AUFS is an older driver that has since been deprecated. These filesystems overlay multiple filesystem layers transparently, so the container sees a single merged directory tree.
Container Lifecycle: As containers are started, stopped, and modified, changes are made to the container layer while the underlying image layers remain unchanged. This ensures that each container maintains its own isolated filesystem environment while sharing common base layers with other containers based on the same image.
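To make the copy-on-write behaviour described above more concrete, here is a minimal Python sketch. It is a toy model and not how Docker or OverlayFS is implemented: the Container class, the file paths, and the file contents are all made up for illustration. Image layers are read-only mappings shared by every container, and each container gets its own empty writable layer.

```python
from types import MappingProxyType

# Hypothetical read-only image layers, ordered bottom to top.
base_layer = MappingProxyType({"/etc/os-release": "debian\n"})
app_layer = MappingProxyType({"/app/config.txt": "debug=false\n"})
image = (base_layer, app_layer)


class Container:
    """Toy container: shared read-only image layers plus one writable layer."""

    def __init__(self, image_layers):
        self.image_layers = image_layers  # shared with other containers
        self.writable = {}                # private container layer

    def read(self, path):
        # Look in the writable layer first, then the image layers from top to bottom.
        for layer in (self.writable, *reversed(self.image_layers)):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, text):
        # Copy-on-write: on the first write, copy the file up from the image layers.
        if path not in self.writable:
            try:
                self.writable[path] = self.read(path)
            except FileNotFoundError:
                self.writable[path] = ""  # brand-new file, nothing to copy
        self.writable[path] += text


c1 = Container(image)
c2 = Container(image)
c1.append("/app/config.txt", "verbose=true\n")

print(repr(c1.read("/app/config.txt")))  # c1 sees its modified copy
print(repr(c2.read("/app/config.txt")))  # c2 still sees the image version
print(sorted(c1.writable))               # only the modified file was copied up
```

Only the file that c1 changed ends up in its writable layer. The image layers are untouched, so c2 keeps reading the original content.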
Layered File Structure of a Container
Emphasis on Efficient Storage:
Suppose we have three Docker images: baseImage, imageA, and imageB, with imageA and imageB built upon baseImage. Each image has its layers representing changes or additions.
Here's a clearer representation:
baseImage:
- Layer 1 (from base OS)
- Layer 2 (dependencies)
- Layer 3 (application libraries)
imageA (inherits from baseImage):
- Layer 1 (from base OS)
- Layer 2 (dependencies)
- Layer 3 (application libraries)
- Layer A (additional application-specific files)
imageB (inherits from baseImage):
- Layer 1 (from base OS)
- Layer 2 (dependencies)
- Layer 3 (application libraries)
- Layer B (additional application-specific files)
In this diagram:
baseImage contains layers representing the base operating system, dependencies, and application libraries.
imageA and imageB both inherit layers from baseImage up to Layer 3, representing shared resources such as the base OS and common dependencies.
imageA has an additional layer Layer A containing application-specific files unique to imageA.
imageB has an additional layer Layer B containing application-specific files unique to imageB.
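The storage saving in this diagram can be estimated with a few lines of Python. The layer sizes below are invented purely for illustration (the article gives none), and the two helper functions are hypothetical names for this sketch.

```python
# Hypothetical layer sizes in MB; the layer names mirror the diagram.
LAYER_SIZES_MB = {
    "Layer 1": 80,
    "Layer 2": 120,
    "Layer 3": 40,
    "Layer A": 15,
    "Layer B": 25,
}

IMAGES = {
    "baseImage": ["Layer 1", "Layer 2", "Layer 3"],
    "imageA": ["Layer 1", "Layer 2", "Layer 3", "Layer A"],
    "imageB": ["Layer 1", "Layer 2", "Layer 3", "Layer B"],
}


def size_without_sharing(images):
    # Every image stores its own private copy of every layer.
    return sum(LAYER_SIZES_MB[name] for layers in images.values() for name in layers)


def size_with_sharing(images):
    # Each distinct layer is stored once, no matter how many images use it.
    unique_layers = {name for layers in images.values() for name in layers}
    return sum(LAYER_SIZES_MB[name] for name in unique_layers)


print(size_without_sharing(IMAGES))  # 760
print(size_with_sharing(IMAGES))     # 280
```

With these made-up numbers, storing the three images separately would take 760 MB, while sharing the common layers takes 280 MB.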
With this setup, let's revisit the explanation:
Pulling Images: When pulling imageA and imageB from a Docker registry, Docker downloads only the layers that are not already present on the local system. If the base layers (Layer 1, Layer 2, and Layer 3) were fetched earlier, for example as part of baseImage or of the other image, only Layer A for imageA and Layer B for imageB need to be downloaded.
Pushing Images: When pushing imageA and imageB to a Docker registry, Docker uploads only the layers the registry does not already have. If the base layers were pushed before as part of another image, only the unique layers (Layer A for imageA and Layer B for imageB) are sent.
This optimised approach conserves bandwidth and storage space by sharing common layers between images with the same content, resulting in faster image pulls and pushes and more efficient use of resources.
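The pull behaviour can be sketched as a set difference between the layers an image needs and the layers already stored locally. This is a simplified model and not the registry protocol: the pull function, the layer contents, and the in-memory local store are assumptions made for the example. It does reflect that Docker identifies layers by a SHA-256 content digest.

```python
import hashlib


def digest(content: bytes) -> str:
    # Layers are addressed by the hash of their content.
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Hypothetical layer contents standing in for real filesystem archives.
base_contents = [b"base OS", b"dependencies", b"application libraries"]
image_a = [digest(c) for c in base_contents + [b"files for A"]]
image_b = [digest(c) for c in base_contents + [b"files for B"]]


def pull(image_layers, local_store):
    """Return the digests that had to be downloaded and record them locally."""
    missing = [d for d in image_layers if d not in local_store]
    local_store.update(missing)
    return missing


local_store = set()
first = pull(image_a, local_store)   # nothing is cached yet
second = pull(image_b, local_store)  # base layers are already present

print(len(first), len(second))  # 4 1
```

The first pull has to fetch all four layers of imageA. The second pull transfers only Layer B, because the three base layers have the same digests and are already in the local store. A push works the same way in the other direction.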
Docker layer caching (DLC)
Docker's layer caching mechanism builds on the layered architecture to speed up builds, reduce network traffic, and save storage space. Here's how it works:
Build Process: When you build a Docker image, Docker processes the Dockerfile one instruction at a time. Instructions that change the filesystem, such as RUN, COPY, and ADD, each create a new layer containing the changes introduced by that instruction. Other instructions, such as ENV or CMD, only record metadata in the image configuration.
Layer Caching: Docker employs a caching mechanism during the image build process. For each instruction, Docker checks whether it has already built a layer from the same parent layer with the same instruction. For COPY and ADD, it also compares checksums of the files being copied from the build context. For RUN, only the command text is compared, not the result of running it. If nothing has changed, Docker reuses the existing cached layer instead of rebuilding it.
Efficiency: Layer caching significantly improves build efficiency. If you rebuild an image with minor changes to the Dockerfile or its context, Docker can reuse previously built layers, avoiding redundant operations. This results in faster build times and reduces the need to download or transfer unchanged layers from registries, conserving network bandwidth.
Layer Reuse: Docker's layer caching also promotes layer reuse across different images. If multiple images share common layers (e.g., base OS and dependencies), Docker caches these layers and shares them between images. This minimizes the storage space required for storing images and improves resource utilization.
Cache Invalidation: It's essential to understand when the cache is invalidated. If an instruction or the files it depends on change, Docker invalidates the cache for that instruction and for every instruction after it, even if those later instructions are themselves unchanged. This ensures that subsequent layers are rebuilt on top of the new content. It is also why instructions that change rarely, such as installing dependencies, are best placed before instructions that change often, such as copying source code.
Cache Control: Docker provides mechanisms to control caching behavior explicitly. For example, you can pass the --no-cache option to docker build to disable caching entirely, or use multi-stage builds to optimize caching for specific stages of the build process.
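The caching and invalidation rules above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not Docker's actual builder: the build function is hypothetical, the Dockerfile steps are illustrative, and the file checksums are placeholder strings. The key idea it shows is that each cache key includes the key of the previous step, so one miss forces every later step to be rebuilt.

```python
import hashlib


def build(steps, cache):
    """Simulate a build. Each step is (instruction, checksum of the files it copies)."""
    parent_key = ""  # key of the previous layer; empty before the first step
    log = []
    for instruction, file_checksum in steps:
        # Chaining the parent key means a miss propagates to every later step.
        raw = f"{parent_key}|{instruction}|{file_checksum}"
        key = hashlib.sha256(raw.encode()).hexdigest()
        if key in cache:
            log.append("CACHED")
        else:
            cache.add(key)
            log.append("BUILT")
        parent_key = key
    return log


def dockerfile(requirements_checksum, source_checksum):
    # Illustrative steps; the checksums stand in for the build context contents.
    return [
        ("FROM python:3.12-slim", ""),
        ("COPY requirements.txt .", requirements_checksum),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", source_checksum),
    ]


cache = set()
first = build(dockerfile("req-v1", "src-v1"), cache)   # cold cache
second = build(dockerfile("req-v1", "src-v2"), cache)  # only source code changed
third = build(dockerfile("req-v2", "src-v2"), cache)   # requirements changed

print(first)   # ['BUILT', 'BUILT', 'BUILT', 'BUILT']
print(second)  # ['CACHED', 'CACHED', 'CACHED', 'BUILT']
print(third)   # ['CACHED', 'BUILT', 'BUILT', 'BUILT']
```

In the third build the RUN step and the final COPY step are rebuilt even though their own inputs did not change, because the layer beneath them is new.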
Conclusion
Overall, Docker's filesystem architecture plays a crucial role in enabling efficient image distribution, container isolation, and storage optimization, making it a cornerstone of Docker's containerization technology.