🐳 Essential Settings for AI/Data Workloads: A Complete Guide to Docker Shared Memory (shm_size and ipc)



If you've hit a puzzling error such as OSError: No space left on device while running AI or large-scale data processing workloads in Docker, a common cause is that the container's shared memory (/dev/shm, sized by shm_size) is too small.

This post clearly outlines why shared memory is crucial in container environments and how to correctly set shm_size and ipc: host options.


1. The Role and Importance of shm_size

Role: Determining the Size of Shared Memory in Containers

shm_size is the Docker Compose option (--shm-size for docker run) that sets the maximum size of the /dev/shm (POSIX shared memory) filesystem inside the container.

  • The default in Docker is 64MB, which is very small.

  • Note: /dev/shm is a tmpfs (temporary filesystem) using host RAM, and is not related to VRAM (GPU memory).
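To confirm what size a container actually received, you can query the filesystem from inside it. The following is a minimal sketch using only the Python standard library; the helper name shm_report is made up for this post and is not part of Docker or PyTorch.

```python
import shutil

MIB = 1024 * 1024

def shm_report(path="/dev/shm"):
    """Return (total, used, free) in MiB for the filesystem mounted at path.

    shm_report is a hypothetical helper name; shutil.disk_usage is the
    standard-library call doing the work.
    """
    usage = shutil.disk_usage(path)
    return usage.total / MIB, usage.used / MIB, usage.free / MIB
```

Calling shm_report() inside a container started with Docker's defaults should report a total of about 64 MiB. Because /dev/shm is a tmpfs, the used figure reflects host RAM that is actually occupied.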

Why is it important?

AI and data processing workloads often exchange large amounts of data between processes through this shared memory.

  • PyTorch DataLoader: When num_workers > 0, the worker processes pass tensors/batches to the main process through shared memory. If /dev/shm is too small, the job fails with errors such as OSError: No space left on device, or with a DataLoader worker killed by a bus error.

  • TensorRT Engine Build/Serving: Can use shared memory for large intermediate artifacts or IPC buffers, and a shortage may show up as an engine build failure or a segmentation fault.

  • Multiprocessing and IPC Communication: Needed whenever large arrays/buffers are shared between processes, for example NCCL communication between GPUs on the same node, or Python multiprocessing code that passes OpenCV/NumPy arrays between workers.
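To get a feel for how quickly a DataLoader can outgrow the 64MB default, here is a rough back-of-the-envelope sketch. It assumes every worker may hold prefetch_factor batches in shared memory at the same time (PyTorch's default prefetch_factor is 2 when num_workers > 0). The function name and the formula are illustrative assumptions, not PyTorch's own accounting.

```python
def estimate_dataloader_shm_bytes(batch_bytes, num_workers, prefetch_factor=2):
    """Rough upper bound on the bytes of batches in flight in shared memory.

    Assumption: each worker can have up to prefetch_factor batches queued.
    """
    if num_workers <= 0:
        return 0  # batches are built in the main process, no worker IPC
    return batch_bytes * num_workers * prefetch_factor

# Example: batches of 64 float32 images shaped 3x224x224, loaded by 8 workers.
batch_bytes = 64 * 3 * 224 * 224 * 4
needed = estimate_dataloader_shm_bytes(batch_bytes, num_workers=8)
print(f"{needed / 2**20:.0f} MiB")  # prints "588 MiB"
```

Even this modest configuration needs several hundred MiB in the worst case, far more than the default /dev/shm provides.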


2. ipc Settings: Isolation Scope of Shared Memory



The ipc option selects the container's IPC (Inter-Process Communication) namespace mode. It determines how the communication resources used by the container's processes (shared memory, semaphores, etc.) are isolated from the host and from other containers.

IPC setting | How it works | What determines the /dev/shm size
--- | --- | ---
Default (omitted) | Container uses its own IPC namespace (isolated) | Size specified by shm_size (default 64MB)
ipc: host | Container shares the host's IPC namespace | Size of the host's /dev/shm (typically half of the RAM)
ipc: container:<ID> | Shares IPC with the specified other container | Follows the settings of the shared container
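One way to see which case applies to a running container is to inspect the /dev/shm entry in /proc/mounts from inside it. The sketch below parses that file's standard whitespace-separated format. The helper names are hypothetical, and the sample line is written by hand for illustration; the exact device name and options you see will depend on your Docker version and kernel, and a tmpfs using its default size may not list a size= option at all.

```python
def find_shm_mount(mounts_text, mount_point="/dev/shm"):
    """Return (device, fstype, options) for mount_point, or None if absent."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones, so keep the last match.
            found = (fields[0], fields[2], fields[3].split(","))
    return found

def size_option(options):
    """Return the value of the size= mount option, or None if not listed."""
    for opt in options:
        if opt.startswith("size="):
            return opt[len("size="):]
    return None

# Hand-written sample line (an assumption, not captured output).
sample = "shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k 0 0\n"
device, fstype, options = find_shm_mount(sample)
print(device, fstype, size_option(options))  # prints "shm tmpfs 65536k"
```

In a real container you would pass open("/proc/mounts").read() instead of the sample string.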

3. How shm_size and ipc: host Work Together (Example Analysis)

It is common in AI/LLM workloads to set both shm_size: "16g" and ipc: host. The following real example shows which of the two actually takes effect.

Example: Settings and Result Analysis

Docker Compose settings snippet:

    shm_size: "16g"
    ipc: host

Result of df -h /dev/shm inside the container:

    Filesystem  Size  Used  Avail  Use%  Mounted on
    tmpfs       60G   8.3M  60G    1%    /dev/shm

Conclusion: ipc: host ignores shm_size.

  1. When ipc: host is applied: the container uses the host's IPC namespace, and the host's /dev/shm is what the container sees.

  2. shm_size: "16g" is ignored: this option only has an effect when the container uses its own IPC namespace.

  3. Source of 60G: Linux hosts typically size /dev/shm at about half of the total RAM. The host in this example has roughly 120G of RAM, so the container sees the host's 60G /dev/shm.
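The rule behind this result can be written down as a small model. This is only a sketch of the behavior described in this post, not Docker's implementation. The function names are invented for illustration, and parse_size handles only the simple forms used here (a number followed by b, k, m, or g).

```python
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

def parse_size(text):
    """Parse a size such as "16g" or "64m" into bytes (binary units)."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)  # a bare number is taken as bytes

def effective_shm_bytes(ipc, shm_size, host_shm_bytes):
    """Model of which setting decides the container's /dev/shm size."""
    if ipc == "host":
        return host_shm_bytes  # shm_size is ignored in this mode
    if ipc is not None:
        # e.g. "container:<ID>": the size follows the other container.
        raise ValueError("size depends on the container being shared with")
    return parse_size(shm_size if shm_size is not None else "64m")

host_shm = 60 * 2**30  # the 60G host /dev/shm from the example above
print(f"{effective_shm_bytes('host', '16g', host_shm) / 2**30:.0f}G")  # prints "60G"
print(f"{effective_shm_bytes(None, '16g', host_shm) / 2**30:.0f}G")    # prints "16G"
```

The second call shows what the same shm_size value would produce once ipc: host is removed.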

Key Summary

Setting ipc: host means the container uses the host's shared memory space, so the shm_size setting is not actually applied.


4. Recommended Operating Methods and Memory Limit Management

💡 Practical Recommendations

  1. ✅ Prioritize Stability (Recommended): Keep ipc: host

    • Setting: Keep ipc: host. shm_size may stay in the file, but it has no effect.

    • Result: Uses the generous /dev/shm size of the host (e.g., 60G).

    • Advantages: Effectively prevents shared memory shortage errors in most AI/data tasks, making it the most stable choice. Because 60G is only an upper limit and only the data actually stored occupies RAM, it is convenient to leave it as is when there is no memory pressure. Keep in mind that the container then shares the host's shared memory and other IPC objects, so IPC isolation from the host is given up.

  2. ✅ Enforce Per-Container Limits: Remove ipc: host

    • Setting: Remove ipc: host and explicitly set shm_size, for example "8g" or "16g".

    • Result: The container gets its own /dev/shm of the specified size (16G with shm_size: "16g").

    • Advantages: Clearly limits the shared memory usage of each container when multiple containers are running, benefiting host RAM protection and isolation.

⚙️ How to Adjust the Size of Host /dev/shm (Using Option 1)

If you want to change the size of the host's /dev/shm while using ipc: host, you need to modify the tmpfs settings.

  • Temporarily Change Size (Reverts on Reboot):

      sudo mount -o remount,size=16G /dev/shm

    This takes effect immediately for the host and for every container that uses ipc: host.

  • Permanently Change Size (Modify /etc/fstab):

      # Add/modify the following line in /etc/fstab
      tmpfs /dev/shm tmpfs defaults,size=16G 0 0

    After saving, either reboot or run the `remount` command above to apply the new size immediately.

When should it be increased? If you intermittently hit No space left on device errors during DataLoader worker operations or TensorRT engine builds, raise shm_size (or the host's /dev/shm size when using ipc: host) to at least 8G.