## ONNX Runtime GPU: Why Build It Yourself?
Before diving into the actual build, let’s clarify a few concepts.
### What is onnxruntime_gpu?
ONNX Runtime (ORT) is an optimized AI inference engine. After converting a model built with PyTorch or TensorFlow to the `.onnx` format, ORT runs it as fast as possible on a wide range of hardware. The `onnxruntime_gpu` variant leverages NVIDIA's CUDA and cuDNN libraries to squeeze maximum performance out of NVIDIA GPUs.

### Why Is It Commonly Used for Generative AI Models (Images/Video)?
Models like Stable Diffusion or Whisper are massive and computationally intensive. ORT offers several optimizations:
- Graph Optimization: Removes redundant operations and fuses multiple operators into a single one.
- Memory Management: Handles GPU memory allocation efficiently so even large models run smoothly.
- Hardware Acceleration: When paired with accelerators such as TensorRT, it can deliver several‑fold speedups over pure PyTorch.
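To make the operator-fusion idea concrete, here is a toy illustration in plain Python (not ORT's actual implementation): instead of launching one kernel per operator, a fused operator does the work of several in a single pass.

```python
# Toy illustration of operator fusion. We count "kernel launches" to show
# why fusing MatMul + Add into one Gemm-like op saves work.
kernel_launches = 0

def matmul_scalar(x, w):
    global kernel_launches
    kernel_launches += 1          # one launch for the multiply
    return x * w                  # scalar stand-in for a matrix multiply

def add_bias(x, b):
    global kernel_launches
    kernel_launches += 1          # one launch for the add
    return x + b

def fused_gemm(x, w, b):
    global kernel_launches
    kernel_launches += 1          # a single launch covers both operations
    return x * w + b

# Unfused: two launches; fused: one launch, same numerical result.
unfused = add_bias(matmul_scalar(3.0, 2.0), 1.0)
fused = fused_gemm(3.0, 2.0, 1.0)
print(unfused, fused, kernel_launches)  # 7.0 7.0 3
```

On a real GPU, each launch also pays memory-traffic and scheduling costs, which is why fusion matters far more than this scalar toy suggests.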
### Why Do ARM/aarch64 Users Need to Build from Source?
Running `pip install onnxruntime-gpu` works fine on most platforms, but aarch64 is a different story.
- No Pre‑built Binaries: The wheels on PyPI are primarily built for x86_64.
- Cutting‑Edge Architecture Support: Very new stacks like CUDA 13.0 or Compute Capability 12.1 (Blackwell) often aren’t covered by official releases yet.
- Tailored Optimizations: If you want to fine‑tune the build for a specific machine (e.g., DGX Spark), compiling yourself is the only way.
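A quick way to check which situation you are in is to inspect the platform before reaching for pip. This is just a convenience sketch using the standard library:

```python
import platform
import sys

# Check architecture and Python tag before deciding: prebuilt GPU wheels
# target x86_64, so aarch64 users should expect to build from source.
arch = platform.machine()                                  # e.g. 'x86_64' or 'aarch64'
pytag = f"cp{sys.version_info.major}{sys.version_info.minor}"  # e.g. 'cp312'

if arch == "aarch64":
    print(f"{arch}/{pytag}: plan to build onnxruntime-gpu from source")
else:
    print(f"{arch}/{pytag}: 'pip install onnxruntime-gpu' likely has a wheel")
```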
## Building ONNX Runtime GPU on DGX‑Spark (aarch64)
Below is a step‑by‑step guide based on my own experience.
### 1. Environment Check
First, verify the specs of your system. I used a DGX‑Spark equipped with a Grace Blackwell GPU.
| Item | Details |
|---|---|
| OS/Arch | Linux aarch64 |
| GPU | NVIDIA GB10 (Blackwell) |
| CUDA | 13.0 (V13.0.88) |
| Python | 3.12.3 |
| Compute Cap | 12.1 |
```bash
# Check GPU info
nvidia-smi
# Check CUDA version
nvcc --version
```
### 2. Fill the Gaps: Install cuDNN
Because cuDNN wasn’t present on my system, I first installed the Python‑package version and then configured the build scripts to use it.
```bash
# Install cuDNN for CUDA 13
pip install nvidia-cudnn-cu13
# Locate the installation path (using a small Python snippet)
python3 -c "import site, os; print(os.path.join(site.getsitepackages()[0], 'nvidia/cudnn'))"
```
Next, export the necessary environment variables so the build can locate cuDNN.
```bash
export CUDA_HOME=/usr/local/cuda
export CUDNN_HOME=/home/jesse/onnxruntime/venv/lib/python3.12/site-packages/nvidia/cudnn
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$CUDNN_HOME/lib:$LD_LIBRARY_PATH
export CUDACXX=$CUDA_HOME/bin/nvcc
```
### 3. Build ONNX Runtime from Source
Now comes the core step. Run the `build.sh` script, specifying the Blackwell architecture (`121`) and the custom CUDA paths.
```bash
./build.sh \
  --config Release \
  --update --build \
  --parallel \
  --build_wheel \
  --use_cuda \
  --cuda_home $CUDA_HOME \
  --cudnn_home $CUDNN_HOME \
  --skip_tests \
  --cmake_generator Ninja \
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=121 \
  --cmake_extra_defines CMAKE_CUDA_FLAGS="-I/usr/local/cuda/include/cccl" \
  --cmake_extra_defines CMAKE_CXX_FLAGS="-I/usr/local/cuda/include/cccl"
```
Tip: Adding the include path via `CMAKE_CUDA_FLAGS` helps avoid include-path errors related to CCCL (the CUDA C++ Core Libraries).
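The `121` passed to `CMAKE_CUDA_ARCHITECTURES` is simply the GPU's compute capability (12.1, from the table above) with the dot removed. A tiny helper (my own convenience, not part of `build.sh`) makes the mapping explicit for other GPUs:

```python
def cc_to_cmake_arch(compute_capability: str) -> str:
    """Map a compute capability like '12.1' to a CMAKE_CUDA_ARCHITECTURES value like '121'."""
    major, minor = compute_capability.split(".")
    return f"{major}{minor}"

print(cc_to_cmake_arch("12.1"))  # 121 (Blackwell GB10)
print(cc_to_cmake_arch("8.6"))   # 86  (Ampere)
```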
### 4. Verify the Output and Install
When the build finishes, a wheel file for aarch64 appears under build/Linux/Release/dist/.
```bash
# List the generated wheel
ls -lh build/Linux/Release/dist/onnxruntime_gpu-*.whl
# Install it
pip install build/Linux/Release/dist/onnxruntime_gpu-1.25.0-cp312-cp312-linux_aarch64.whl
```
### 5. Final Validation
Check that the library correctly detects the GPU.
```python
import onnxruntime as ort
print("ORT version:", ort.__version__)
print("Available providers:", ort.get_available_providers())
```
Result: If you see `['CUDAExecutionProvider', 'CPUExecutionProvider']`, you’re good to go! 🎉
## Closing Thoughts
The wheel you built can later be copied into a Docker image with a simple COPY command, making it reusable across deployments. Building for aarch64 and the latest CUDA stacks can be fiddly, but once you have a solid build, you’ll enjoy unmatched performance when serving generative AI models.
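As a sketch of that Docker reuse (the base image tag and paths here are my assumptions; adjust them to your CUDA 13 / aarch64 setup, and note the base image must already provide Python 3.12 and pip):

```dockerfile
# Assumed base image; pick one matching your CUDA 13 / aarch64 stack.
FROM nvcr.io/nvidia/cuda:13.0.0-runtime-ubuntu24.04

# Copy the wheel built on the DGX-Spark into the image and install it.
COPY build/Linux/Release/dist/onnxruntime_gpu-1.25.0-cp312-cp312-linux_aarch64.whl /tmp/
RUN pip install /tmp/onnxruntime_gpu-1.25.0-cp312-cp312-linux_aarch64.whl \
    && rm /tmp/onnxruntime_gpu-*.whl
```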
I hope this guide shines a light for fellow DGX‑Spark users!