# Porting Ditto TalkingHead (DGX Spark / ARM64) to TensorRT – A Detailed Log

## Purpose {#sec-cb0dea923563}

To run the `ditto-talkinghead` project on a DGX Spark (ARM64 / aarch64) system, I converted the existing ONNX models into TensorRT engines compatible with this environment and verified inference. Because `warp_network.onnx` depends on a custom `GridSample3D` plugin, the core challenge was **loading a TensorRT custom plugin (`.so`) built for the target architecture**.

![image of the porting process on DGX-Spark](/media/whitedec/blog_img/426abdb9c02c4ff596bc3a648c4e118a.webp)

---

## Why This Was Necessary {#sec-f9697698d220}

The Ditto checkpoint includes `libgrid_sample_3d_plugin.so`, but the binary is compiled for **x86-64**. My setup:

- **DGX Spark**
- **ARM64 (aarch64)**
- TensorRT 10.14.1
- CUDA 13.1

Since an x86-64 plugin cannot be loaded by an ARM64 TensorRT runtime, the parser failed on `warp_network.onnx` and the TRT engine could not be created.

---

## Symptoms (Key Failure Log) {#sec-e2f2b34198c5}

During TRT conversion, `warp_network` produced the following errors:

- `Unable to load library: libgrid_sample_3d_plugin.so`
- `Plugin not found ... GridSample3D`
- `Fail parsing warp_network.onnx`
- Final error: `Network must have at least one output`

The root cause was the failure to load the `GridSample3D` plugin.

---

## Root-Cause Analysis {#sec-afbb5beda194}

### 1) Verify plugin file existence {#sec-f22d03805ef3}

The file was indeed present at:

- `./checkpoints/ditto_onnx/libgrid_sample_3d_plugin.so`

So it wasn’t a simple path issue.

---

### 2) Check with `ldd` {#sec-1170c3414e80}

Running `ldd ./checkpoints/ditto_onnx/libgrid_sample_3d_plugin.so` returned:

- `not a dynamic executable`

This hinted at an architecture mismatch.
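To double-check that the problem is the binary itself rather than TensorRT, the same failure can be reproduced with a plain `dlopen` via Python's `ctypes` (a minimal sketch; the helper name `can_dlopen` is hypothetical):

```python
import ctypes


def can_dlopen(path: str) -> bool:
    """Return True if the shared library can be dlopen-ed on this platform."""
    try:
        ctypes.CDLL(path)
        return True
    except OSError as err:
        # An architecture mismatch typically reports "wrong ELF class" or
        # "invalid ELF header"; a missing file reports "No such file".
        print(f"dlopen failed: {err}")
        return False


if __name__ == "__main__":
    # Path of the plugin bundled with the checkpoint, as above.
    print(can_dlopen("./checkpoints/ditto_onnx/libgrid_sample_3d_plugin.so"))
```

On an aarch64 host, the bundled x86-64 `.so` should fail here with an ELF-related error, mirroring the `Unable to load library` message in the TensorRT log.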
---

### 3) Confirm architecture with `file` / `readelf` {#sec-e7261ce0730b}

The decisive evidence:

- `file` output: `ELF 64-bit LSB shared object, x86-64`
- `readelf -h` output: `Machine: Advanced Micro Devices X86-64`

The supplied `.so` is an **x86-64 binary**, which cannot be loaded on ARM64.

---

## Solution (Key Insight) {#sec-dc7cf20bb7a5}

### Conclusion {#sec-0a46238fba13}

The `GridSample3D` TensorRT plugin must be **rebuilt for ARM64**.

---

## Step-by-Step Procedure {#sec-5e55de8f89f5}

### 1) Obtain plugin source {#sec-bff2bed41bc2}

The `ditto-talkinghead` repository only provided the binary, so I fetched the source from a separate repo:

- `grid-sample3d-trt-plugin`

Key source files:

- `grid_sample_3d_plugin.cpp`
- `grid_sample_3d_plugin.h`
- `grid_sample_3d.cu`
- `grid_sample_3d.cuh`

---

### 2) Verify TensorRT / CUDA environment {#sec-e9fc6e9bb3f9}

The environment details:

- TensorRT Python: `10.14.1.48`
- TensorRT library: `/usr/lib/aarch64-linux-gnu/libnvinfer.so`
- CUDA: `13.1`
- GPU capability: `(12, 1)` → CUDA arch `121`

---

### 3) Adjust CMake build settings {#sec-03d93adfb2f1}

The original CMake file hard-coded GPU architectures specific to the x86 build, which caused several issues:

- Forced `compute_70` (unsupported by the current nvcc)
- Missing include path for `cuda_fp16.h`
- No explicit TensorRT lib directory

#### Fixes applied

- Removed the hard-coded `CUDA_ARCHITECTURES`
- Added TensorRT include and lib paths
- Specified the CUDA include path (`/usr/local/cuda/targets/sbsa-linux/include`)
- Disabled the optional test subdirectory build

---

### 4) Build the ARM64 plugin successfully {#sec-1755c4e2f89d}

```bash
cd /workspace/grid-sample3d-trt-plugin
rm -rf build
mkdir build && cd build
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DTensorRT_ROOT=/usr \
  -DTensorRT_INCLUDE_DIR=/usr/include/aarch64-linux-gnu \
  -DTensorRT_LIB_DIR=/usr/lib/aarch64-linux-gnu \
  -DCMAKE_CUDA_ARCHITECTURES=121
cmake --build . -j"$(nproc)"
```

The build produced:

* `build/libgrid_sample_3d_plugin.so`

Confirm the binary is ARM64:

```bash
file build/libgrid_sample_3d_plugin.so
```

Expected output example:

* `ELF 64-bit LSB shared object, ARM aarch64, …`

---

### 5) Replace the checkpoint’s x86 plugin {#sec-13b0b27efcd5}

```bash
cp /workspace/grid-sample3d-trt-plugin/build/libgrid_sample_3d_plugin.so \
   /workspace/ditto-talkinghead/checkpoints/ditto_onnx/libgrid_sample_3d_plugin.so
```

---

### 6) Verify TensorRT plugin loading {#sec-2cd3892eedff}

Loaded the rebuilt plugin directly through the TensorRT Python plugin registry and confirmed it worked.

---

### 7) Rerun ONNX → TensorRT conversion {#sec-489f00d169ce}

Running `cvt_onnx_to_trt.py` now succeeded for the entire model, including `warp_network.onnx`. Inference also succeeded.

---

## Current Status {#sec-51c6a9917ae5}

- ✅ `GridSample3D` custom TensorRT plugin runs on ARM64
- ✅ `warp_network.onnx` parses correctly
- ✅ ONNX → TensorRT engine conversion succeeds
- ✅ Ditto TalkingHead inference works on DGX Spark / ARM64

In short, the project is fully ported to the DGX Spark platform.

---

## Troubleshooting Notes {#sec-558215be806b}

### 1) Checkpoint-bundled `.so` files are platform-specific {#sec-2186e54e5708}

A checkpoint may contain binaries compiled for a different ISA. If you’re on an aarch64 machine, suspect an x86-64 binary first.

---

### 2) Plugin rebuild may be required after TensorRT upgrades {#sec-c6872cffe35a}

When moving to a new major TensorRT version, API/ABI changes can break existing plugins.

---

### 3) Avoid hard-coding CUDA architectures in `CMakeLists.txt` {#sec-55cea5b42581}

Values like `70;80;86;89` will fail on newer GPUs. Use the actual device capability, e.g., `(12,1)` → `121`.
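One way to follow note 3 is to let the caller choose the architecture and only fall back to a sensible default; a sketch of the relevant `CMakeLists.txt` lines (the `native` fallback is an assumption, not the plugin repo's actual file):

```cmake
cmake_minimum_required(VERSION 3.24)

# Respect -DCMAKE_CUDA_ARCHITECTURES=... from the command line;
# otherwise fall back to "native", i.e. the GPU present at build time.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()

# project() must come after the default so the CUDA toolchain picks it up.
project(grid_sample_3d_plugin LANGUAGES CXX CUDA)
```

With this, passing `-DCMAKE_CUDA_ARCHITECTURES=121` on the cmake command line, as in the build step above, overrides the fallback.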
You can query the device capability with:

```bash
python - <<'PY'
import torch
print(torch.cuda.get_device_capability())
PY
```

---

## Environment Summary {#sec-ce6167a42270}

- Platform: DGX Spark
- Architecture: aarch64 (ARM64)
- CUDA: 13.1
- TensorRT: 10.14.1
- Python: 3.12
- GPU capability: 12.1 (CMake CUDA arch = 121)

---

**Related posts**

- [Messy HunyuanVideo‑Avatar Challenge](/ko/whitedec/2026/3/3/messy-hunyuanvideo-avatar-challenge/)