2026-03-10
My Experience Installing FlashAttention 2 for Model Inference on DGX Spark
A write-up of installing FlashAttention 2 on DGX Spark to speed up model inference and reduce GPU memory usage. This post covers the installation challenges, including compiling from source on aarch64, and the performance and memory improvements observed after setup.