ls ~/blog/series/qwen3.5-122b-on-dgx-spark

2 posts

  • part  date        title
  • 1     2026-03-19
    [vLLM] Qwen3.5-122B Runs. But at 14 tok/s.

    After fixing the four SM121 NVFP4 bugs, Qwen3.5-122B boots cleanly and generates correct output. Then you check the speed. 14 tok/s. No flags to fix it. Here's why — and what to wait for.

  • 2     2026-06-11
    [Benchmark] Qwen3.5-122B on DGX Spark — 2× faster

    Qwen3.5-122B-A10B tops out at 17 tok/s on a 128 GB DGX Spark: the GDN wall in vLLM won't budge, not even with a merged perf PR. I swapped vLLM for the Atlas engine on the same abliterated NVFP4 weights, and throughput doubled to 33.9 tok/s (36.5 with MTP), uncensored behavior intact. The real lever was outside the quant toolbox.