~ / blog / series / Qwen3.5-122B on DGX Spark
❯ ls ~/blog/series/qwen3.5-122b-on-dgx-spark
2 posts
- part | date | title
- 1 | 2026-03-19 | [vLLM] Qwen3.5-122B Runs. But at 14 tok/s.
After fixing the four SM121 NVFP4 bugs, Qwen3.5-122B boots cleanly and generates correct output. Then you check the speed. 14 tok/s. No flags to fix it. Here's why — and what to wait for.
- 2 | 2026-06-11 | [Benchmark] Qwen3.5-122B on DGX Spark — 2× faster
Qwen3.5-122B-A10B tops out at 17 tok/s on a 128GB DGX Spark — the GDN wall in vLLM won't budge, not even with a merged perf PR. I swapped vLLM for the Atlas engine on the same abliterated NVFP4 weights and the throughput doubled to 33.9 tok/s (36.5 with MTP, ~2×), uncensored behavior intact. The real lever was outside the quant toolbox.