Qwen3.5-122B on DGX Spark — Series


❯ ls ~/blog/series/qwen3.5-122b-on-dgx-spark

2 posts

Part 1 · 2026-03-19 · 7 min read
[vLLM] Qwen3.5-122B Runs. But at 14 tok/s.
After fixing the four SM121 NVFP4 bugs, Qwen3.5-122B boots cleanly and generates correct output. Then you check the speed: 14 tok/s. No flag fixes it. Here's why, and what to wait for.
Part 2 · 2026-06-11 · 15 min read
[Benchmark] Qwen3.5-122B on DGX Spark — 2× faster
Qwen3.5-122B-A10B tops out at 17 tok/s on a 128GB DGX Spark: the GDN wall in vLLM won't budge, not even with a merged perf PR. I swapped vLLM for the Atlas engine on the same abliterated NVFP4 weights, and throughput doubled to 33.9 tok/s (~2×; 36.5 tok/s with MTP) with the uncensored behavior intact. The real lever was outside the quant toolbox.