#mtp — Blog — ai-muninn


❯ grep -r "#mtp" ~/blog

6 matches

date  read  title
2026-06-25  14m
[Just for Fun — Advanced] Quantizing the Draft Cache Backfired — A Counterintuitive Look at Qwen MTP (f16 ran 34% faster than q4)
#mtp #speculative-decoding #local-llm #qwen3
2026-06-01  10m
[Benchmark] NVFP4 W4A4 beats FP8 on a DGX Spark MoE: 67 vs 52 tok/s once CUDA graphs fire
#nvfp4 #w4a4 #fp8 #dgx-spark
2026-05-14  6m
30 lines of docker for +34% on DGX Spark: huihui Gemma 4 FP8 + vanilla MTP n=1 deployment recipe
#gemma-4 #abliteration #mtp #speculative-decoding
2026-05-09  13m
Want MTP speedup on abliterated Gemma 4? Vanilla draft can't track the modified body
#gemma-4 #abliteration #mtp #speculative-decoding
2026-05-06  13m
Liftoff: Gemma 4 hits 670 tok/s aggregate on DGX Spark (108 tok/s single-stream)
#gemma-4 #mtp #speculative-decoding #vllm
2026-04-28  13m
[llm-compressor] Self-Quantizing a 35B Abliterated MoE to FP8 on DGX Spark: 4 OOMs, 3 Prefix Bugs, and Why the First Success Wasn't Actually FP8
#dgx-spark #gb10 #sm121 #llm-compressor
