~/ai-muninn
tag / mtp
❯ grep -r "#mtp" ~/blog
6 matches
date
read
title
2026-06-25
14m
[Just for Fun — Advanced] Quantizing the Draft Cache Backfired — A Counterintuitive Look at Qwen MTP (f16 ran 34% faster than q4)
#mtp
#speculative-decoding
#local-llm
#qwen3
2026-06-01
10m
[Benchmark] NVFP4 W4A4 beats FP8 on a DGX Spark MoE: 67 vs 52 tok/s once CUDA graphs fire
#nvfp4
#w4a4
#fp8
#dgx-spark
2026-05-14
6m
30 lines of docker for +34% on DGX Spark: huihui Gemma 4 FP8 + vanilla MTP n=1 deployment recipe
#gemma-4
#abliteration
#mtp
#speculative-decoding
2026-05-09
13m
Want MTP speedup on abliterated Gemma 4? Vanilla draft can't track the modified body
#gemma-4
#abliteration
#mtp
#speculative-decoding
2026-05-06
13m
Liftoff: Gemma 4 hits 670 tok/s aggregate on DGX Spark (108 tok/s single-stream)
#gemma-4
#mtp
#speculative-decoding
#vllm
2026-04-28
13m
[llm-compressor] Self-Quantizing a 35B Abliterated MoE to FP8 on DGX Spark: 4 OOMs, 3 Prefix Bugs, and Why the First Success Wasn't Actually FP8
#dgx-spark
#gb10
#sm121
#llm-compressor