#moe — Blog — ai-muninn


❯ grep -r "#moe" ~/blog

7 matches

date  read  title
2026-06-01  10m
[Benchmark] NVFP4 W4A4 beats FP8 on a DGX Spark MoE: 67 vs 52 tok/s once CUDA graphs fire
#nvfp4 #w4a4 #fp8 #dgx-spark
2026-05-01  14m
[vLLM] Nemotron 3 Nano on DGX Spark: 74.75 tok/s NVFP4 — 11.5% Past the Public Baseline
#nemotron-3 #nvfp4 #vllm #dgx-spark
2026-04-28  14m
[llm-compressor] Self-Quantizing a 35B Abliterated MoE to FP8 on DGX Spark: 4 OOMs, 3 Prefix Bugs, and Why the First Success Wasn't Actually FP8
#dgx-spark #gb10 #sm121 #llm-compressor
2026-04-13  6m
[Benchmark] Gemma 4 on DGX Spark — Which Model Should You Pick?
#gemma-4 #dgx-spark #gb10 #benchmark
2026-04-08  8m
[LLM 101 #2] Dense, MoE, PLE, SSM — Four AI Model Architectures Explained Simply
#dense #moe #ple #ssm
2026-04-05  9m
Gemma 4 26B-A4B on DGX Spark: 52 tok/s with NVFP4, skip the 31B
#gemma-4 #nvfp4 #vllm #dgx-spark
2026-03-01  8m
[Benchmark] Pure MoE vs SSM Hybrid: Context Decay and Why It Matters for Agents
#benchmark #ssm #moe #dgx-spark
