#nvfp4 — Blog — ai-muninn

❯ grep -r "#nvfp4" ~/blog

22 matches

date · read · title
2026-07-07 · 8m
DGX Spark in 2026: What Still Works, What Broke, and What I'd Run Today
#dgx-spark #gb10 #gemma-4 #vllm
2026-06-13 · 10m
[vLLM] DiffusionGemma 26B NVFP4 on a DGX Spark: 158 tok/s, and why diffusion tok/s lies
#dgx-spark #gb10 #diffusiongemma #diffusion-llm
2026-06-11 · 16m
[Benchmark] Qwen3.5-122B on DGX Spark — 2× faster
#qwen3.5 #dgx-spark #gb10 #gdn
2026-06-05 · 9m
[Benchmark] NVFP4 Weight-Only Quantization Taxes Chinese ~2x Harder Than English (gemma-4-12B)
#dgx-spark #gb10 #gemma-4 #nvfp4
2026-06-04 · 6m
[Benchmark] Gemma 4 12B Omni on DGX Spark: Weight-Only NVFP4 Beats W4A4 (and Keeps Multimodal)
#dgx-spark #gb10 #gemma-4 #nvfp4
2026-06-01 · 9m
[Benchmark] NVFP4 shrinks a video model 33% on a DGX Spark — with zero speed gain
#nvfp4 #sulphur-2 #ltx-2.3 #dgx-spark
2026-06-01 · 10m
[Benchmark] NVFP4 W4A4 beats FP8 on a DGX Spark MoE: 67 vs 52 tok/s once CUDA graphs fire
#nvfp4 #w4a4 #fp8 #dgx-spark
2026-05-30 · 9m
NVFP4 is 1.5× FP8 on a DGX Spark — but it's compression, not the FP4 cores
#nvfp4 #fp8 #dgx-spark #gb10
2026-05-06 · 15m
Liftoff: Gemma 4 hits 670 tok/s aggregate on DGX Spark (108 tok/s single-stream)
#gemma-4 #mtp #speculative-decoding #vllm
2026-05-04 · 15m
[Field Guide] Z-Image Turbo — does choosing a faster config hurt quality? LPIPS + CLIPScore answer
#z-image #comfyui #nvfp4 #fp8
2026-05-04 · 12m
[Field Guide] Z-Image Turbo — choosing the right config (1.37× faster, 44% less RAM)
#z-image #comfyui #nvfp4 #fp8
2026-05-01 · 14m
[vLLM] Nemotron 3 Nano on DGX Spark: 74.75 tok/s NVFP4 — 11.5% Past the Public Baseline
#nemotron-3 #nvfp4 #vllm #dgx-spark
2026-04-22 · 14m
[Hands-On] Making NVFP4 17% Faster on GB10 with a Triton FP8 Bypass
#nvfp4 #fp8 #triton #dgx-spark
2026-04-21 · 8m
[Benchmark] NVFP4 Is a Trap on GB10: FP8 Wins by 32% (vLLM + SGLang Tested)
#nvfp4 #fp8 #dgx-spark #gb10
2026-04-13 · 6m
[Benchmark] Gemma 4 on DGX Spark — Which Model Should You Pick?
#gemma-4 #dgx-spark #gb10 #benchmark
2026-04-07 · 9m
[Benchmark] From 19 to 50 tok/s: We Quantized Gemma 4 E4B to NVFP4 Before Anyone Else
#gemma-4 #e4b #nvfp4 #fp8
2026-04-05 · 6m
[Benchmark] Gemma 4 31B Dense on DGX Spark: 7 tok/s and the Bandwidth Wall
#gemma-4 #nvfp4 #vllm #dgx-spark
2026-04-05 · 8m
[Benchmark] vLLM vs Ollama on the Same Model: Why 30% Faster on GB10
#vllm #ollama #benchmark #dgx-spark
2026-04-05 · 9m
Gemma 4 26B-A4B on DGX Spark: 52 tok/s with NVFP4, skip the 31B
#gemma-4 #nvfp4 #vllm #dgx-spark
2026-03-19 · 7m
[vLLM] Qwen3.5-122B Runs. But at 14 tok/s.
#dgx-spark #sm121 #qwen3.5-122b #vllm
2026-03-17 · 11m
[vLLM] Why Your DGX Spark Only Says "!!!!!": Debugging NVFP4 on SM121
#dgx-spark #sm121 #vllm #nvfp4
2026-03-13 · 10m
[vLLM] Nemotron-3-Super-120B on a Single GB10: Full Day Debug Log
#dgx-spark #gb10 #sm121 #nemotron
