#benchmark — Blog

❯ grep -r "#benchmark" ~/blog

27 matches

date read title
2026-06-11 16m
[Benchmark] Qwen3.5-122B on DGX Spark — 2× faster
#qwen3.5 #dgx-spark #gb10 #gdn
2026-06-09 10m
[Just for Fun] Gemma 4 E2B on a GTX 970: the biggest quant runs fastest (47.6 tok/s)
#gemma-4 #quantization #gtx-970 #llama.cpp
2026-06-05 9m
[Benchmark] NVFP4 Weight-Only Quantization Taxes Chinese ~2x Harder Than English (gemma-4-12B)
#dgx-spark #gb10 #gemma-4 #nvfp4
2026-06-04 6m
[Benchmark] Gemma 4 12B Omni on DGX Spark: Weight-Only NVFP4 Beats W4A4 (and Keeps Multimodal)
#dgx-spark #gb10 #gemma-4 #nvfp4
2026-06-01 9m
[Benchmark] NVFP4 shrinks a video model 33% on a DGX Spark — with zero speed gain
#nvfp4 #sulphur-2 #ltx-2.3 #dgx-spark
2026-06-01 10m
[Benchmark] NVFP4 W4A4 beats FP8 on a DGX Spark MoE: 67 vs 52 tok/s once CUDA graphs fire
#nvfp4 #w4a4 #fp8 #dgx-spark
2026-05-30 9m
NVFP4 is 1.5× FP8 on a DGX Spark — but it's compression, not the FP4 cores
#nvfp4 #fp8 #dgx-spark #gb10
2026-05-21 10m
Round 2 EAGLE-3 retrain didn't break the ceiling — a 60-hour null-result writeup
#gemma-4 #abliteration #eagle-3 #speculative-decoding
2026-05-16 14m
EAGLE-3 fine-tune against an abliterated Gemma 4 body — Round 1 flattens the acceptance curve (plus a measurement lesson)
#gemma-4 #abliteration #eagle-3 #speculative-decoding
2026-05-09 14m
Want MTP speedup on abliterated Gemma 4? Vanilla draft can't track the modified body
#gemma-4 #abliteration #mtp #speculative-decoding
2026-05-06 15m
Liftoff: Gemma 4 hits 670 tok/s aggregate on DGX Spark (108 tok/s single-stream)
#gemma-4 #mtp #speculative-decoding #vllm
2026-05-04 15m
[Field Guide] Z-Image Turbo — does choosing a faster config hurt quality? LPIPS + CLIPScore answer
#z-image #comfyui #nvfp4 #fp8
2026-05-04 12m
[Field Guide] Z-Image Turbo — choosing the right config (1.37× faster, 44% less RAM)
#z-image #comfyui #nvfp4 #fp8
2026-05-01 14m
[vLLM] Nemotron 3 Nano on DGX Spark: 74.75 tok/s NVFP4 — 11.5% Past the Public Baseline
#nemotron-3 #nvfp4 #vllm #dgx-spark
2026-04-25 7m
[Benchmark] TMMLU+ Paired Eval: Qwen 3.6 35B Sweeps Gemma 4 26B 51-of-51 on Traditional Chinese
#tmmlu+ #traditional-chinese #qwen-3.6 #gemma-4
2026-04-21 8m
[Benchmark] NVFP4 Is a Trap on GB10: FP8 Wins by 32% (vLLM + SGLang Tested)
#nvfp4 #fp8 #dgx-spark #gb10
2026-04-20 7m
[Benchmark] Same Scaffold, Three Models: 16% → 38% → 48% on SWE-bench Lite
#swe-bench #gemma-4 #qwen-3.6 #scaffold
2026-04-17 12m
[Benchmark] SWE-bench Lite 38.67% with a 26B Local Model — 0.33% from Claude 3.5 Sonnet Scaffolds
#swe-bench #gemma-4 #mini-swe-agent #vllm
2026-04-13 6m
[Benchmark] Gemma 4 on DGX Spark — Which Model Should You Pick?
#gemma-4 #dgx-spark #gb10 #benchmark
2026-04-08 9m
[Benchmark] 4 Machines, 4 Models, 1 Answer: Memory Decides Everything
#gemma-4 #rtx-5090 #dgx-spark #gb10
2026-04-07 8m
[Benchmark] Gemma 4 E2B vs E4B: 81 tok/s vs 52 on Three Machines — Bandwidth Is Everything
#gemma-4 #e2b #e4b #ollama
2026-04-05 6m
[Benchmark] Gemma 4 31B Dense on DGX Spark: 7 tok/s and the Bandwidth Wall
#gemma-4 #nvfp4 #vllm #dgx-spark
2026-04-05 8m
[Benchmark] vLLM vs Ollama on the Same Model: Why 30% Faster on GB10
#vllm #ollama #benchmark #dgx-spark
2026-04-05 9m
Gemma 4 26B-A4B on DGX Spark: 52 tok/s with NVFP4, skip the 31B
#gemma-4 #nvfp4 #vllm #dgx-spark
2026-03-30 8m
[Benchmark] TurboQuant on GX10: Is 3-bit KV Cache Compression Actually Lossless?
#turboquant #kv-cache #quantization #vllm
2026-03-01 8m
[Benchmark] Pure MoE vs SSM Hybrid: Context Decay and Why It Matters for Agents
#benchmark #ssm #moe #dgx-spark
2026-02-19 11m
[Benchmark] 8 Models on DGX Spark: Finding the Best Stack for AI Agents
#dgx-spark #gb10 #ollama #benchmark
