❯ ls ~/blog/series/deepseek-v4-flash-on-dgx-spark
3 posts
- part | date | title
- 1 | 2026-06-12 | [Local LLM] Running a 284B DeepSeek-V4-Flash on a 128GB DGX Spark — and blaming the wrong thing for broken tool calls
DeepSeek-V4-Flash is 284B. I got it onto a single 128GB GB10 with antirez's ds4 engine and an asymmetric Q2 GGUF at 15.6 tok/s. The fun part: the broken tool calls weren't the 2-bit quant's fault. The runtime just couldn't parse DSML.
- 2 | 2026-06-12 | [Local LLM] Running a 15 tok/s 284B as your daily agent brain — the settings that make it bearable
A 284B model at 15 tok/s, wired into a daily agent. Two sets of settings make it comfortable — server-side and agent-framework-side. --no-mmap cuts cold start to 57s, the KV disk cache halves prefill, and one missing context_length will crash the whole session.
- 3 | 2026-06-12 | [Local LLM] Weights win: a 284B crushed to 2-bit still beats the small model that fits
DeepSeek-V4-Flash (284B) only fits a 128GB box at asymmetric Q2 (~80GB). Sounds like suicide quantization — but it's surgical: only the layers that barely affect quality get cut. As a daily agent it ran 280 turns with zero degradation. Big enough weights survive 2-bit.