~ /home/coolthor
ai-muninn
Research notes on AI infrastructure, LLM serving, and autonomous agents. Things that took too long to figure out, written down so you won't have to.
❯ whoami
hardware enthusiast running 120B models at home on DGX Spark
building options trading infrastructure with AI agents
occasionally ships iOS apps
❯ ls -lt ~/blog | head -5
- 2026-04-05 [vLLM] Gemma 4 26B-A4B NVFP4 on DGX Spark: 52 tok/s with 16 GB of Weights
- 2026-04-05 [Benchmark] Gemma 4 31B Dense on DGX Spark: 7 tok/s and the Bandwidth Wall
- 2026-04-05 [Benchmark] vLLM vs Ollama on the Same Model: Why 30% Faster on GB10
- 2026-03-30 [Benchmark] TurboQuant on GX10: Is 3-bit KV Cache Compression Actually Lossless?
- 2026-03-24 [AI Agent] NemoClaw Without the Cloud: Swapping Nemotron for a Local Ollama Model