~ /home/coolthor
ai-muninn
Research notes on AI infrastructure, LLM serving, and autonomous agents. Things that took too long to figure out, written down so you don't have to.
❯ whoami
hardware enthusiast running 120B models at home on DGX Spark
building options trading infrastructure with AI agents
occasionally ships iOS apps
❯ cat ~/blog/concepts
Concepts & Methods
For those who want to understand how AI works
- 2026-04-13 [Ask AI Right] Before You Build It, Ask: Does This Already Exist?
Your first question to AI shouldn't be 'help me do X.' It should be 'is there something that already does X?' This article teaches you how to use AI as a research assistant — finding tools, comparing alternatives, and verifying they're still alive.
- 2026-04-11 [Ask AI Right] Why AI Feels Useless to You — Answer Machine vs Collaboration Tool
Same AI, same question, different results. The people who find ChatGPT life-changing and the people who think it's useless are doing completely different things — and the difference is a single mindset shift.
- 2026-04-10 [Ask AI Right] You Don't Know What You Need — Let AI Find It
Most people don't struggle with using AI — they struggle with knowing what to use it for. This article teaches you a simple method to let AI identify the repetitive parts of your workday you've stopped noticing.
- 2026-04-10 [LLM 101] So Many Models — Which One Should You Download?
Gemma, Llama, Qwen, Mistral — the model list is overwhelming. This guide uses car-buying logic to help you pick the right AI model based on size, speed, and quality.
- 2026-04-10 [LLM 101] What Is Quantization? Q4, Q8, FP16 Explained
Q4_K_M, Q8_0, FP16 — the same model comes in a dozen versions and the names look like hieroglyphs. This guide explains what quantization actually does, why it doesn't ruin the model, and which level to pick.
❯ cat ~/blog/field-notes
Field Notes
For those who run models and debug the hard way
- 2026-04-13 [Claude Code] Build a Self-Auditing Skill That Keeps Your Config Lean
Your CLAUDE.md and MEMORY.md grow silently until they eat 10K+ tokens per turn. I built a /slim skill that lets Claude diagnose and fix its own bloat — here's how.
- 2026-04-13 Claude Code Burning Through Tokens? 8 Fixes to Make Sessions Last 10x Longer
You just started using Claude Code and the context window keeps filling up. Here's where the tokens actually go, what you can do about it, and how to make Claude remember things without re-reading everything.
- 2026-04-13 [DGX Spark] From Unboxing to Running: Complete Deployment Guide
Everything you need to go from a sealed DGX Spark box to serving your first local LLM. Hardware check, Ollama quickstart, vLLM production setup, model selection, and the 5 gotchas that cost hours.
- 2026-04-13 [Benchmark] Gemma 4 Complete Guide on DGX Spark — Which Model Should You Pick?
Gemma 4 E2B / E4B / 26B MoE / 31B Dense benchmarked on DGX Spark, RTX 5090, and MacBook Pro. One table with speed, memory, and quantization format. Selection guide included.
- 2026-04-13 [AI Agent] Gemma 4 Went from 40 Errors to a 9-Step Bug Fix — by Switching One Thing
A feasibility test: can open-source models run SWE-Bench locally for free? Gemma 4 26B failed on OpenHands (40+ errors) but fixed a test bug in 9 steps on SWE-agent. Same model — the action format was the difference.