~ /home/coolthor
ai-muninn
Research notes on AI infrastructure, LLM serving, and autonomous agents. Things that took too long to figure out, written down so you don't have to.
❯ whoami
runs all kinds of models at home — LLMs, image gen, video gen, then writes down what he figures out
quantizes models to FP8 / NVFP4 and ships them on Hugging Face — people actually run them
builds options-trading infrastructure with AI agents
had a spec-decode fix merged into vLLM's speculators
occasionally ships iOS apps
❯ cat ~/blog/start-here
Start here
New here? These are a good way in.
- 2026-06-16 [Agent 101 #1] AI assistant vs ChatGPT: one answers you, one uses your tools to get things done
You mostly use ChatGPT one question at a time. A self-hosted AI assistant (agent) finishes the job with your own tools, runs on your side, and plugs into the apps you use daily. Lesson one of building your own assistant from zero.
- 2026-06-16 [Agent 101 #4] How to install Hermes Agent Desktop: your first AI assistant, no terminal
Install the Hermes Agent desktop app — no terminal. Download it, let it auto-install dependencies, sign in with your ChatGPT account, and your first AI assistant is running in about 15 minutes.
- 2026-06-12 [Local LLM #1] My first Q2 model looked broken on a 128GB box: the real culprit was a parser that couldn't read DSML, not the quantization
DeepSeek-V4-Flash is 284B. I got it onto a single 128GB GB10 with antirez's ds4 engine and an asymmetric Q2 GGUF at 15.6 tok/s. The fun part: the broken tool calls weren't the 2-bit quant's fault. The runtime just couldn't parse DSML.
- 2026-06-11 [Benchmark #2] Qwen3.5-122B on DGX Spark: 2× faster
Qwen3.5-122B-A10B tops out at 17 tok/s on a 128GB DGX Spark; the GDN wall in vLLM won't budge, not even with a merged perf PR. I swapped vLLM for the Atlas engine on the same abliterated NVFP4 weights and throughput doubled to 33.9 tok/s (36.5 with MTP), uncensored behavior intact. The real lever was outside the quant toolbox.
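The "2×" in the benchmark entry above is plain division on the quoted throughput numbers. A minimal sketch of that arithmetic follows; the function and constant names are illustrative, not taken from the post, and only the three tok/s figures come from the summary.

```python
def speedup(baseline_tok_s: float, new_tok_s: float) -> float:
    """Return how many times faster the new throughput is than the baseline."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_tok_s / baseline_tok_s


# Throughput figures quoted in the summary (tokens per second).
VLLM_TOK_S = 17.0        # vLLM ceiling on the DGX Spark
ATLAS_TOK_S = 33.9       # Atlas engine, same NVFP4 weights
ATLAS_MTP_TOK_S = 36.5   # Atlas engine with MTP enabled

for label, rate in (("Atlas", ATLAS_TOK_S), ("Atlas + MTP", ATLAS_MTP_TOK_S)):
    print(f"{label}: {rate} tok/s, {speedup(VLLM_TOK_S, rate):.2f}x over vLLM")
# Atlas: 33.9 tok/s, 1.99x over vLLM
# Atlas + MTP: 36.5 tok/s, 2.15x over vLLM
```

So the headline figure is a rounded 1.99×, and the MTP run is slightly above 2×.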
❯ cat ~/blog/concepts
Concepts & Methods
For those who want to understand how AI works
- 2026-05-23 [LLM 101 #7] How to spot AI hallucinations: three red flags before you verify
AI delivers wrong answers in the same confident tone as right ones. Three red flags to catch it early — impossible numbers, suspiciously specific details, answers that shift on a re-ask — plus a case where ChatGPT gave me a +205% P&L that can't exist.
- 2026-04-17 [LLM 101 #6] Why Run AI on Your Own Computer? It's Not a Cheaper ChatGPT. It's a Different Tool
Local AI isn't a budget ChatGPT. It's a knowledge extractor, private code assistant, and offline tool. Monthly power cost ~$1.20 vs ChatGPT Plus $20. This guide has a decision table for when to use which.
- 2026-04-16 [Ask AI Right #7] What AI Does Poorly: Four Landmines to Know Before Using ChatGPT or Claude in 2026
AI is strong, but four things still trip it up in 2026: hallucinations, stale knowledge, short memory, and privacy defaults. Even Anthropic's own lawyers got caught by the first one.
- 2026-04-14 [Ask AI Right #6] The Art of Follow-Up Questions: What to Do When the First Answer Is Too Shallow
The first answer AI gives you is a rough draft, not the final answer. Learn 5 follow-up techniques, including adding constraints, asking for comparisons, and letting AI ask YOU questions, to get dramatically better results.
- 2026-04-14 [LLM 101 #5] Context Window: How Much Can AI Read at Once?
AI forgets what you said 20 messages ago. It's not broken — its desk is full. This guide explains context windows, why conversations go stale, and how to work around the limit.
❯ cat ~/blog/field-notes
Field Notes
For those who run models and debug the hard way
- 2026-06-21 Directional Steering on an Abliterated DeepSeek-V4 (DGX Spark): the same scalpel as abliteration, and why the second cut fights back
ds4 ships directional steering — a runtime activation edit that nudges the model along a chosen direction, and the math is literally abliteration with a continuous, signed scale. I got it running on GB10/CUDA (the tooling looks Metal-only, but the activation dump fires on CUDA too) and pulled a verbosity vector from our abliterated Q2 model. The dial works, but it ignores the textbook: the sweep is non-monotonic and positive scales collapse the output to a four-word fragment. Two cuts from the same scalpel, fighting each other.
- 2026-06-20 [Agent 101 #10] Installed it, now what? Give your assistant hands: connect your own tools
Your assistant is installed, but right now it only talks — it's all mouth. This post gives it hands: connect tools so it actually checks your folders, runs your commands, calls services you wrote yourself. The key idea is MCP, the 'universal outlet' standard for tools — plug one in and the assistant can use it. All running on your side, connected to your own stuff.
- 2026-06-19 [Agent 101 #9] Swap your assistant's brain for one on your own machine: from cloud ChatGPT to a local model
We used ChatGPT as the assistant's brain. This post does something bolder — swaps that brain from the cloud to a local model running on your own machine (e.g. ds4). The payoff is an autonomous brain: no cloud model provider, your conversations stay on your machine, no usage caps. The honest cost: local brains are usually slower (~10 tok/s on my ds4) and need a capable machine. Swap the brain, keep the body — Hermes doesn't change at all.
- 2026-06-18 [LoRA #1] Train your own AI character on an RTX 5090: one image to a usable character
Train a Wan 2.2 character LoRA on your own RTX 5090 from a single reference image. Then generate the same person from text — new outfits, scenes, art styles, even video. No cloud, no bill.
- 2026-06-17 [Agent 101 #8] One person, a whole team of assistants: each with its own desk, brain, and memory
Comfortable with one assistant and want a second and third? Hermes gives each one its own home (config, memory, personality), each able to run a different model and handle different tasks. Plain-language: why split them, how, and the three I actually run. Honest: most people only need one — this is for when you want to tinker.
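The directional-steering entry in the list above describes the edit as abliteration with a continuous, signed scale. A minimal sketch of one reading of that sentence is below: abliteration removes the component of an activation along a direction, and a scale factor turns that removal into a dial. This is an assumption for illustration only, not ds4's code; the names `steer`, `unit`, and `dot` are hypothetical, and the sign convention (positive scale removes, negative scale amplifies) is a choice made here. A real implementation may instead add a fixed multiple of the direction vector.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def steer(hidden, direction, scale):
    """Subtract `scale` times the component of `hidden` along `direction`.

    Assumed convention (not from ds4):
      scale = 1.0  -> full projection removal, i.e. classic abliteration
      scale = 0.0  -> activation left untouched
      scale < 0.0  -> component along the direction is amplified
    """
    d = unit(direction)
    coeff = dot(hidden, d)  # how much of `hidden` points along `direction`
    return [h - scale * coeff * x for h, x in zip(hidden, d)]


hidden = [3.0, 4.0]
direction = [2.0, 0.0]  # only the direction matters, not its length

print(steer(hidden, direction, 1.0))   # [0.0, 4.0]
print(steer(hidden, direction, 0.0))   # [3.0, 4.0]
print(steer(hidden, direction, -1.0))  # [6.0, 4.0]
```

Under this reading, abliteration is the single point `scale = 1.0` on the dial, which is why steering a model that has already been abliterated along a nearby direction can behave unexpectedly.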
❯ ls ~/blog/series
Browse by series
Every thread, grouped