~ /home/coolthor

ai-muninn

Research notes on AI infrastructure, LLM serving, and autonomous agents. Things that took too long to figure out, written down so you don't have to.

whoami

runs all kinds of models at home (LLMs, image gen, video gen), then writes down what he figures out

quantizes models to FP8 / NVFP4 and ships them on Hugging Face, where people actually run them

builds options-trading infrastructure with AI agents

had a spec-decode fix merged into vLLM's speculators

occasionally ships iOS apps

cat ~/blog/start-here

Start here

New here? These are a good way in.

cat ~/blog/concepts

Concepts & Methods

For those who want to understand how AI works

cat ~/blog/field-notes

Field Notes

For those who run models and debug the hard way

  • 2026-07-02
    [Just for Fun — Advanced #7] Progressive Streaming on a Slow Model Got My Bot Rate-Limited by Telegram

    To ease the wait on a pokey local agent, I turned on Telegram streaming. The way this bot did it, that meant rewriting the same message every fraction of a second. On a 14 tok/s brain, a single 175-second reply works out to an estimated couple hundred edit requests. That slammed into Telegram's flood control and benched the whole bot for four minutes, final answer included. The short, ugly lesson: slow models should not fake streaming with edits. Send the finished answer once. Live logs inside.

  • 2026-07-01
    [Agent 101 #15] Hermes /learn: I had a local 27B write its own reusable skill

    Hermes has a /learn command that turns 'something you just did' into a reusable skill, written out as a SKILL.md. I wired it into my own fleet: one Kanban card, a local 27B running on a modded 2080 Ti, and about 3 minutes later it handed back a clean, spec-compliant skill. Along the way I ran into two implementation details the docs don't spell out (slash command vs. dispatch, and where skills actually live). A plain-language walkthrough of what /learn does, how to use it, and where its limits are.

  • 2026-06-29
    [Agent 101 #14] One spec, three assistants, three Tetris games: a Hermes Kanban dispatch test

    After raising a fleet of assistants, I gave each one the same one-line 'make a Tetris game' spec with no other details, one card each, and let each write a web Tetris in a single shot. I touched zero lines of game code; I only published the results. The surprise: from that one line, the Hermes harness plus a local model I tuned myself (on a modded 2080 Ti) added features I never asked for, a ghost piece and wall kicks, on the first try. You can play all three.

  • 2026-06-29
    [Troubleshooting #1] HuggingFace download stuck at 0 bytes on Windows — Xet, Python 3.13, ai-toolkit

    Training with ai-toolkit on Windows + RTX 5090 hit three walls before it even started: Python 3.13 dependency hell, a Hugging Face download frozen at 0 bytes, and ssh killing the process. Each error pointed in the wrong direction. Here's the diagnosis and fix for all three.

  • 2026-06-28
    [LoRA #2] The character-LoRA control panel: dialing in style, realism, and identity

    Once your character LoRA is trained, how do you control it? Why Lightning flattens style, when to spend the full step count, how to stack a style LoRA, and why the trigger word alone won't hold the look.

ls ~/blog/series

Browse by series

Every thread, grouped

116 posts total · view all posts →

Don't miss the next one

Subscribe, and you won't.

One-click unsubscribe anytime. · buy me a coffee