Series: Modded 2080 Ti 22G

❯ ls ~/blog/series/改裝-2080-ti-22g

3 posts

part  date  read  title
1  2026-06-22  12m
[Just for Fun — Advanced] I Scored a 22GB-Modded 2080 Ti for ~$340 All-In — Just Enough to Keep a 27B Agent Running at Home
I dug up a 22GB-modded RTX 2080 Ti for ~$340 all-in (¥2079 sticker + shipping) — just enough to keep a resident 27B agent brain running on the same cheap old desktop. What the mod changes, and the gotchas.
2  2026-06-23  16m
[Just for Fun — Advanced] I Gave Up 100 tok/s for 30 — Fast Isn't the Same as Useful
Picking a local model, I looked at tok/s first too. Gemma 12B does 90-100 and it's great — until you put it on a kanban board, where it finishes the work and just walks away, never marking the card done. A Qwen 27B that's three times slower actually closes the loop. Why throughput is the wrong number for an agent — plus how grep almost lied to me about it.
3  2026-06-24  16m
[Just for Fun — Advanced] I Maxed Context to 256K, It Loaded Fine — Then Crashed in Real Use: A VRAM Detective Story on a 22GB Frankencard
The model card says n_ctx_train=262144. The GPU has 22GB. The 27B's Q4 weights are only 15.7GB. The math looks obvious: max it to 256K, plenty to spare. -c 262144, launch: loads fine, no error. A few turns of real conversation later: 503, and the service restarts itself. No tidy out-of-memory in the log, just a lone 0xc0000409. nvidia-smi shows free VRAM down to ~170 MiB. Where did the gigabytes go? This is the hunt. I first blamed context checkpoints, but the llama.cpp source says they live in host RAM. The real VRAM eater is the KV cache, free VRAM does not fall linearly as context grows, and the one stable sweet spot isn't 256K. It's 128K.
