❯ ls ~/blog/series/llm-deep-dive
2 posts
- Part 1 | 2026-04-15 | [LLM Deep Dive] What Quantization Algorithms Actually Do: From Q4_K_M to TurboQuant
  How does Q4_K_M fit a 14B model into 4 bits without ruining it? Not by 'cutting off 75%' — but through three layers: K-quant super-blocks, TurboQuant random rotation, and a 1-bit JL sign sketch. A mechanism walkthrough without the equations.
- Part 2 | 2026-03-30 | [Benchmark] TurboQuant on GX10: Is 3-bit KV Cache Compression Actually Lossless?
  Real benchmark numbers for Google's TurboQuant on a GB10/SM121 (DGX Spark) — actual compression ratios, Qwen2.5-3B accuracy validation, and why Qwen3.5-35B's hybrid attention architecture makes things complicated.