❯ ls ~/blog/series/llm-deep-dive
2 posts
- Part 1 | 2026-04-15 | [LLM Deep Dive] What Quantization Algorithms Actually Do: From Q4_K_M to TurboQuant
  How does Q4_K_M fit a 14B model into 4 bits without ruining it? Not by 'cutting off 75%' — but through three layers: K-quant super-blocks, TurboQuant random rotation, and a 1-bit JL sign sketch. A mechanism walkthrough without the equations.
- Part 2 | 2026-03-30 | [Benchmark] TurboQuant on GX10: Is 3-bit KV Cache Compression Actually Lossless?
  Real benchmark numbers for Google's TurboQuant on a GB10/SM121 (DGX Spark) — actual compression ratios, Qwen2.5-3B accuracy validation, and why Qwen3.5-35B's hybrid attention architecture makes things complicated.