The exact KV cache usage of DeepSeek V4

April 26, 2026 at 6:19 AM

Figure 1 of the DSV4 paper seems to imply that DSV3.2 uses ~50GB at 1m context and DSV4 uses ~5GB.

***Numbers updated with the KV cache breakdown from vLLM***

From my own calculations, the correct FP16 KV cache sizes should be:

| Model | Params | 128k | 160k | 1m | KV% |
|---|---|---|---|---|---|
| V3/3.1 | 671B | 8.58GiB | 10.72GiB | 68.63GiB | 5.11% |
| V3.2 | 671B | 10.48GiB | 13.11GiB | 83.88GiB | 6.25% |
| V4 Flash | 284B | 0.84GiB | 1.05GiB | 6.72GiB | 1.18% |
| V4 Pro | 1600B | 1.20GiB | 1.50GiB | 9.62GiB | 0.3% |

So while the KV cache saving is not 9.5x but 7.879x, it is still very
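The V3/3.1 and V3.2 figures above can be reproduced with a few lines of arithmetic. This is only a sketch under assumptions the post does not state: 61 layers, a 512-wide compressed KV latent plus a 64-wide RoPE key cached per token per layer, an extra 128-wide indexer key for V3.2, and context lengths counted in multiples of 1024 tokens. The `kv_percent` helper is likewise a guess at how the KV% column is derived (1m cache size divided by FP16 weight size in decimal GB), chosen because it matches all four rows. The V4 rows are not derived here, since that would need per-layer details not given in the post.

```python
GIB = 2**30  # bytes per GiB

# Assumed architecture constants (not stated in the post).
LAYERS = 61
LATENT_DIM = 512       # compressed KV latent per token per layer
ROPE_DIM = 64          # decoupled RoPE key per token per layer
INDEXER_KEY_DIM = 128  # assumed extra per-token indexer key for V3.2

# Context lengths interpreted as multiples of 1024 tokens (assumption).
CONTEXTS = {"128k": 128 * 1024, "160k": 160 * 1024, "1m": 1024 * 1024}


def kv_cache_gib(tokens, extra_dim=0, bytes_per_value=2):
    """KV cache size in GiB for `tokens` cached positions (FP16 by default)."""
    values_per_token = LAYERS * (LATENT_DIM + ROPE_DIM + extra_dim)
    return tokens * values_per_token * bytes_per_value / GIB


def kv_percent(kv_gib_at_1m, params_billion):
    """KV% as it appears to be computed in the table: 1m cache size over
    FP16 weight size (2 bytes per parameter, counted in decimal GB)."""
    return 100 * kv_gib_at_1m / (params_billion * 2)


if __name__ == "__main__":
    for name, extra in (("V3/3.1", 0), ("V3.2", INDEXER_KEY_DIM)):
        sizes = {k: kv_cache_gib(n, extra) for k, n in CONTEXTS.items()}
        print(name, {k: f"{v:.3f} GiB" for k, v in sizes.items()})
    # Table values fed back in to check the KV% column.
    print(f"V4 Pro KV%: {kv_percent(9.62, 1600):.2f}")
```

Under these assumptions the per-token cost is 61 × 576 × 2 bytes for V3/3.1 and 61 × 704 × 2 bytes for V3.2, which lands within rounding of the table values at all three context lengths.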
