The exact KV cache usage of DeepSeek V4
Figure 1 of the DSV4 paper seems to imply that DSV3.2 uses ~50GB of KV cache at 1m context and DSV4 uses ~5GB.

***Numbers updated with the KV cache breakdown from vllm***

From my own calculations, the correct FP16 KV cache sizes (including 1m context) should be:

| Model | Params | 128k | 160k | 1m | KV% |
|---|---|---|---|---|---|
| V3/3.1 | 671B | 8.58GiB | 10.72GiB | 68.63GiB | 5.11% |
| V3.2 | 671B | 10.48GiB | 13.11GiB | 83.88GiB | 6.25% |
| V4 Flash | 284B | 0.84GiB | 1.05GiB | 6.72GiB | 1.18% |
| V4 Pro | 1600B | 1.20GiB | 1.50GiB | 9.62GiB | 0.3% |

So the KV cache saving is not 9.5x but 7.879x. It is still very
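For anyone who wants to sanity-check the ratios, here is a minimal sketch that divides the 1m column pairwise and checks that the 128k and 1m columns scale linearly (1m being 8 times 128k, which the table values are consistent with). The numbers are copied from the table; the dictionary names and the `saving` helper are made up for illustration, and which pair of models the 7.879x figure refers to is not stated in the post.

```python
# FP16 KV cache sizes in GiB, copied from the table in the post.
kv_128k_gib = {"V3/3.1": 8.58, "V3.2": 10.48, "V4 Flash": 0.84, "V4 Pro": 1.20}
kv_1m_gib = {"V3/3.1": 68.63, "V3.2": 83.88, "V4 Flash": 6.72, "V4 Pro": 9.62}


def saving(baseline: str, candidate: str) -> float:
    """How many times smaller the candidate's 1m KV cache is than the baseline's."""
    return kv_1m_gib[baseline] / kv_1m_gib[candidate]


# Pairwise savings at 1m context, old models versus new models.
for baseline in ("V3/3.1", "V3.2"):
    for candidate in ("V4 Flash", "V4 Pro"):
        print(f"{baseline} -> {candidate}: {saving(baseline, candidate):.2f}x")

# Consistency check: assuming 1m means 8 x 128k tokens and the cache grows
# linearly with context length, the 1m column should be about 8x the 128k one.
for model, small in kv_128k_gib.items():
    print(f"{model}: 8 x 128k = {small * 8:.2f} GiB, table says {kv_1m_gib[model]:.2f} GiB")
```

The result depends a lot on which rows you compare (V3/3.1 or V3.2 as the baseline, Flash or Pro as the candidate), so it is worth stating the pair explicitly when quoting a single saving factor.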