| without caching | with caching |
|---|---|
| for each step, recompute all previous K and V | for each step, only compute current K and V |
| attention cost per step is quadratic with sequence length | attention cost per step is linear with sequence length (memory grows linearly, but compute/token remains low) |
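
To make the contrast concrete, here is a minimal single-head sketch (a simplified illustration, not the original implementation; the NumPy projection matrices `W_q`, `W_k`, `W_v` and the helper functions are assumptions). The uncached path re-projects every past token at each step, while the cached path only projects the newest token and reuses stored K and V; both produce the same attention output.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

d_model = 16
rng = np.random.default_rng(0)
# Illustrative projection matrices for a single attention head.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

def step_without_cache(tokens):
    # Recompute K and V for all previous tokens at every decoding step.
    Q = tokens @ W_q
    K = tokens @ W_k
    V = tokens @ W_v
    scores = (Q[-1:] @ K.T) / np.sqrt(d_model)
    return softmax(scores) @ V

k_cache, v_cache = [], []

def step_with_cache(new_token):
    # Only project the current token; reuse cached K and V for the past.
    q = new_token @ W_q
    k_cache.append(new_token @ W_k)
    v_cache.append(new_token @ W_v)
    K = np.stack(k_cache)
    V = np.stack(v_cache)
    scores = (q[None, :] @ K.T) / np.sqrt(d_model)
    return softmax(scores) @ V

# Feed the same token stream both ways and confirm the outputs match.
tokens = rng.standard_normal((5, d_model))
for t in range(1, len(tokens) + 1):
    out_full = step_without_cache(tokens[:t])
    out_cached = step_with_cache(tokens[t - 1])
    assert np.allclose(out_full, out_cached)
```

The assertion passing at every step shows the cache changes only the cost, not the result: per-step work drops from re-projecting the whole prefix to a single projection plus one dot product against the stored keys, at the price of memory that grows with the sequence.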