@FareedKhan-dev
Created July 4, 2025 15:05
| Without caching | With caching |
| --- | --- |
| For each step, recompute all previous K and V | For each step, only compute current K and V |
| Attention cost per step is quadratic with sequence length | Attention cost per step is linear with sequence length (memory grows linearly, but compute/token remains low) |
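
The comparison above can be made concrete with a small, pure-Python sketch of single-head causal attention. Everything here is an illustrative assumption rather than code from any particular library: the function names (`decode_without_cache`, `decode_with_cache`), the toy dimensions, and the idea of counting K/V projections and query-key dot products as a stand-in for compute cost.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def decode_without_cache(tokens, Wq, Wk, Wv):
    """Hypothetical naive decoder: rerun attention over the whole prefix each step."""
    outputs, kv_count, score_count = [], 0, 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        queries = [matvec(Wq, x) for x in prefix]
        keys = [matvec(Wk, x) for x in prefix]    # all K recomputed
        values = [matvec(Wv, x) for x in prefix]  # all V recomputed
        kv_count += t
        # Causal attention for every position in the prefix: 1 + 2 + ... + t scores.
        rows = [attend(queries[i], keys[:i + 1], values[:i + 1]) for i in range(t)]
        score_count += t * (t + 1) // 2
        outputs.append(rows[-1])  # only the last position is actually needed
    return outputs, kv_count, score_count

def decode_with_cache(tokens, Wq, Wk, Wv):
    """Hypothetical cached decoder: project only the new token, reuse stored K/V."""
    outputs, kv_count, score_count = [], 0, 0
    k_cache, v_cache = [], []  # grows by one entry per step (linear memory)
    for x in tokens:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        kv_count += 1
        outputs.append(attend(matvec(Wq, x), k_cache, v_cache))
        score_count += len(k_cache)  # one dot product per cached key
    return outputs, kv_count, score_count

# Toy usage: 6 random "token embeddings" of width 4 and random projection matrices.
random.seed(0)
dim, n = 4, 6
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
Wq, Wk, Wv = ([[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
              for _ in range(3))
_, kv_a, sc_a = decode_without_cache(tokens, Wq, Wk, Wv)
_, kv_b, sc_b = decode_with_cache(tokens, Wq, Wk, Wv)
print("K/V projections:", kv_a, "vs", kv_b)
print("query-key scores:", sc_a, "vs", sc_b)
```

Both functions return the same per-step outputs. Over 6 tokens, the uncached version performs 21 K/V projections and 56 query-key dot products, against 6 and 21 with the cache. The price is that `k_cache` and `v_cache` hold one entry per token seen so far.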