{"type":"link","version":"1.0","title":"Storing keys before applying rotary position encoding, then rotating per-request inside attention, makes one cache block reusable at any offset for ~5% kernel overhead","author_name":"AI Archs","author_url":"https://ai-arch.pages.dev","provider_name":"AI Archs","provider_url":"https://ai-arch.pages.dev","url":"https://ai-arch.pages.dev/n/store-unrotated-keys-and-apply-positional-encoding-inside-attention","thumbnail_url":"https://ai-arch.pages.dev/android-chrome-512x512.png","thumbnail_width":512,"thumbnail_height":512}