{"type":"link","version":"1.0","title":"The savings from caching a document's attention state grow faster than the cost of storing it, because compute scales with length squared and storage scales linearly","author_name":"AI Archs","author_url":"https://ai-arch.pages.dev","provider_name":"AI Archs","provider_url":"https://ai-arch.pages.dev","url":"https://ai-arch.pages.dev/n/prefill-reuse-saving-scales-superlinearly-while-storage-scales-linearly","thumbnail_url":"https://ai-arch.pages.dev/android-chrome-512x512.png","thumbnail_width":512,"thumbnail_height":512}