How IndexCache Cuts DeepSeek’s Sparse Attention Costs by 75%
DeepSeek’s sparse attention saves compute by attending only to the most relevant tokens, but its indexer still runs an expensive O(L²) scan over the sequence at every layer. IndexCache reuses cached index results across adjacent layers, cutting redundant computation without architectural changes.
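
To make the caching idea concrete, here is a minimal toy sketch in plain Python. It is not DeepSeek's or IndexCache's actual code: the names `topk_indexer`, `run_layers`, and the `refresh_every` parameter are hypothetical, the dot-product scoring is a stand-in for the real indexer, and causal masking is omitted. It only illustrates the pattern of running the quadratic indexer on some layers and reusing the cached token selection on the layers in between.

```python
import random


def topk_indexer(queries, keys, k):
    """Toy indexer (assumption): score every query against every key,
    which is O(L^2) in sequence length, and keep the top-k key positions."""
    selected = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, key)) for key in keys]
        ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
        selected.append(sorted(ranked[:k]))
    return selected


def run_layers(hidden_per_layer, k, refresh_every):
    """Walk the layers, recomputing the selection every `refresh_every`
    layers (hypothetical knob) and reusing the cached one otherwise.
    Returns the per-layer selections and the number of indexer calls."""
    cached = None
    calls = 0
    selections = []
    for layer, hidden in enumerate(hidden_per_layer):
        if cached is None or layer % refresh_every == 0:
            cached = topk_indexer(hidden, hidden, k)  # expensive O(L^2) scan
            calls += 1
        selections.append(cached)  # adjacent layers share the same indices
    return selections, calls


# Random stand-in activations: 8 layers, 16 tokens, 4 dimensions.
random.seed(0)
num_layers, seq_len, dim = 8, 16, 4
hidden_per_layer = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
    for _ in range(num_layers)
]

_, baseline_calls = run_layers(hidden_per_layer, k=4, refresh_every=1)
selections, cached_calls = run_layers(hidden_per_layer, k=4, refresh_every=4)
print(baseline_calls, cached_calls)  # prints "8 2"
```

With the illustrative setting `refresh_every=4`, the toy runs the indexer on 2 of 8 layers. That ratio follows from the parameter chosen here and says nothing about how the real system picks which layers to recompute, or about how much selection quality is lost by reusing indices.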