
How latent compression slashes LLM context costs by 16x without sacrificing accuracy
A new family of encoder-decoder models compresses LLM context to one-sixteenth of its original length while preserving near-peak accuracy and speeding up inference in production.