
Cerebras runs trillion-parameter AI model 7x faster than GPUs with new chip
A new benchmark shows Cerebras Systems delivering trillion-parameter AI inference at nearly 1,000 tokens per second, up to 29 times faster than GPU-based clouds on real-world tasks. The milestone shows that wafer-scale chips can handle massive open models where GPUs struggle.