Google’s DiffusionGemma: How parallel text generation cuts AI speed barriers
Google DeepMind has unveiled DiffusionGemma, a groundbreaking AI model that generates text in parallel instead of one word at a time. Early benchmarks show it delivers up to 1,000 tokens per second, setting a new standard for local LLM performance on consumer and enterprise GPUs.