Google Cloud's GKE Inference Gateway cuts LLM response times by 70%

Google Cloud Next '26's standout announcement wasn't about new AI models: it was a routing upgrade that slashes LLM time-to-first-token by up to 70% with zero tuning.

Apr 28, 2026