Google NYC hosts hands-on AI infrastructure workshop for scaling LLMs

The challenge of scaling large language models (LLMs) isn’t just about writing better algorithms—it’s about building infrastructure that can keep up. Google is addressing this head-on with an upcoming workshop in New York City designed for teams grappling with the real-world demands of high-performance computing (HPC) and inference workflows.

On Thursday, May 28, Google will host Scaling Intelligence: Accelerating HPC and Inference Workflows at its NYC office in Chelsea. The event targets engineers, architects, and technical leaders who need to bridge the gap between ambitious AI models and the practical constraints of production environments. The focus? Moving beyond theoretical discussions to tackle the actual plumbing required for scalable, low-latency inference at enterprise scale.

What to expect: A deep dive into AI infrastructure

The session is a technical deep dive rather than a general tech talk. Attendees will explore architectural blueprints for next-generation compute setups built for concurrent, high-throughput inference workloads. The session will walk through the hardware and software stack and show how to configure current tooling for efficient inference.

Key topics include:

Optimizing workloads on Google Cloud’s latest G4 VMs, powered by NVIDIA RTX PRO 6000 Blackwell GPUs. These setups are engineered for high performance while maintaining stability.
Deploying and fine-tuning state-of-the-art open models such as Google’s Gemma and Meta’s Llama 3 in live, guided labs.
Integrating TensorRT, NVIDIA’s SDK for optimizing inference on NVIDIA GPUs, with the aim of running models faster without sacrificing accuracy.

Hands-on labs: From theory to implementation

Unlike many conferences, this workshop emphasizes practical learning. Participants are encouraged to bring their laptops and dive straight into the action. Under the guidance of Google Cloud and NVIDIA AI experts, teams will walk through deploying models, configuring inference pipelines, and troubleshooting real-world bottlenecks.

The labs are structured to simulate production environments, where latency, throughput, and security aren’t just theoretical concerns—they’re make-or-break factors. Engineers will leave with tangible skills to apply immediately to their own projects.
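
As a rough illustration of how latency and throughput can be measured for an inference endpoint, the Python sketch below times a batch of concurrent requests. It is not taken from the workshop materials: `fake_infer` is a hypothetical placeholder for a real model call, and `timed_call`, `benchmark`, and the concurrency setting are names and values assumed for this example.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(prompt: str) -> str:
    """Hypothetical stand-in for a model call; sleeps to simulate work."""
    time.sleep(0.01)
    return prompt.upper()


def timed_call(fn, prompt):
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    fn(prompt)
    return time.perf_counter() - start


def benchmark(fn, prompts, concurrency=4):
    """Send prompts through fn with a thread pool and summarize the run."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda p: timed_call(fn, p), prompts))
    wall = time.perf_counter() - wall_start
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "requests": len(latencies),
        "p50_s": cuts[49],
        "p95_s": cuts[94],
        "throughput_rps": len(latencies) / wall,
    }


if __name__ == "__main__":
    report = benchmark(fake_infer, [f"prompt {i}" for i in range(40)])
    print(report)
```

Replacing `fake_infer` with a call to a real serving endpoint would yield median and tail latency alongside requests per second, which is one simple way to see whether raising concurrency improves throughput or only lengthens the tail.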

Why cross-functional teams should attend

Infrastructure scaling isn’t a solo effort. Google’s workshop is designed with collaboration in mind, encouraging teams to attend together. The ideal group includes:

AI/ML architects and engineers who design the models and workflows.
Platform engineers and DevSecOps specialists responsible for deployment and security.
IT and infrastructure leaders who make the high-stakes decisions on budgets, scalability, and uptime.

By aligning these roles in a single room, the workshop aims to break down silos that often slow down AI adoption. The goal? To ensure data scientists aren’t waiting weeks for infrastructure teams to catch up—and vice versa.

Logistics: Limited seats, high-impact outcomes

The event will take place at Google’s NYC headquarters in Chelsea, located at 111 8th Avenue. Doors open at 12:00 PM, with the session running until 4:00 PM. A networking reception follows, offering an opportunity to connect with peers and Google’s technical experts.

Seats are strictly limited to ensure quality coaching and meaningful architectural reviews during the labs. Teams are advised to register early to secure their spot.

A step toward scalable, production-ready AI

For organizations pushing the limits of generative AI, infrastructure bottlenecks can mean the difference between a stalled proof of concept and a dependable production service. The workshop aims to go beyond instruction and equip teams with the tools to turn ambitious models into reliable, scalable systems.

If your team is ready to move beyond the limitations of current setups and explore what’s possible with next-gen AI infrastructure, this is your chance to get hands-on guidance from the experts driving the technology forward.

AI summary

Join the Scaling Intelligence workshop at Google NYC on May 28 and learn from experts how to scale your AI infrastructure and accelerate your HPC workflows.
