The challenge of scaling large language models (LLMs) isn’t just about writing better algorithms; it’s about building infrastructure that can keep up. Google is addressing this head-on with an upcoming workshop in New York City designed for teams grappling with the real-world demands of high-performance computing (HPC) and inference workflows.
On Thursday, May 28, Google will host Scaling Intelligence: Accelerating HPC and Inference Workflows at its NYC office in Chelsea. The event targets engineers, architects, and technical leaders who need to bridge the gap between ambitious AI models and the practical constraints of production environments. The focus? Moving beyond theoretical discussions to tackle the actual plumbing required for scalable, low-latency inference at enterprise scale.
What to expect: A deep dive into AI infrastructure
This isn’t a generic tech talk but a technical deep dive with actionable takeaways. Attendees will explore architectural blueprints for next-generation compute setups built for concurrent, high-throughput inference workloads. The session will dissect the hardware and software stack and show how to use the latest tools to get the most out of it.
Key topics include:
- Optimizing workloads on Google Cloud’s latest G4 VMs, powered by NVIDIA RTX PRO 6000 GPUs based on the Blackwell architecture. These instances are built to deliver high performance while remaining stable under load.
- Deploying and fine-tuning state-of-the-art open-source models like Google’s Gemma and Meta’s Llama 3 in live, guided labs.
- Integrating NVIDIA TensorRT, an SDK for optimizing inference on NVIDIA GPUs, to make models run faster, typically with little or no loss in accuracy.
Hands-on labs: From theory to implementation
Unlike many conferences, this workshop emphasizes practical learning. Participants are encouraged to bring their laptops and dive straight into the action. Under the guidance of Google Cloud and NVIDIA AI experts, teams will walk through deploying models, configuring inference pipelines, and troubleshooting real-world bottlenecks.
The labs are structured to simulate production environments, where latency, throughput, and security aren’t just theoretical concerns but make-or-break factors. Engineers will leave with tangible skills they can apply immediately to their own projects.
Why cross-functional teams should attend
Infrastructure scaling isn’t a solo effort. Google’s workshop is designed with collaboration in mind, encouraging teams to attend together. The ideal group includes:
- AI/ML architects and engineers who design the models and workflows.
- Platform engineers and DevSecOps specialists responsible for deployment and security.
- IT and infrastructure leaders who make the high-stakes decisions on budgets, scalability, and uptime.
By bringing these roles into a single room, the workshop aims to break down the silos that often slow AI adoption. The goal? To ensure data scientists aren’t waiting weeks for infrastructure teams to catch up, and vice versa.
Logistics: Limited seats, high-impact outcomes
The event will take place at Google’s NYC headquarters in Chelsea, located at 111 8th Avenue. Doors open at 12:00 PM, with the session running until 4:00 PM. A networking reception follows, offering an opportunity to connect with peers and Google’s technical experts.
Spaces are strictly limited to ensure quality coaching and meaningful architectural reviews during the labs. Teams are advised to register early to secure their spot.
A step toward scalable, production-ready AI
For organizations pushing the limits of generative AI, infrastructure bottlenecks can mean the difference between a stalled proof of concept and a seamless user experience. This workshop isn’t just about learning; it’s about equipping teams with the tools to turn ambitious models into reliable, scalable systems.
If your team is ready to move beyond the limitations of current setups and explore what’s possible with next-gen AI infrastructure, this is your chance to get hands-on guidance from the experts driving the technology forward.
AI summary
Join the Scaling Intelligence Workshop at Google NYC on May 28 and learn from experts how to scale your AI infrastructure and accelerate your HPC workflows.