Pinterest slashes AI costs by 90% with custom vision embeddings

Pinterest’s 620 million monthly users generate billions of interactions daily, which makes relying on frontier AI models for image recommendations prohibitively expensive at that scale. To solve this, the company’s engineering team, led by CTO Matt Madrigal, took a radical approach: they removed the vision encoder from the Qwen3-VL model and replaced it with Pinterest’s proprietary embeddings. The result was a 90% reduction in AI infrastructure costs alongside a 30% improvement in recommendation accuracy.

Madrigal emphasized the strategic shift toward customizing open-source AI models in-house, arguing that for specialized use cases, data quality often matters more than raw model size. "If you’ve got really unique data that you can then fine-tune an open-source model with, data quality will, frankly, outweigh or overcome model size," he explained during a recent podcast discussion. This philosophy has driven Pinterest’s investment in internal AI development, moving beyond generic solutions to tailor models precisely to its own ecosystem.

Rebuilding Qwen3-VL for Visual Discovery

Pinterest has long leveraged open-source models for visual search and discovery, starting with Google’s BERT and later OpenAI’s CLIP. The company’s Pin CLIP model, built on CLIP’s architecture, incorporated proprietary visual embeddings and rich image metadata to enhance recommendation precision. The next evolution came with Navigator 1, Pinterest’s conversational shopping assistant, which initially ran on Qwen3-VL but required significant customization to meet performance demands.

Madrigal’s team made one high-impact modification: they removed Qwen3-VL’s original vision encoder and substituted Pinterest’s proprietary multimodal embeddings. This architectural shift made it possible to precompute embeddings for pins and images offline, store them, and regenerate them as the underlying embedding model is regularly retrained. The approach drastically reduced runtime latency while improving personalization, which is critical for a platform where discovery happens in milliseconds.
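The article does not say how the precomputed embeddings are wired into Qwen3-VL. A common pattern in vision-language models is a small learned projector that maps each image feature into the language model's input space, where it is consumed like an ordinary token embedding. The Python below is a minimal sketch under that assumption only; the store, the projector, and the dimensions are all hypothetical, and real embeddings would be far larger than four numbers.

```python
# Hypothetical sketch: precomputed pin embeddings stand in for a vision encoder.
from typing import Dict, List

Vector = List[float]

EMBED_DIM = 4   # hypothetical size of a precomputed pin embedding
HIDDEN_DIM = 3  # hypothetical hidden size of the language model


class LinearProjector:
    """Maps an EMBED_DIM embedding to a HIDDEN_DIM input vector ("soft token").

    A real projector would be trained during fine-tuning; fixed weights keep
    this example deterministic.
    """

    def __init__(self, weights: List[Vector], bias: Vector) -> None:
        if len(weights) != HIDDEN_DIM or any(len(row) != EMBED_DIM for row in weights):
            raise ValueError("weights must be HIDDEN_DIM x EMBED_DIM")
        if len(bias) != HIDDEN_DIM:
            raise ValueError("bias must have HIDDEN_DIM entries")
        self.weights = weights
        self.bias = bias

    def __call__(self, x: Vector) -> Vector:
        # y = W x + b, written out with plain lists.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


def build_inputs(pin_ids: List[str], store: Dict[str, Vector],
                 projector: LinearProjector, text_tokens: List[Vector]) -> List[Vector]:
    """Prepends one projected vector per pin to the text token embeddings.

    No image is decoded or encoded here: each pin's embedding is a dictionary
    lookup into a store that was filled by an offline job.
    """
    image_tokens = [projector(store[pin_id]) for pin_id in pin_ids]
    return image_tokens + text_tokens


# Toy usage with hand-picked numbers.
store = {"pin_a": [1.0, 0.0, 0.0, 0.0], "pin_b": [0.0, 1.0, 0.0, 0.0]}
proj = LinearProjector(weights=[[1.0, 0.0, 0.0, 0.0],
                                [0.0, 1.0, 0.0, 0.0],
                                [0.0, 0.0, 1.0, 1.0]],
                       bias=[0.0, 0.0, 0.5])
inputs = build_inputs(["pin_a", "pin_b"], store, proj, text_tokens=[[9.0, 9.0, 9.0]])
print(inputs)  # [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [9.0, 9.0, 9.0]]
```

The design point is that nothing image-specific runs at request time: the expensive work lives in whatever job filled `store`, and the model only sees vectors.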

The custom embeddings also carry contextual information about metadata, pins, and images that off-the-shelf models struggle to capture. Without them, every image would have to be encoded in real time at inference, producing latency 20 times worse than the optimized system. "If it’s something that’s going to be critical for our end users, that’s going to drive engagement, and that will have to scale to over 600 million monthly active users, we’re going to either probably build it or we’re going to leverage open source and customize the heck out of it," Madrigal noted.
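The latency argument comes down to where the encoder runs. This hypothetical sketch counts encoder invocations on two request paths: one that encodes every image per request, and one that looks up embeddings produced by an offline batch job. It does not reproduce the 20x figure, which depends on Pinterest's hardware and models; it only shows that precomputation moves the expensive work off the request path.

```python
# Hypothetical sketch: per-request encoding vs. offline precomputation.
from typing import Callable, Dict, List

Vector = List[float]
Encoder = Callable[[bytes], Vector]


class CountingEncoder:
    """Stand-in for an expensive image encoder that records how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image_bytes: bytes) -> Vector:
        self.calls += 1
        # Toy "embedding" (length and byte sum); a real encoder runs a neural network.
        return [float(len(image_bytes)), float(sum(image_bytes))]


def offline_precompute(images: Dict[str, bytes], encoder: Encoder) -> Dict[str, Vector]:
    """Batch job run ahead of time, and re-run when the encoder is retrained."""
    return {pin_id: encoder(data) for pin_id, data in images.items()}


def serve_with_online_encoding(pin_ids: List[str], images: Dict[str, bytes],
                               encoder: Encoder) -> List[Vector]:
    """Request path without precomputation: every image is encoded per request."""
    return [encoder(images[pin_id]) for pin_id in pin_ids]


def serve_with_precomputed(pin_ids: List[str], store: Dict[str, Vector]) -> List[Vector]:
    """Request path with precomputation: one dictionary lookup per image."""
    return [store[pin_id] for pin_id in pin_ids]


images = {"pin_a": b"\x01\x02", "pin_b": b"\x03"}
requests = [["pin_a", "pin_b"]] * 100  # 100 identical toy requests

online_encoder = CountingEncoder()
for request in requests:
    serve_with_online_encoding(request, images, online_encoder)

offline_encoder = CountingEncoder()
store = offline_precompute(images, offline_encoder)
for request in requests:
    serve_with_precomputed(request, store)

print(online_encoder.calls, offline_encoder.calls)  # 200 2
```

Both paths return the same vectors; only the number of encoder runs on the hot path differs, and that gap grows with traffic.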

The Taste Graph: Mapping Evolving User Preferences

At the heart of Pinterest’s recommendation engine is the taste graph, a dynamic, real-time representation of each user’s preferences that evolves with every interaction. Unlike a social graph, which links people to people, Pinterest’s model captures nuanced preferences in what Madrigal describes as a "preference graph." It answers not just what users click, but what inspires them, what they’re curious about, and where their interests are headed.

This system bridges the gap between inspiration and intent. While traditional search engines like Google excel when users have a clear objective, Pinterest thrives in the discovery phase, when users are browsing without a fixed goal. The taste graph’s architecture combines graph neural networks with representation learning, continuously updating user embeddings based on new content, activity signals, and evolving tastes.
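One way to picture "continuously updating user embeddings" is a single message-passing step over the user-pin engagement graph. The sketch below uses a simplified, weight-free mean aggregation in the spirit of GraphSAGE, blended into the existing embedding like an exponential moving average. It is an illustrative assumption, not Pinterest's taste-graph implementation; a trained GNN would apply learned weight matrices and nonlinearities, and the pin names and two-dimensional axes are invented.

```python
# Hypothetical sketch: one weight-free, GraphSAGE-style mean-aggregation step
# that nudges a user embedding toward the pins they just engaged with.
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_user_embedding(user: Vector, engaged_pins: List[str],
                          pin_embeddings: Dict[str, Vector],
                          alpha: float = 0.5) -> Vector:
    """Blend the current user embedding with the mean of engaged-pin embeddings.

    alpha sets how quickly tastes drift: 0 ignores new activity, 1 replaces the
    old embedding entirely. A trained GNN layer would apply learned weight
    matrices and a nonlinearity instead of this fixed blend.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not engaged_pins:
        return list(user)
    neighborhood = mean([pin_embeddings[pin_id] for pin_id in engaged_pins])
    return [(1 - alpha) * u + alpha * n for u, n in zip(user, neighborhood)]


# Toy 2-D embeddings: [mid-century signal, coastal signal] (invented axes).
pins = {"teak_sideboard": [1.0, 0.0], "walnut_chair": [1.0, 0.0], "shiplap_wall": [0.0, 1.0]}
user = [0.0, 0.0]
user = update_user_embedding(user, ["teak_sideboard", "walnut_chair"], pins)
print(user)  # [0.5, 0.0]
user = update_user_embedding(user, ["shiplap_wall"], pins)
print(user)  # [0.25, 0.5]
```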

For example, a user deeply engaged with mid-century modern furniture will receive recommendations that reflect their aesthetic, while another drawn to Nantucket-style interiors will see curated content aligned with that preference. The system doesn’t just surface popular pins; it anticipates intent by understanding the subtle signals embedded in user behavior.
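With taste embeddings in hand, the furniture example reduces to nearest-neighbor ranking. This hypothetical sketch scores candidate pins by cosine similarity to each user's embedding; the two "style axes" and all vectors are invented for illustration, whereas production embeddings are high-dimensional and learned.

```python
# Hypothetical sketch: rank candidate pins by cosine similarity to a user's
# taste embedding.
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_pins(user: Vector, candidates: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Top-k candidate pins, highest similarity first."""
    scored = [(pin_id, cosine(user, emb)) for pin_id, emb in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Invented axes: [mid-century modern, Nantucket coastal]
candidates = {
    "teak_sideboard": [0.9, 0.1],
    "walnut_lounge_chair": [0.8, 0.2],
    "shiplap_cottage": [0.1, 0.9],
    "rope_pendant_lamp": [0.2, 0.8],
}
mid_century_fan = [1.0, 0.0]
nantucket_fan = [0.0, 1.0]

print([p for p, _ in rank_pins(mid_century_fan, candidates, k=2)])
# ['teak_sideboard', 'walnut_lounge_chair']
print([p for p, _ in rank_pins(nantucket_fan, candidates, k=2)])
# ['shiplap_cottage', 'rope_pendant_lamp']
```

Popularity plays no role in this score; the same candidate pool yields different feeds purely because the two users' embeddings point in different directions.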

Madrigal highlighted the distinction: "It's not a social graph. It's much more of a preference graph: What's going to inspire you? What are you trying to do next?" This granular understanding enables Pinterest to guide users from passive browsing to active intent—whether that means clicking an ad or making a purchase.

Building for Scale and Innovation

Pinterest’s approach extends beyond technical optimization. The company uses sandboxed environments to encourage creative experimentation while maintaining security and stability. Continuous feedback loops guard against visual AI "slop," the gradual drift toward low-quality output, and enable rapid iteration based on real user data.

Constant benchmarking remains a cornerstone of the strategy, tracking metrics such as user engagement, inference latency, and recommendation relevance. This disciplined cycle of testing and refinement helps the platform keep pace with users’ changing tastes and shifting platform dynamics.
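The article names engagement, latency, and relevance as tracked metrics without saying how they are computed. As a sketch only, assuming recall@k as the relevance metric and a nearest-rank p99 for tail latency, a benchmark gate might look like this; the thresholds, pin IDs, and latency samples are hypothetical.

```python
# Hypothetical sketch: two benchmark metrics and a simple release gate.
import math
from typing import List, Sequence, Set


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def percentile(latencies_ms: List[float], p: float) -> float:
    """Nearest-rank percentile, for p in (0, 100]."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def passes_gate(recall: float, p99_ms: float,
                min_recall: float, max_p99_ms: float) -> bool:
    """Ship a candidate model only if relevance and tail latency both hold."""
    return recall >= min_recall and p99_ms <= max_p99_ms


recommended = ["pin_1", "pin_7", "pin_3", "pin_9"]
clicked = {"pin_3", "pin_4"}                       # toy ground truth
recall = recall_at_k(recommended, clicked, k=3)    # 1 of 2 relevant pins in top 3
latencies = [float(ms) for ms in range(1, 101)]    # toy samples: 1..100 ms
p99 = percentile(latencies, 99)
print(recall, p99, passes_gate(recall, p99, min_recall=0.4, max_p99_ms=120.0))
# 0.5 99.0 True
```

Gating on tail latency rather than the average reflects the same concern the precomputed embeddings address: a few slow requests can dominate the experience at this scale.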

Looking ahead, Pinterest’s blueprint demonstrates how companies can harness open-source AI without being constrained by its limitations. By prioritizing unique data and embedding proprietary context, teams can reshape open-source models into tailored solutions that deliver both efficiency and performance at scale. As Madrigal put it, the future belongs to teams willing to "customize the heck out of" their AI infrastructure, on the bet that better data can beat a bigger model.
