Qualcomm’s AI breakthrough: HBC architecture redefines near-memory compute

The gap between AI compute performance and memory bandwidth continues to widen, creating what’s known as the memory wall—a bottleneck that stifles efficiency in high-performance AI systems. Traditional solutions like high-bandwidth memory (HBM) offer some relief, but they come with significant cost and power consumption. This week, Qualcomm unveiled a groundbreaking alternative: the High-Bandwidth Compute (HBC) near-memory architecture, designed to break through this barrier by integrating AI accelerators directly beneath DRAM stacks.
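The memory wall is often explained with the roofline model. A kernel's attainable throughput is the lesser of two values: the chip's peak compute, or its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs performed per byte moved). The Python sketch below uses made-up figures for a hypothetical accelerator with 1,000 TFLOP/s of peak compute and 5 TB/s of memory bandwidth. These numbers are illustrative only and do not describe any Qualcomm or HBM product.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline estimate of achievable throughput in TFLOP/s.

    TB/s multiplied by FLOP/byte yields TFLOP/s, so the second term is the
    ceiling imposed by memory traffic alone.
    """
    memory_ceiling = bandwidth_tb_s * intensity_flop_per_byte
    return min(peak_tflops, memory_ceiling)


def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Intensity (FLOP/byte) above which compute, not memory, becomes the limit."""
    return peak_tflops / bandwidth_tb_s


# Hypothetical accelerator: 1,000 TFLOP/s of compute fed by 5 TB/s of memory.
PEAK, BW = 1000.0, 5.0

for intensity in (2.0, 50.0, 400.0):
    achieved = attainable_tflops(PEAK, BW, intensity)
    print(f"{intensity:>5.0f} FLOP/byte -> {achieved:7.1f} TFLOP/s "
          f"({achieved / PEAK:.0%} of peak)")
print(f"ridge point: {ridge_point(PEAK, BW):.0f} FLOP/byte")
```

Low-intensity work, such as streaming model weights during token-by-token generation, sits far to the left of the ridge point. In that region, more bandwidth raises throughput directly, while extra compute would simply sit idle. That is the lever both HBM and HBC try to pull.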

Disaggregating AI accelerators to slash latency and power

Qualcomm’s approach rethinks the conventional system-on-chip (SoC) design by physically separating the AI accelerator from the SoC and positioning it underneath an LPDDR DRAM stack. This placement leverages through-silicon vias to establish ultra-fast, low-latency connections between the accelerator and memory. The result? A dramatic reduction in memory congestion and power consumption compared to HBM-based systems.

Tony Pialis, Executive Vice President and General Manager of Qualcomm’s Data Center Business, emphasized the significance of this shift: "We have separated the AI accelerator from the XPU and placed the XPU directly beneath a DRAM stack. This delivers the performance benefits of SRAM with the capacity of stacked memory, eliminating the need for costly silicon interposers and advanced packaging used in HBM solutions."

Qualcomm claims HBC delivers six times the bandwidth per watt of HBM while offering more than 200 times the capacity of on-chip SRAM. Because HBC relies on standard packaging techniques, the company says multiple HBC stacks can be deployed within a single device, further improving performance per dollar.
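A bandwidth-per-watt advantage can be spent in two ways. It can buy more bandwidth at the same memory power budget, or the same bandwidth at a fraction of the power. The sketch below applies Qualcomm's claimed 6x ratio to a hypothetical HBM baseline of 50 GB/s per watt and a hypothetical 100 W budget. Both values are assumptions chosen only to make the arithmetic concrete.

```python
HBC_GAIN_OVER_HBM = 6  # Qualcomm's claimed bandwidth-per-watt ratio versus HBM

# Hypothetical baseline, not a measured HBM figure.
HBM_GBPS_PER_WATT = 50
HBC_GBPS_PER_WATT = HBM_GBPS_PER_WATT * HBC_GAIN_OVER_HBM


def bandwidth_at_budget(gbps_per_watt: float, watts: float) -> float:
    """Bandwidth (GB/s) sustainable within a fixed memory power budget."""
    return gbps_per_watt * watts


def watts_for_bandwidth(gbps: float, gbps_per_watt: float) -> float:
    """Memory power (W) needed to sustain a target bandwidth."""
    return gbps / gbps_per_watt


BUDGET_W = 100  # hypothetical memory power budget
print("Same power:", bandwidth_at_budget(HBM_GBPS_PER_WATT, BUDGET_W), "GB/s (HBM) vs",
      bandwidth_at_budget(HBC_GBPS_PER_WATT, BUDGET_W), "GB/s (HBC)")

target = bandwidth_at_budget(HBM_GBPS_PER_WATT, BUDGET_W)  # 5,000 GB/s
print(f"Same bandwidth: {watts_for_bandwidth(target, HBM_GBPS_PER_WATT):.1f} W (HBM) vs "
      f"{watts_for_bandwidth(target, HBC_GBPS_PER_WATT):.1f} W (HBC)")
```

Neither reading says anything about latency or capacity. The 200x capacity claim is a separate comparison, made against on-chip SRAM rather than HBM.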

How HBC stacks up against near-memory alternatives

Near-memory compute isn’t a new concept—DRAM manufacturers have experimented with variants for years, though none have gained widespread adoption. One recent example is GUC’s DRAM-on-Logic (DoL) technology, which layers one to four DRAM tiers atop logic dies to achieve around 5 terabytes per second of bandwidth. DoL achieves higher performance than some HBM3E subsystems without relying on advanced packaging or HBM3E stacks, mirroring Qualcomm’s cost-conscious strategy.
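An HBM stack's peak bandwidth is its interface width times its per-pin data rate, which helps put DoL's roughly 5 TB/s in context. The sketch below assumes HBM3E's 1,024-bit interface and a per-pin rate of 9.6 Gb/s. That rate is one commonly quoted speed grade, and shipping parts vary. The code then counts how many such stacks it would take to reach 5 TB/s.

```python
import math


def stack_bandwidth_gb_s(interface_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one memory stack in GB/s (dividing by 8 converts bits to bytes)."""
    return interface_bits * pin_rate_gbps / 8


# Assumed HBM3E parameters: 1,024-bit interface at 9.6 Gb/s per pin.
per_stack = stack_bandwidth_gb_s(1024, 9.6)  # about 1,229 GB/s
DOL_TARGET_GB_S = 5000                        # "around 5 TB/s", per the GUC figure

stacks_needed = math.ceil(DOL_TARGET_GB_S / per_stack)
print(f"one HBM3E stack: {per_stack:.1f} GB/s")
print(f"four stacks: {4 * per_stack:.1f} GB/s")
print(f"stacks needed to reach ~5 TB/s: {stacks_needed}")
```

Under these assumptions, a four-stack HBM3E subsystem peaks at about 4.9 TB/s, just under DoL's figure. That is consistent with the claim that DoL outperforms some HBM3E configurations.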

However, beyond these relative claims, Qualcomm has not disclosed absolute performance figures for HBC, which makes direct comparisons difficult. The ambiguity extends to the HBC accelerator’s functional scope. It could theoretically serve as a transformer-specific engine, a general tensor core array, or even a preprocessing module for AI inference or training. Without concrete benchmarks, the industry’s verdict on HBC’s practical advantages remains pending.

A roadmap for scalable AI acceleration

Qualcomm’s AI roadmap aligns with its HBC strategy, beginning with the AI200 accelerator expected later this year. The AI200 will use LPDDR5X memory and offer 43 terabytes of RAM per rack, a substantial amount of memory capacity. Its successor, the AI250, will integrate first-generation HBC and deliver 18 times the bandwidth of the AI200. A later generation, the AI300, will move to second-generation HBC with 54 times the bandwidth of the AI200.
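The roadmap multiples are easier to compare once they share a single baseline. This sketch sets the AI200 to 1.0 and assumes that both the 18x (AI250) and 54x (AI300) figures are measured against the AI200. The dictionary and helper are illustrative and are not a Qualcomm tool.

```python
# Reported bandwidth multiples, normalized to AI200 = 1.0.
# Assumption: both multiples use the AI200 as their baseline.
RELATIVE_BANDWIDTH = {"AI200": 1.0, "AI250": 18.0, "AI300": 54.0}


def gain(newer: str, older: str, table: dict = RELATIVE_BANDWIDTH) -> float:
    """How many times more bandwidth `newer` offers than `older`."""
    return table[newer] / table[older]


# Dictionaries keep insertion order, so this walks the roadmap generation by generation.
generations = list(RELATIVE_BANDWIDTH)
for older, newer in zip(generations, generations[1:]):
    print(f"{newer} vs {older}: {gain(newer, older):.0f}x")
```

Read this way, second-generation HBC would triple the bandwidth of first-generation HBC.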

This tiered approach suggests Qualcomm is prioritizing incremental scalability, allowing data centers to upgrade throughput without overhauling infrastructure. The focus on standard packaging also hints at broader compatibility with existing server architectures, potentially accelerating adoption.

Breaking the memory wall with smarter design

Memory bottlenecks have long constrained AI innovation, forcing engineers to choose between performance, power, and cost. Qualcomm’s HBC architecture reframes this trade-off by merging compute and memory at the physical layer, eliminating intermediaries like silicon interposers and advanced packaging. If realized at scale, this could democratize high-performance AI by reducing both capital and operational expenses.

The coming months will reveal whether HBC can deliver on its promises—or if it will join the ranks of promising but unrealized near-memory architectures. One thing is certain: the race to redefine AI compute is heating up, and Qualcomm’s bold move places it firmly at the forefront.
