Samsung announced today that it is now mass producing second-generation High Bandwidth Memory (HBM2). It’s been roughly six months since AMD launched the first GPUs equipped with HBM, and HBM2 is expected to deliver a significant improvement over and above what we saw with the Fury, Fury X, and Fury Nano.
Like HBM, HBM2 uses an interposer, an electrical interface that routes the connections from the GPU to the memory. The advantage of HBM2 is that it offers more DRAM per stack and higher throughput. Where AMD’s first-gen HBM topped out at 1GB per DRAM stack and 512GB/s of total memory bandwidth, Samsung is already mass producing 8Gb chips for HBM2. Each stack consists of four 8Gb chips, for 4GB per stack.
Samsung is claiming that it has doubled bandwidth-per-watt efficiency compared with GDDR5, and that claim lines up well with what we’ve observed from comparing the power consumption of HBM with non-HBM graphics cards. This new memory standard brings some significant improvements to power consumption and efficiency, and it saves a great deal of die space on high-end cards.
4GB is the beginning for HBM2, not the end. Samsung notes that it intends to introduce 8GB stacks later this year, likely by stacking memory chips 8-Hi, as opposed to current 4-Hi configurations. Both AMD’s Polaris and Nvidia’s Pascal GPUs are expected to use HBM2 for high-end cards, and at least some GDDR5 configurations for lower-end / lower-power hardware.
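The capacity figures are easy to sanity-check. The Python sketch below converts die density (in gigabits) into stack capacity (in gigabytes) for the 4-Hi and 8-Hi configurations. The function and parameter names are hypothetical, and it assumes every die in a stack contributes its full capacity.

```python
def stack_capacity_gb(die_density_gbit: float, dies_per_stack: int) -> float:
    """Capacity of one memory stack in gigabytes (8 bits per byte).

    Assumption: every die in the stack contributes its full density.
    """
    return die_density_gbit * dies_per_stack / 8


# 8Gb dies, as described in the article
four_hi = stack_capacity_gb(8, 4)   # current 4-Hi stack
eight_hi = stack_capacity_gb(8, 8)  # prospective 8-Hi stack

print(f"4-Hi stack: {four_hi:g}GB")
print(f"8-Hi stack: {eight_hi:g}GB")
```

By the same arithmetic, a card carrying several stacks simply multiplies the per-stack figure by the stack count.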
How HBM2 will change gaming
Silicon Valley prides itself on being disruptive, but true revolutions in computing are few and far between. HBM2, however, just might qualify. At the high end, gamers can look forward to GPUs with 1024GB/s worth of memory bandwidth — double what the Fury X currently offers, and 3x the bandwidth of Nvidia’s GTX Titan X.
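To see where the 1024GB/s figure and the two comparisons come from, here is a rough Python sketch. Several inputs are assumptions not stated in the article: a 1024-bit interface per stack, per-pin data rates of 1Gbps for first-gen HBM and 2Gbps for HBM2, four stacks per GPU, and 336.5GB/s for the GTX Titan X (taken from Nvidia's published spec). The names are illustrative only.

```python
# Assumptions (not from the article): interface width, pin rates, stack count.
HBM_BUS_WIDTH_BITS = 1024   # assumed interface width per stack
STACKS_PER_GPU = 4          # assumed, matching a Fury X style layout
TITAN_X_GBS = 336.5         # assumed from Nvidia's published GTX Titan X spec


def stack_bandwidth_gbs(pin_rate_gbps: float,
                        bus_width_bits: int = HBM_BUS_WIDTH_BITS) -> float:
    """Peak bandwidth of one stack in GB/s: pins * rate per pin / 8 bits."""
    return pin_rate_gbps * bus_width_bits / 8


hbm1_total = STACKS_PER_GPU * stack_bandwidth_gbs(1.0)  # first-gen HBM
hbm2_total = STACKS_PER_GPU * stack_bandwidth_gbs(2.0)  # HBM2

print(f"HBM1 total: {hbm1_total:g}GB/s")
print(f"HBM2 total: {hbm2_total:g}GB/s")
print(f"HBM2 vs Fury X:  {hbm2_total / hbm1_total:.2f}x")
print(f"HBM2 vs Titan X: {hbm2_total / TITAN_X_GBS:.2f}x")
```

These are theoretical peaks; real-world throughput depends on the memory controller and access patterns.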
HBM2 doesn’t just offer more bandwidth — it significantly reduces the power consumption of the memory subsystem itself. Obviously this is a balancing act, since adding more RAM will still increase total power consumption, but when AMD launched the Fury X, it told the press that moving to HBM reduced power consumption by 40-50W. That’s power that can be redirected to the GPU and spent on increasing fill rate and texturing (to take advantage of the increased RAM bandwidth).
The impact on APUs and SoCs, however, could be even larger. AMD and Intel both offer vastly improved integrated graphics compared with what was on the market five years ago. But main memory remains a critical bottleneck, particularly for AMD, which doesn’t have Intel’s high-end EDRAM caches. Integrate HBM or HBM2 on an APU, and you rewrite the rules.
At 256GB/s of bandwidth and 8GB of RAM per die stack, a single HBM2 link at half-clock speed would still give an AMD APU 128GB/s of memory bandwidth, or roughly 4x the realistic peak you can hit today. At that point, the distinction between current midrange cards and an APU becomes irrelevant (or more appropriately, governed by questions of wattage, mobility, and heat, rather than intrinsic limitations of the platform).
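The APU estimate can be worked through the same way. In the sketch below, the 256GB/s per-stack figure and the half-clock scenario come from the text above; the comparison point is an assumption, namely a dual-channel DDR3-2133 configuration with 64-bit channels, chosen only to illustrate what "today's peak" might look like. All names are hypothetical.

```python
HBM2_STACK_GBS = 256.0  # per-stack bandwidth cited in the article


def derated_bandwidth_gbs(full_rate_gbs: float, clock_fraction: float) -> float:
    """Bandwidth when the link runs at a fraction of its full clock."""
    return full_rate_gbs * clock_fraction


def ddr_bandwidth_gbs(channels: int, mega_transfers: int,
                      channel_width_bits: int = 64) -> float:
    """Theoretical peak of a DDR setup in GB/s (1 GB = 1000 MB here)."""
    return channels * (channel_width_bits / 8) * mega_transfers / 1000


apu_hbm2 = derated_bandwidth_gbs(HBM2_STACK_GBS, 0.5)
implied_baseline = apu_hbm2 / 4                 # "roughly 4x" in the article
ddr3_dual = ddr_bandwidth_gbs(2, 2133)          # assumed comparison platform

print(f"Half-clock HBM2 link: {apu_hbm2:g}GB/s")
print(f"Implied current peak: {implied_baseline:g}GB/s")
print(f"Dual-channel DDR3-2133 (theoretical): {ddr3_dual:.1f}GB/s")
```

Under these assumptions the theoretical DDR3 peak lands close to the baseline the "4x" claim implies, with achievable bandwidth somewhat lower in practice.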
AMD hasn’t announced any HBM2 APUs at this point, but has stated it plans to extend the technology to all facets of its products, which heavily implies that we’ll see Zen APUs with HBM2 at some point in the future. Intel is similarly vague, and may feel that its own EDRAM solution will allow it to hit similar performance targets, but at lower costs.
It’s not clear how low AMD would push HBM2, since the interface does consume more power than LPDDR4. SKUs at 15W or below would likely stick to the lowest-power interface available. 35-45W parts and desktop chips, on the other hand, could easily leverage HBM2 to provide more performance than we’ve ever seen in chips in those TDP ranges. The bandwidth HBM2 offers to discrete cards, meanwhile, should help with VR performance, though AMD and Nvidia will both need to improve raw pixel and texture rates as well. If you want more information on HBM, Wide I/O, and Micron’s HMC, we covered all three emerging standards a year ago.