AMD’s next-generation Polaris GPU architecture unveiled

Over the past month, we’ve detailed the technology initiatives and projects that AMD’s Radeon Technology Group (RTG) revealed at its Sonoma, California retreat in early December. From improved monitors to its GPU Open initiative, AMD wants to introduce new visual standards and capabilities across the market. Now, we’re finally going to talk about the third prong of AMD’s 2016 strategy. Say hello to Polaris, AMD’s official fourth-generation GCN architecture (Tahiti, Hawaii, and Tonga/Fiji are being defined as generations 1-3).

The new name refers to AMD’s desire to power “every pixel on every device efficiently.” The company notes that “stars are the most efficient photon generators of our universe,” and claims that this is the inspiration behind the new GPU. The major features of fourth-gen GCN are laid out in the chart below.

There are a number of subsidiary technologies included in the Polaris brand name, alongside the GPU architecture itself. Hopefully this will help avoid confusion over which products belong to which family. When it launched GCN, AMD attempted to avoid various model numbers, instead referring to families of products (Tahiti, Hawaii, Tonga, etc). The problem with this approach is that it didn’t translate well to AMD’s model numbers. The press and public coined their own nomenclature of GCN 1.0, GCN 1.1, and GCN 1.2 to describe AMD’s feature sets, partly because the company’s previous approach wasn’t very clear.

Polaris: Built on 14nm FinFET

We’ve expected this announcement for quite some time, but RTG confirmed it in December — the next-generation Polaris will be built on 14nm FinFET. This offers a number of concrete advances over 28nm planar silicon, including less variation in transistor performance, improved speeds, and better leakage characteristics.

As for how much of an advantage 14nm offers overall, however, AMD’s own graph indicates there’s a definite sweet spot to the new technology. In the graph below, which plots power consumption against Fmax (maximum frequency), we see that 14nm FinFET offers either dramatically lower power consumption at the same clock speed or somewhat improved frequency headroom at the same power consumption.

The frequency gains from 14nm FinFET taper off as the clock rises, which implies that the process is more focused on power efficiency than raw frequency gain. Certainly AMD’s own disclosures at this early date are emphasizing performance-per-watt rather than raw performance.

That doesn’t mean Polaris won’t be substantially faster than Tonga / Fiji — it just means that those gains may be delivered more by architectural enhancements than by clock rate improvements.

AMD hasn’t disclosed much deep-dive information on Polaris yet, but the company’s preview suggests that the chip’s compute engines, geometry processing, L2 cache, multimedia capabilities, and display engine have all been overhauled. GCN 4 is still based on GCN, which means it retains the basic organization of its predecessor. Each GCN Compute Unit contains four SIMD units, and each SIMD unit is a 16-wide vector unit.
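To make the compute-unit organization concrete, here is a minimal Python sketch of the lane arithmetic. The four-SIMD, 16-wide layout comes from the description above. The figure of two operations per lane per clock assumes one fused multiply-add per lane per cycle, and the 64-CU, 1,050MHz inputs are the published Fiji (Fury X) configuration, used purely as an illustration. None of these inputs are disclosed Polaris specifications, and the function names are our own.

```python
# Lane arithmetic for a single GCN compute unit (CU).
SIMDS_PER_CU = 4               # from the architecture description above
LANES_PER_SIMD = 16            # each SIMD is a 16-wide vector unit
FLOPS_PER_LANE_PER_CLOCK = 2   # assumption: one fused multiply-add per clock


def lanes_per_cu() -> int:
    """Vector lanes available in one compute unit."""
    return SIMDS_PER_CU * LANES_PER_SIMD


def peak_fp32_gflops(compute_units: int, clock_mhz: float) -> float:
    """Theoretical peak single-precision throughput in GFLOPS."""
    total_lanes = compute_units * lanes_per_cu()
    return total_lanes * FLOPS_PER_LANE_PER_CLOCK * clock_mhz / 1000.0


if __name__ == "__main__":
    # Illustrative inputs only: a 64-CU part at 1,050MHz (Fiji-class).
    print(lanes_per_cu())
    print(round(peak_fp32_gflops(64, 1050.0), 1))
```

Because the per-CU layout is unchanged, any peak-throughput gain under this simple model has to come from more compute units or higher clocks. The improvements AMD has announced target how well the hardware approaches that peak rather than the peak itself.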

This isn’t a bad thing, by any means — compute unit performance has always been a strength of the GCN architecture. The other improvements AMD has announced, including a hardware scheduler, primitive discard accelerator, improved triangle throughput, next-generation memory compression (improved from Fiji, which uses GCN 1.2 technology), and the overhauled L2 cache subsystem should all give AMD better graphics performance, while retaining the compute firepower that was already a strength of GCN 1.0 – 1.2.

Memory bandwidth, power consumption, and availability

One thing RTG stressed at its event is that AMD won’t be jumping for HBM across the entire desktop and mobile product stack. Some GPUs will continue to use GDDR5, particularly GPUs that target lower power consumption or mid-range performance.

AMD demoed a small-scale Polaris GPU using GDDR5 memory at the Sonoma event, and put that card up against a GTX 950 of unspecified make/model. Power consumption was shown in real-time using FRAPS while testing Star Wars Battlefront, but the frame rate was capped at 60 FPS. In a test this specific and controlled, it’s difficult to draw much in the way of conclusions about the two processors’ relative performance, but we can at least note that yes — total system power consumption with Polaris was about 60% as high as with the GTX 950.
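The efficiency implication of that figure can be worked through with a little arithmetic. The sketch below assumes both systems held the same capped frame rate, so relative performance per watt is simply the inverse of the power ratio. The wattage values are hypothetical placeholders chosen to match the roughly 60% ratio, not measurements from the demo, and the function name is our own.

```python
def perf_per_watt_ratio(fps_a: float, watts_a: float,
                        fps_b: float, watts_b: float) -> float:
    """Return how many times more frames per watt system A delivers than B."""
    if min(fps_a, watts_a, fps_b, watts_b) <= 0:
        raise ValueError("frame rates and power draws must be positive")
    return (fps_a / watts_a) / (fps_b / watts_b)


CAPPED_FPS = 60.0                       # both systems capped at 60 FPS
baseline_watts = 100.0                  # hypothetical GTX 950 system draw
polaris_watts = 0.60 * baseline_watts   # "about 60% as high"

ratio = perf_per_watt_ratio(CAPPED_FPS, polaris_watts,
                            CAPPED_FPS, baseline_watts)

if __name__ == "__main__":
    print(f"Polaris system: {ratio:.2f}x the frames per watt")
```

Keep in mind that these are total system numbers. The CPU, motherboard, and storage draw power in both configurations, so the gap between the two GPUs alone would differ from the system-level ratio.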

Raja Koduri confirmed that the GPU we saw tested was a GDDR5 model, but AMD didn’t disclose exactly which chips would use GDDR5 versus HBM, versus HBM2 next year. AMD is also playing coy as far as its foundry partner commitments. We know the small GPU we saw demonstrated was a GlobalFoundries part, but AMD is also bringing chips up at TSMC. One possibility is that Sunnyvale has tapped GlobalFoundries for its lower-power parts, since Samsung (whose 14nm process GlobalFoundries licenses) doesn’t have much experience building high-power GPUs, and will use TSMC for the higher-power desktop cards.

This isn’t an ideal situation for any company and it’s not clear why AMD would choose to dual source in this fashion. It’s possible that wafer supply agreements with GF mandate moving GPU production over to GlobalFoundries, or it could be related to how much 16nm capacity TSMC expects to have available next year.

As for availability, AMD expects cards to arrive in mid-2016. Which hardware will launch first is an open question. Given the company’s focus on efficiency and the importance of the notebook market, AMD could choose to lead with GF hardware, especially since at least some of these chips will use GDDR5 memory.

As for overall performance, it’s too early to tell. Based on what we’ve seen, we expect Polaris to offer large performance-per-watt increases, with some significant absolute performance jumps thanks to the improved 4th-generation GCN architecture. It’s not clear how much AMD or Nvidia will be able to bank on building bigger GPUs as a means of improving top-end frame rates. Remember that both TSMC (at 16nm) and GlobalFoundries (at 14nm) use a hybrid process for FinFET that pairs the new transistors with 20nm-class interconnects, which means absolute transistor density improved only modestly from 28nm to 14nm. Going big may not offer the same benefits at 14nm that it did at 28nm, which will put renewed pressure on both GPU manufacturers to make the smartest use of every square millimeter of die space that they can.