In a previous thread, "Is it possible to tune HBM 2 for lower clockspeeds and timings", I wondered about the possibility of reduced-speed HBM2 for laptop APUs/SoCs:
https://forums.anandtech.com/thread...bm2-for-lower-clockspeed-and-timings.2480064/
However, last month Samsung introduced low cost HBM, which actually narrows the width from 1024 bits to 512 bits but compensates with a 50% higher per-pin speed:
http://www.anandtech.com/show/10589...as-for-future-memory-tech-ddr5-cheap-hbm-more
Curiously, Samsung is going almost the opposite direction at the high-end of the memory market. In a proposal for low-cost HBM, Samsung laid out a plan for how to bring down the complexity of HBM, and as a result the total cost of the fast-but-expensive memory technology. The low cost proposal essentially trades off some width for frequency; moving a stack from 1024-bits to 512-bits, but increasing the per-pin frequency by 50%. The net result is still less bandwidth than HBM2, but not immensely so.
The big savings here come from the narrower width allowing for simpler memory stacks with fewer TSVs. TSVs are the breakthrough technology that make HBM possible, but they also remain one of the most stubborn components to get correct, as thousands of vias must be wired up inside a single stack. So a die stack with fewer TSVs will be easier to manufacture.
The other interesting aspect of this proposal is that Samsung wants to remove the base logic/buffer die. To be honest I’m not 100% sure how this would work, as one of the fundamental tenets of HBM is that it’s a logic-to-logic (processor to logic die) connection, with the HBM stack’s logic die then coordinating the relatively dumb DRAM layers. Removing the logic die would certainly bring down costs, as it means no longer meshing logic with DRAM on a single package, but it’s not clear where the HBM PHY lies on the cost-reduced memory stack.
Finally, partially as a consequence of the narrower I/O, Samsung wants to try to get away from silicon interposers and use organic interposers instead. Silicon interposers are simple – there’s no logic, just routing – but they’re a big chunk of silicon, and that comes at a cost. If they were able to move to an organic interposer, then the interposer cost would be significantly reduced.
Bear in mind that all of this is just a proposal – Samsung’s slide even notes that they still need client feedback to figure all of this out – but it will be interesting to see how much of this gains traction. At the same time I’m left to wonder what the resulting power cost may be; part of what makes HBM so efficient is that it’s wide and slow. The low-cost proposal here makes HBM a little more GDDR-like, and that could sacrifice some of the efficiency improvements.
So I got to wondering if this proposed lower cost HBM could also be clocked lower to achieve a bandwidth appropriate for a laptop APU?
Looking at the examples below, it would seem this is very possible:
1 x 8GB low cost HBM @ 1600 Mbps/pin = 102.4 GB/s bandwidth
2 x 8GB low cost HBM @ 1000 Mbps/pin = 128 GB/s bandwidth
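The figures above fall out of straightforward arithmetic: stacks × bus width in bits × per-pin data rate, divided by 8 to get bytes. A quick sketch of that math (the 512-bit stack width is from the low cost HBM proposal quoted above; the function name is just mine):

```python
def hbm_bandwidth_gbs(stacks: int, width_bits: int, mbps_per_pin: int) -> float:
    """Peak bandwidth in GB/s.

    stacks * width_bits pins, each moving mbps_per_pin Mbit/s;
    divide by 8 for bytes, by 1000 for Mbit -> Gbit.
    """
    return stacks * width_bits * mbps_per_pin / 8 / 1000

# Low cost HBM: 512-bit stack (vs. 1024-bit for HBM2)
print(hbm_bandwidth_gbs(1, 512, 1600))  # 102.4 GB/s
print(hbm_bandwidth_gbs(2, 512, 1000))  # 128.0 GB/s
```

For comparison, a single full-width HBM2 stack at 2000 Mbps/pin works out to 1024 × 2000 / 8 / 1000 = 256 GB/s, so either downclocked configuration lands well under that, in the range a laptop APU could plausibly use.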
