Is it possible to tune HBM2 for lower clock speeds and timings?

cbn

Lifer
Mar 27, 2009
12,968
221
106
According to the following chart, the default speed for HBM2 is 2 Gbps per pin, for 256 GB/s of bandwidth:

[Image: hbm-14w.png]


But I have been wondering: is it possible to tune the speed per pin down to a much lower range (e.g., 400 Mbps to 1 Gbps)? This would be for use in a large iGPU laptop as a replacement for system RAM. (Think 1 x 8GB HBM2 @ 800 Mbps/pin for a lower processor bin and 2 x 8GB HBM2 @ 500 Mbps/pin for a higher processor bin.)

1 x 8GB HBM2 @ 800 Mbps/pin = 102.4 GB/s bandwidth

2 x 8GB HBM2 @ 500 Mbps/pin = 128 GB/s bandwidth
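
As a quick sanity check on those figures, here is a back-of-envelope sketch assuming the standard 1024-bit interface per HBM2 stack (nothing vendor-specific):

```python
# Back-of-envelope HBM2 bandwidth check, assuming 1024 data pins per stack.
def hbm2_bandwidth_gbs(stacks, mbps_per_pin, pins_per_stack=1024):
    total_mbps = stacks * pins_per_stack * mbps_per_pin
    return total_mbps / 8 / 1000  # Mbit/s -> GB/s

print(hbm2_bandwidth_gbs(1, 800))   # 1 x 8GB stack  @ 800 Mbps/pin -> 102.4 GB/s
print(hbm2_bandwidth_gbs(2, 500))   # 2 x 8GB stacks @ 500 Mbps/pin -> 128.0 GB/s
print(hbm2_bandwidth_gbs(1, 2000))  # spec speed, 2 Gbps/pin        -> 256.0 GB/s
```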
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Assuming these lower HBM2 speeds are possible (with correspondingly tighter memory timings), I am wondering about at least two things:

1.) Increased CPU performance compared to a system using DDR4 (a rough bandwidth comparison is sketched just after this list).

2.) Increased performance (relative to an SSD controller using conventional DDR3) for a DRAM-less SSD controller using a portion of the HBM2 via the Host Memory Buffer feature (from the NVMe 1.2 revision), especially for a DRAM-less PCIe 3.0 x4 (or greater) SSD controller integrated within the APU/SoC.
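
On point 1, a rough peak-bandwidth comparison against dual-channel DDR4-2400, which I am assuming here purely as a typical laptop reference point:

```python
# Peak-bandwidth comparison: downclocked HBM2 vs. a typical DDR4 laptop setup.
# DDR4-2400 dual channel is assumed only as a reference point.
def ddr4_bandwidth_gbs(mt_per_s, channels=2, bus_bytes=8):
    return mt_per_s * bus_bytes * channels / 1000  # MT/s * bytes -> GB/s

def hbm2_bandwidth_gbs(stacks, mbps_per_pin, pins_per_stack=1024):
    return stacks * pins_per_stack * mbps_per_pin / 8 / 1000

print(ddr4_bandwidth_gbs(2400))    # dual-channel DDR4-2400   -> 38.4 GB/s
print(hbm2_bandwidth_gbs(1, 800))  # 1 x HBM2 @ 800 Mbps/pin  -> 102.4 GB/s
print(hbm2_bandwidth_gbs(2, 500))  # 2 x HBM2 @ 500 Mbps/pin  -> 128.0 GB/s
```

Even heavily downclocked, the HBM2 configurations would still offer well over twice the peak bandwidth of dual-channel DDR4-2400, though for many CPU workloads latency (which downclocking alone would not obviously improve) matters more than peak bandwidth.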

For more about the Host Memory Buffer feature, see this article.

The Host Memory Buffer (HMB) feature in NVMe 1.2 allows a drive to request exclusive access to a portion of the host system's RAM for the drive's private use. This kind of capability has been around forever in the GPU space under names like HyperMemory and TurboCache, where it served a similar purpose: to reduce or eliminate the dedicated RAM that needs to be included on peripheral devices.

P.S. Perhaps for a more comprehensive SSD comparison we could expand to:

1.) DRAM-less integrated PCIe 3.0 x4 SSD controller using slow-speed/tight-timing HBM2 via host memory buffer.
2.) DRAM-less integrated PCIe 3.0 x4 SSD controller using conventional DDR4 SO-DIMM(s) via host memory buffer.
3.) DRAM-less non-integrated PCIe 3.0 x4 SSD controller using slow-speed/tight-timing HBM2 via host memory buffer.
4.) DRAM-less non-integrated PCIe 3.0 x4 SSD controller using conventional DDR4 SO-DIMM(s) via host memory buffer.
5.) Conventional PCIe 3.0 x4 SSD controller with a DDR3 DRAM buffer (think Samsung SM961, etc.)
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106

Some info on DRAM-less SSDs (using the host memory buffer from NVMe spec 1.2):

http://www.anandtech.com/show/10546/toshiba-announces-new-bga-ssds-using-3d-tlc-nand

Toshiba has shared some details about how they plan to make use of HMB and what its impact on performance will be. The BG series uses a DRAM-less SSD controller architecture, but HMB allows the controller to make use of some of the host system's DRAM. The BG series will use host memory to implement a read cache of the drive's NAND mapping tables. This is expected to primarily benefit random access speeds, where a DRAM-less controller would otherwise have to constantly fetch data from flash in order to determine where to direct pending read and write operations. Looking up some of the NAND mapping information from the buffer in the host's DRAM—even with the added latency of fetching it over PCIe—is quicker than performing an extra read from the flash.

Toshiba hasn't provided full performance specs for the new BG series SSDs, but they did supply some benchmark data illustrating the benefit of using HMB. Using only 37MB of host DRAM and testing access speed to a 16GB portion of the SSD, Toshiba measured improvement ranging from 30% for QD1 random reads up to 115% improvement for QD32 random writes.

Table from the AnandTech link above, "Performance improvement from enabling HMB":

Random Read  | QD1: 30% | QD32: 65%
Random Write | QD1: 70% | QD32: 115%


While it looks like HMB can do a lot to alleviate the worst performance problems of DRAM-less SSD controllers, the caveat is that it requires support from the operating system's NVMe driver. HMB is still an obscure optional feature of NVMe and is not yet supported out of the box by any major operating system, and Toshiba isn't currently planning to provide their own NVMe drivers for OEMs to bundle with systems using BG series SSDs. Thus, it is likely that the first generation of systems that adopt the new BG series SSDs will not be able to take full advantage of their capabilities.

I wonder how much using HBM2 instead of DDR4 would reduce the latency of HMB accesses over the PCIe bus? And how would that compare to an SSD using a dedicated DDR3 or LPDDR3 DRAM buffer? (A rough sketch of the comparison follows.)
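
A very rough back-of-envelope model of a single mapping-table lookup. Every latency number below is an assumption I picked purely for illustration; none come from the article or from any datasheet:

```python
# Toy model of a mapping-table lookup for a DRAM-less NVMe controller.
# Every latency value below is an assumption for illustration only.
NAND_READ_US       = 60.0  # assumed extra flash read to fetch mapping data
PCIE_ROUND_TRIP_US = 1.0   # assumed PCIe 3.0 x4 round trip to host memory
DDR4_ACCESS_US     = 0.08  # assumed host DDR4 access
HBM2_ACCESS_US     = 0.08  # assumed host HBM2 access (downclocked)
LOCAL_DDR3_US      = 0.05  # assumed on-SSD DDR3/LPDDR3 buffer access

scenarios = {
    "no HMB (fetch mapping from flash)": NAND_READ_US,
    "HMB in host DDR4":                  PCIE_ROUND_TRIP_US + DDR4_ACCESS_US,
    "HMB in host HBM2":                  PCIE_ROUND_TRIP_US + HBM2_ACCESS_US,
    "dedicated DRAM on the SSD":         LOCAL_DDR3_US,
}

for name, us in scenarios.items():
    print(f"{name}: ~{us:.2f} us per mapping lookup")
```

If numbers anywhere in this ballpark hold, the PCIe round trip would dominate the HMB path, so swapping DDR4 for HBM2 on the host side would change the lookup latency very little; the big win is avoiding the extra flash read at all, and a dedicated on-drive buffer would still be the fastest option.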
 
Last edited: