Samsung Demos In-Memory Processing
October 12, 2021 - Author: Bob Wheeler
As High Bandwidth Memory (HBM) proliferates in high-performance computing, Samsung is turning the tables by demonstrating a processor-in-memory (PIM) version of its popular Aquabolt HBM2. At Hot Chips, it disclosed HBM-PIM architecture details as well as early performance results using an undisclosed GPU and a Xilinx Alveo card, both populated with HBM-PIM stacks dubbed Aquabolt-XL. It also presented simulations for a conceptual LPDDR5-PIM, which would address low-power applications.
The standard Aquabolt module comprises eight DRAM die stacked on a buffer die. To create Aquabolt-XL, Samsung replaced the bottom four die with special PIM-DRAM, leaving the rest of the stack unmodified. The PIM-DRAM inserts 32 processing units between even/odd bank pairs, enabling bank-level parallelism. Replacing only half the die reduces risk and allows performance comparisons between HBM and HBM-PIM in a single stack. Aquabolt-XL preserves HBM2 timing so it can drop into existing designs.
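To make the bank-pairing idea concrete, here is a minimal Python sketch of how a processing unit might be shared by one even and one odd bank. It is an illustration only: the bank count (64, chosen so that the pairs add up to the 32 processing units mentioned above), the numbering scheme, and the function names are assumptions, not details Samsung disclosed.

```python
# Hypothetical model: each processing unit (PU) sits between one even bank
# and the adjacent odd bank. The bank count and numbering are assumptions.
NUM_BANKS = 64                 # assumed, so that pairs yield 32 PUs
NUM_UNITS = NUM_BANKS // 2     # one PU per even/odd bank pair


def processing_unit_for_bank(bank: int) -> int:
    """Return the index of the PU shared by this bank and its neighbor."""
    if not 0 <= bank < NUM_BANKS:
        raise ValueError(f"bank index out of range: {bank}")
    return bank // 2


def banks_for_unit(unit: int) -> tuple[int, int]:
    """Return the (even, odd) bank pair served by a given PU."""
    if not 0 <= unit < NUM_UNITS:
        raise ValueError(f"unit index out of range: {unit}")
    return (2 * unit, 2 * unit + 1)
```

Under this assumed layout, all units can operate on their own bank pairs at once, which is the bank-level parallelism the design is after.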
To evaluate Aquabolt-XL’s performance benefits, Samsung tested BLAS microbenchmarks as well as full neural-network models, using GPUs with standard HBM2 as a baseline. Unsurprisingly, workloads that caused high last-level-cache miss rates benefited most from moving operations to PIM. Samsung also worked with Xilinx to build a prototype Virtex UltraScale+ FPGA with two Aquabolt-XL stacks. The results show that PIM is no panacea, but it can greatly accelerate memory-bound workloads. Samsung isn’t the first to demonstrate PIM-DRAM, but as the leading DRAM vendor, its market influence is undeniable.
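One common way to judge whether a BLAS kernel is memory bound is to compare its arithmetic intensity (FLOPs per byte moved) with the machine's ratio of peak compute to peak memory bandwidth. The Python sketch below does this for matrix-vector (GEMV) and matrix-matrix (GEMM) multiplication. It is a back-of-the-envelope model, not Samsung's methodology: it assumes FP16 operands, counts only compulsory memory traffic, and uses made-up peak figures for a hypothetical accelerator.

```python
# Rough roofline-style check. All hardware numbers are illustrative
# assumptions, not measurements from Samsung's presentation.
BYTES_PER_ELEMENT = 2  # assumes FP16 operands


def gemv_intensity(m: int, n: int) -> float:
    """FLOPs per byte for y = A @ x, with A of shape (m, n)."""
    flops = 2 * m * n                                # one multiply and one add per element of A
    bytes_moved = (m * n + n + m) * BYTES_PER_ELEMENT  # read A and x, write y
    return flops / bytes_moved


def gemm_intensity(m: int, n: int, k: int) -> float:
    """FLOPs per byte for C = A @ B, with A (m, k) and B (k, n)."""
    flops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * BYTES_PER_ELEMENT  # each matrix touched once
    return flops / bytes_moved


def is_memory_bound(intensity: float, peak_flops: float, peak_bytes_per_s: float) -> bool:
    """A kernel is memory bound when its intensity is below the machine balance."""
    return intensity < peak_flops / peak_bytes_per_s


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
    peak_flops, peak_bw = 100e12, 1e12
    for name, ai in [("GEMV 4096x4096", gemv_intensity(4096, 4096)),
                     ("GEMM 4096^3", gemm_intensity(4096, 4096, 4096))]:
        bound = "memory" if is_memory_bound(ai, peak_flops, peak_bw) else "compute"
        print(f"{name}: {ai:.1f} FLOP/byte -> {bound} bound")
```

Under these assumptions, GEMV performs roughly one FLOP per byte and falls well below the machine balance, while a large GEMM sits far above it. That is consistent with the observation that kernels with little data reuse stand to gain the most from PIM.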
Subscribers can view the full article in the Microprocessor Report.