RPU: The Ring Processing Unit

Paper 2023/465

Deepraj Soni, New York University

Negar Neda, New York University

Naifeng Zhang, Carnegie Mellon University

Benedict Reynwar, USC Information Sciences Institute

Homer Gamil, New York University Abu Dhabi

Benjamin Heyman, New York University

Mohammed Nabeel Thari Moopan, New York University Abu Dhabi

Ahmad Al Badawi, Duality Technologies

Yuriy Polyakov, Duality Technologies

Kellie Canida, USC Information Sciences Institute

Massoud Pedram, University of Southern California

Michail Maniatakos, New York University Abu Dhabi

David Bruce Cousins, Duality Technologies

Franz Franchetti, Carnegie Mellon University

Matthew French, USC Information Sciences Institute

Andrew Schmidt, USC Information Sciences Institute

Brandon Reagen, New York University

Abstract

Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have seen limited use because of the extreme overhead of running them on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512, is developed to meet the needs of ring processing workloads while balancing high performance and general-purpose programming support. Having an ISA rather than fixed hardware facilitates continued software improvement post-fabrication and the ability to support evolving workloads. We then propose the ring processing unit (RPU), a high-performance, modular implementation of B512. The RPU has native large-word modular arithmetic support, capabilities for very wide parallel processing, and a large-capacity, high-bandwidth scratchpad to meet the needs of ring processing. We address the challenges of programming the RPU using a newly developed SPIRAL backend. A configurable simulator is built to characterize design tradeoffs and quantify performance. The best-performing design was implemented in RTL and used to validate simulator performance. In addition to our characterization, we show that an RPU using 20.5 mm² of GF 12 nm can provide a speedup of 1485x over a CPU running a 64k, 128-bit NTT, a core RLWE workload.
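
For readers unfamiliar with the NTT workload named in the abstract, the following is a minimal Python sketch of a radix-2 number-theoretic transform built from modular butterflies. It is illustrative only and is not the B512 ISA, the RPU microarchitecture, or the SPIRAL-generated code. The function names, the cyclic (rather than negacyclic) transform, and the toy parameters (modulus 17, length 8, root of unity 2) are assumptions chosen so the result can be checked by hand. The paper's benchmark uses a 64k-point transform with a 128-bit modulus.

```python
def bit_reverse_permute(values):
    """Reorder values so that index i moves to the bit-reversal of i."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, v in enumerate(values):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = v
    return out


def ntt(values, omega, q):
    """Cyclic NTT: X[k] = sum_j x[j] * omega^(j*k) mod q.

    omega must be a primitive n-th root of unity mod q, n a power of two.
    """
    n = len(values)
    assert n & (n - 1) == 0, "length must be a power of two"
    a = bit_reverse_permute([v % q for v in values])
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # twiddle step for this stage
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                # One butterfly: a modular multiply, add, and subtract.
                u = a[start + j]
                t = a[start + j + half] * w % q
                a[start + j] = (u + t) % q
                a[start + j + half] = (u - t) % q
                w = w * w_len % q
        length *= 2
    return a


def intt(values, omega, q):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n = len(values)
    inv_n = pow(n, -1, q)  # modular inverse, Python 3.8+
    return [v * inv_n % q for v in ntt(values, pow(omega, -1, q), q)]


def naive_ntt(values, omega, q):
    """O(n^2) reference used only to check the fast version."""
    n = len(values)
    return [
        sum(values[j] * pow(omega, j * k, q) for j in range(n)) % q
        for k in range(n)
    ]


if __name__ == "__main__":
    q, omega = 17, 2  # toy parameters: 2 has order 8 modulo 17
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    print(ntt(x, omega, q) == naive_ntt(x, omega, q))
    print(intt(ntt(x, omega, q), omega, q) == x)
```

Python integers have arbitrary precision, so the same sketch accepts a 128-bit modulus unchanged. On a CPU, each such multiply and reduction expands into many machine-word operations, which is the kind of cost that native large-word modular arithmetic is meant to avoid.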

Metadata

Available format(s): PDF
Category: Applications
Publication info: Published elsewhere. 2023 IEEE International Symposium on Performance Analysis of Systems and Software
Keywords: RPU, Hardware accelerator, Ring Processing, NTT, Cryptography, Fully Homomorphic Encryption, FHE hardware
Contact author(s): dss545 @ nyu edu
negar @ nyu edu
naifengz @ cmu edu
breynwar @ isi edu
og532 @ nyu edu
bch5868 @ nyu edu
mtn2 @ nyu edu
aalbadawi @ dualitytech com
ypolyakov @ dualitytech com
kcanida @ isi edu
pedram @ usc edu
mihalis maniatakos @ nyu edu
dcousins @ dualitytech com
franzf @ ece cmu edu
mfrench @ isi edu
aschmidt @ isi edu
bjr5 @ nyu edu
History: 2023-03-31: approved; 2023-03-30: received
Short URL: https://ia.cr/2023/465
License: CC BY

BibTeX

@misc{cryptoeprint:2023/465,
      author = {Deepraj Soni and Negar Neda and Naifeng Zhang and Benedict Reynwar and Homer Gamil and Benjamin Heyman and Mohammed Nabeel Thari Moopan and Ahmad Al Badawi and Yuriy Polyakov and Kellie Canida and Massoud Pedram and Michail Maniatakos and David Bruce Cousins and Franz Franchetti and Matthew French and Andrew Schmidt and Brandon Reagen},
      title = {{RPU}: The Ring Processing Unit},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/465},
      year = {2023},
      url = {https://eprint.iacr.org/2023/465}
}