Paper 2022/633

CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes

Shiyu Shen, Hao Yang, Yu Liu, Zhe Liu, and Yunlei Zhao


Homomorphic encryption (HE), which allows computation over encrypted data, is often used to preserve privacy. However, its heavy computational cost and the complexity of network topologies make HE schemes difficult to deploy in Internet of Things (IoT) scenarios. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV, and CKKS, aimed at accelerating homomorphic multiplication on GPUs in heterogeneous IoT systems. We offer constant-time low-level arithmetic with minimal instruction count and memory usage, together with performance-priority and memory-priority configurations. The design is parametric and generic and exposes various trade-offs between resource usage and efficiency, yielding a solution suitable for accelerating RNS homomorphic multiplication on both high-performance and embedded GPUs. This lets us deliver evaluation results closer to real time and relieve the computational pressure on cloud devices. We deploy our implementations on two GPUs and achieve speedups of up to 378.4×, 234.5×, and 287.2× for homomorphic multiplication in BGV, BFV, and CKKS, respectively, on a Tesla V100S, and 8.8×, 9.2×, and 10.3× on a Jetson AGX Xavier.
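
The abstract refers to RNS multiplication without showing what the representation looks like. As a rough illustration only, and not the CARM implementation, the following Python sketch shows the basic idea on plain integers: a value is split into residues modulo several pairwise-coprime moduli, multiplication is done independently per residue, and the result is recombined with the Chinese Remainder Theorem. The moduli and the function names (`to_rns`, `rns_mul`, `from_rns`) are assumptions made for this example; real RNS variants of BGV, BFV, and CKKS work on polynomial coefficients with word-sized prime moduli.

```python
from math import prod

# Hypothetical small pairwise-coprime moduli, chosen only for illustration.
MODULI = [97, 101, 103]

def to_rns(x, moduli=MODULI):
    """Split an integer into its residues modulo each RNS modulus."""
    return [x % q for q in moduli]

def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS values residue by residue; each residue is independent."""
    return [(ai * bi) % q for ai, bi, q in zip(a, b, moduli)]

def from_rns(residues, moduli=MODULI):
    """Recombine residues with the Chinese Remainder Theorem."""
    big_q = prod(moduli)
    total = 0
    for r, q in zip(residues, moduli):
        q_hat = big_q // q                      # product of all other moduli
        total += r * q_hat * pow(q_hat, -1, q)  # modular inverse (Python 3.8+)
    return total % big_q

x, y = 1234, 567
product = from_rns(rns_mul(to_rns(x), to_rns(y)))
```

Because the residues never interact during `rns_mul`, each one can in principle be handled by a separate thread or thread block, which is what makes the representation a natural fit for GPUs. The recombined result equals `x * y` only while the true product stays below the product of the moduli.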

Publication info
Preprint. MINOR revision.
Keywords
Homomorphic encryption, RNS multiplication, Number Theoretic Transform, Internet of Things, GPU acceleration
Contact author(s)
zksyshen @ gmail com
crypto @ d4rk dev
2022-05-23: received
Creative Commons Attribution


      author = {Shiyu Shen and Hao Yang and Yu Liu and Zhe Liu and Yunlei Zhao},
      title = {CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes},
      howpublished = {Cryptology ePrint Archive, Paper 2022/633},
      year = {2022},
      note = {\url{}},
      url = {}