cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications

Supriya Adhikary; Wai Kong Lee; Angshuman Karmakar; Yongwoo Lee; Seong Oun Hwang; Ramachandra Achar

Paper 2024/1729

cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications

Supriya Adhikary, Indian Institute of Technology Kanpur

Wai Kong Lee, Universiti Tunku Abdul Rahman

Angshuman Karmakar, Indian Institute of Technology Kanpur

Yongwoo Lee, Inha University

Seong Oun Hwang, Gachon University

Ramachandra Achar, Carleton University

Abstract

Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a serious limitation in embedded systems that only have constrained computational resources to support the time-consuming homomorphic encryption. In this paper, we proposed a series of techniques to improve the performance of number theoretic transform targeting homomorphic encryption on a GPU device. Firstly, we proposed to arrange the polynomials in a transposed manner and skip the last two levels of radix-4 number theoretic transformation, allowing us to completely avoid the block synchronization in NTT implementation. This technique improved the performance of homomorphic encryption by 1.37× and 1.34× on RTX 4060 and Jetson Orin Nano respectively, compared to the conventional approach that uses full NTT without skipping any levels. However, such an approach also introduces extra overhead in the subsequent point-wise multiplication, which slows down the homomorphic multiplication. To reduce this negative impact, a fast 16 × 16 point-wise multiplication implementation was proposed, which relies on the heavily optimized Toom-Cook 4-way algorithm. Experimental results show that our proposed homomorphic multiplication can achieve similar latency compared to Jung et al. and Yang et al., which are the best results to date. This shows that the proposed cuTraNTT is able to reduce the latency of homomorphic encryption without sacrificing the performance in homomorphic multiplication.

Note: First submission

Metadata

Available format(s): PDF
Category: Implementation
Publication info: Preprint.
Keywords: Fully homomorphic encryption number theoretic transform graphics processing unit
Contact author(s): adhikaryraja24 @ gmail com
wklee @ utar edu my
angshu99 @ gmail com
yongwooolee @ gmail com
bardic @ naver com
achar @ doe carleton ca
History: 2024-10-25: approved; 2024-10-22: received; See all versions
Short URL: https://ia.cr/2024/1729
License: CC BY-NC

BibTeX

@misc{cryptoeprint:2024/1729,
      author = {Supriya Adhikary and Wai Kong Lee and Angshuman Karmakar and Yongwoo Lee and Seong Oun Hwang and Ramachandra Achar},
      title = {{cuTraNTT}: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for {IoT} Applications},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1729},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1729}
}