Paper 2024/1729

cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications

Supriya Adhikary, Indian Institute of Technology Kanpur
Wai Kong Lee, Universiti Tunku Abdul Rahman
Angshuman Karmakar, Indian Institute of Technology Kanpur
Yongwoo Lee, Inha University
Seong Oun Hwang, Gachon University
Ramachandra Achar, Carleton University
Abstract

Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a serious limitation in embedded systems that only have constrained computational resources to support the time-consuming homomorphic encryption. In this paper, we proposed a series of techniques to improve the performance of number theoretic transform targeting homomorphic encryption on a GPU device. Firstly, we proposed to arrange the polynomials in a transposed manner and skip the last two levels of radix-4 number theoretic transformation, allowing us to completely avoid the block synchronization in NTT implementation. This technique improved the performance of homomorphic encryption by 1.37× and 1.34× on RTX 4060 and Jetson Orin Nano respectively, compared to the conventional approach that uses full NTT without skipping any levels. However, such an approach also introduces extra overhead in the subsequent point-wise multiplication, which slows down the homomorphic multiplication. To reduce this negative impact, a fast 16 × 16 point-wise multiplication implementation was proposed, which relies on the heavily optimized Toom-Cook 4-way algorithm. Experimental results show that our proposed homomorphic multiplication can achieve similar latency compared to Jung et al. and Yang et al., which are the best results to date. This shows that the proposed cuTraNTT is able to reduce the latency of homomorphic encryption without sacrificing the performance in homomorphic multiplication.

Note: First submission

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
Fully homomorphic encryptionnumber theoretic transformgraphics processing unit
Contact author(s)
adhikaryraja24 @ gmail com
wklee @ utar edu my
angshu99 @ gmail com
yongwooolee @ gmail com
bardic @ naver com
achar @ doe carleton ca
History
2024-10-25: approved
2024-10-22: received
See all versions
Short URL
https://ia.cr/2024/1729
License
Creative Commons Attribution-NonCommercial
CC BY-NC

BibTeX

@misc{cryptoeprint:2024/1729,
      author = {Supriya Adhikary and Wai Kong Lee and Angshuman Karmakar and Yongwoo Lee and Seong Oun Hwang and Ramachandra Achar},
      title = {{cuTraNTT}: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for {IoT} Applications},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1729},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1729}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.