Paper 2023/1410

Two Algorithms for Fast GPU Implementation of NTT

Ali Şah Özcan, Sabanci University
Erkay Savaş, Sabanci University
Abstract

The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much greater role, as the NTT execution time is mostly memory-bound due to large degree polynomials. In this paper, we propose two algorithms for fast computation of NTT on a class of graphical processing units (GPU) by optimizing the memory access patterns. We present an approach i) to optimize the number of accesses to slow global memory for thread synchronization, and ii) to make better use of spatial locality in global memory accesses. It turns out that by controlling certain parameters in CUDA platform for general-purpose GPU computing (GPGPU) such as kernel count, block size and block shape, we can affect the performance of NTT. To best of our knowledge, this work is unique for it suggests a recipe for selecting optimum CUDA parameters to obtain the best NTT performance for a given polynomial degree. Our implementation results on various GPU devices for all power-of-two polynomial degrees from $2^{12}$ to $2^{28}$ show that our algorithms compare favorably with the other state-of-the-art GPU implementations in the literature with the optimum selection of these three CUDA parameters.

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
Number Theoretic TransformGPUHomomorphic Encryptionzk-SNARKHardware AccelerationZero Knowledge Proof
Contact author(s)
alisah @ sabanciuniv edu
erkays @ sabanciuniv edu
History
2023-10-06: revised
2023-09-19: received
See all versions
Short URL
https://ia.cr/2023/1410
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2023/1410,
      author = {Ali Şah Özcan and Erkay Savaş},
      title = {Two Algorithms for Fast {GPU} Implementation of {NTT}},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1410},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1410}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.