Paper 2023/1410
Two Algorithms for Fast GPU Implementation of NTT
Abstract
The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much greater role, as the NTT execution time is mostly memory-bound due to large degree polynomials. In this paper, we propose two algorithms for fast computation of NTT on a class of graphical processing units (GPU) by optimizing the memory access patterns. We present an approach i) to optimize the number of accesses to slow global memory for thread synchronization, and ii) to make better use of spatial locality in global memory accesses. It turns out that by controlling certain parameters in CUDA platform for general-purpose GPU computing (GPGPU) such as kernel count, block size and block shape, we can affect the performance of NTT. To best of our knowledge, this work is unique for it suggests a recipe for selecting optimum CUDA parameters to obtain the best NTT performance for a given polynomial degree. Our implementation results on various GPU devices for all power-of-two polynomial degrees from $2^{12}$ to $2^{28}$ show that our algorithms compare favorably with the other state-of-the-art GPU implementations in the literature with the optimum selection of these three CUDA parameters.
Metadata
- Available format(s)
- Category
- Implementation
- Publication info
- Preprint.
- Keywords
- Number Theoretic TransformGPUHomomorphic Encryptionzk-SNARKHardware AccelerationZero Knowledge Proof
- Contact author(s)
-
alisah @ sabanciuniv edu
erkays @ sabanciuniv edu - History
- 2023-10-06: revised
- 2023-09-19: received
- See all versions
- Short URL
- https://ia.cr/2023/1410
- License
-
CC BY
BibTeX
@misc{cryptoeprint:2023/1410, author = {Ali Şah Özcan and Erkay Savaş}, title = {Two Algorithms for Fast {GPU} Implementation of {NTT}}, howpublished = {Cryptology {ePrint} Archive, Paper 2023/1410}, year = {2023}, url = {https://eprint.iacr.org/2023/1410} }