Paper 2023/1194

HI-Kyber: A novel high-performance implementation scheme of Kyber based on GPU

Xinyi Ji, Nanjing University of Posts and Telecommunications
Jiankuo Dong, Nanjing University of Posts and Telecommunications
Pinchang Zhang, Nanjing University of Posts and Telecommunications
Deng Tonggui, Nanjing University of Posts and Telecommunications
Hua Jiafeng, Xidian University
Fu Xiao, Nanjing University of Posts and Telecommunications
Abstract

CRYSTALS-Kyber, the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography builds encryption and decryption systems on hard mathematical problems over lattices in order to resist attacks from quantum computers. Performance is a major bottleneck hindering the adoption of post-quantum cryptography. In this paper, we present a High-performance Implementation of Kyber (named HI-Kyber) on NVIDIA GPUs, which raises the key-exchange throughput of Kyber to the level of millions of operations per second. First, we propose a lattice-based PQC implementation architecture based on kernel fusion, which avoids redundant global-memory accesses. Second, we optimize and implement the core operations of CRYSTALS-Kyber, including the Number Theoretic Transform (NTT), the inverse NTT (INTT), and pointwise multiplication. In particular, for the NTT, which is the computational bottleneck, we propose three novel methods to push performance further: sliced layer merging (SLM), sliced depth-first search (SDFS-NTT), and entire depth-first search (EDFS-NTT), which achieve speedups of 7.5%, 28.5%, and 41.6%, respectively, over the native implementation. Third, we conduct comprehensive performance experiments with different parallel dimensions based on the above optimizations. Finally, our key-exchange throughput reaches 1,664 kops/s. On the same platform, HI-Kyber is 3.52$\times$ as fast as the GPU implementation based on the same instruction set and 1.78$\times$ as fast as the state-of-the-art implementation based on AI-accelerated tensor cores.
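
For readers unfamiliar with the NTT named above as the computational bottleneck, the following is a minimal Python sketch of the textbook layer-by-layer Kyber NTT and its inverse, following the structure of the public Kyber specification (q = 3329, n = 256, root of unity 17, seven butterfly layers). It is an illustration only: it is not the SLM, SDFS-NTT, or EDFS-NTT method of the paper, whose details the abstract does not give, and it is not GPU code. The function and variable names (`ntt`, `intt`, `ZETAS`, `_brv7`) are chosen here for illustration.

```python
import random

Q = 3329          # Kyber modulus
N = 256           # polynomial length
ROOT = 17         # primitive 256th root of unity modulo Q
INV_128 = 3303    # 128^{-1} mod Q, final scaling factor of the inverse

def _brv7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)

# Twiddle factors in bit-reversed order.
ZETAS = [pow(ROOT, _brv7(i), Q) for i in range(128)]

def ntt(f):
    """Forward NTT: seven layers, each one a full pass over all coefficients."""
    r = list(f)
    k = 1
    length = 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k += 1
            for j in range(start, start + length):
                t = zeta * r[j + length] % Q
                r[j + length] = (r[j] - t) % Q
                r[j] = (r[j] + t) % Q
        length >>= 1
    return r

def intt(f_hat):
    """Inverse NTT: the same seven layers walked in reverse, then scaled."""
    r = list(f_hat)
    k = 127
    length = 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k -= 1
            for j in range(start, start + length):
                t = r[j]
                r[j] = (t + r[j + length]) % Q
                r[j + length] = zeta * (r[j + length] - t) % Q
        length <<= 1
    return [x * INV_128 % Q for x in r]

if __name__ == "__main__":
    poly = [random.randrange(Q) for _ in range(N)]
    # Round trip: the inverse transform recovers the original polynomial.
    assert intt(ntt(poly)) == poly
```

In this baseline form, every one of the seven layers reads and writes all 256 coefficients, which gives a sense of why memory traffic matters when the transform is mapped onto a GPU.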

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
PQC, Kyber, NTT, GPU
Contact author(s)
2022040513 @ njupt edu cn
djiankuo @ njupt edu cn
zpc @ njupt edu cn
19805188788 @ 163 com
1498416954 @ qq com
xiaof @ njupt edu cn
History
2023-08-07: approved
2023-08-06: received
Short URL
https://ia.cr/2023/1194
License
Creative Commons Attribution-NonCommercial-ShareAlike
CC BY-NC-SA

BibTeX

@misc{cryptoeprint:2023/1194,
      author = {Xinyi Ji and Jiankuo Dong and Pinchang Zhang and Deng Tonggui and Hua Jiafeng and Fu Xiao},
      title = {{HI}-Kyber: A novel high-performance implementation scheme of Kyber based on {GPU}},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1194},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1194}
}