Paper 2022/881

A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator

Lipeng Wan, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, Beijing, China
Fangyu Zheng, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, Beijing, China
Guang Fan, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, Beijing, China
Rong Wei, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, Beijing, China
Lili Gao, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, Beijing, China
Jiankuo Dong, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
Jingqiang Lin, School of Cyber Security, University of Science and Technology of China, Hefei, China
Yuewu Wang, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, Beijing, China
Abstract

Public-key cryptography, including both conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. Motivated by the extraordinary computing power of AI accelerators, this paper further explores the feasibility of introducing them into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning and neural networks, the biggest challenge is how to transform cryptographic workloads into the operations they support, while ensuring correct results and delivering convincing performance gains. After investigating and analysing the workload of NVIDIA's AI accelerator, the Tensor Core, we choose to use it to accelerate polynomial multiplication, usually the most time-consuming part of lattice-based cryptography. We adapt the computation to the matrix-multiply-and-add mode of the Tensor Core and make a trade-off between precision and performance, so that it serves as a high-performance NTT box performing NTT/INTT through the CUDA C++ WMMA APIs. We take CRYSTALS-Kyber, the candidate selected by NIST for standardization, as a case study on an RTX 3080 with the Ampere Tensor Core. The empirical results show that the customized NTT of a polynomial vector ($n=256,k=4$) with our NTT box achieves a speedup of around 6.47x over the state-of-the-art implementation on the same GPU platform. Compared with the AVX2 implementation submitted to NIST, our Kyber-1024 achieves speedups of 26x, 36x, and 35x for the respective phases.
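
As a rough illustration of why an NTT fits a matrix-multiply-and-add unit, the following Python sketch writes a small NTT as a matrix-vector product modulo the Kyber modulus $q=3329$. It is not the paper's implementation. The toy size `N = 16`, the helper names, and the use of a plain cyclic transform over $x^N - 1$ (rather than Kyber's negacyclic ring with $n=256$) are all assumptions made for brevity. The Tensor Core precision handling and the WMMA calls are not modelled.

```python
# Toy sketch (assumption): an N-point cyclic NTT expressed as a matrix-vector
# product mod q. Kyber itself works in Z_q[x]/(x^256 + 1); this is simplified.
Q = 3329      # Kyber modulus
ZETA = 17     # primitive 256th root of unity mod 3329
N = 16        # hypothetical toy size; must divide 256

OMEGA = pow(ZETA, 256 // N, Q)   # primitive N-th root of unity mod Q


def ntt_matrix(omega, n, q):
    """Return the n x n matrix with entries omega^(i*j) mod q."""
    return [[pow(omega, i * j, q) for j in range(n)] for i in range(n)]


def matvec_mod(m, v, q):
    """Multiply-and-accumulate each row against v, then reduce mod q."""
    return [sum(mij * vj for mij, vj in zip(row, v)) % q for row in m]


W = ntt_matrix(OMEGA, N, Q)
N_INV = pow(N, -1, Q)            # modular inverse (Python 3.8+)
OMEGA_INV = pow(OMEGA, -1, Q)
W_INV = [[(N_INV * x) % Q for x in row] for row in ntt_matrix(OMEGA_INV, N, Q)]


def ntt(a):
    """Forward transform: one matrix-vector product."""
    return matvec_mod(W, a, Q)


def intt(a_hat):
    """Inverse transform: product with the scaled inverse matrix."""
    return matvec_mod(W_INV, a_hat, Q)


def cyclic_mul_ntt(a, b):
    """Multiply in Z_q[x]/(x^N - 1) via pointwise products in the NTT domain."""
    return intt([(x * y) % Q for x, y in zip(ntt(a), ntt(b))])


def cyclic_mul_schoolbook(a, b):
    """Reference O(N^2) cyclic convolution used to check the NTT route."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] = (c[(i + j) % N] + a[i] * b[j]) % Q
    return c


if __name__ == "__main__":
    a = list(range(1, N + 1))
    b = [(7 * i + 3) % Q for i in range(N)]
    print(cyclic_mul_ntt(a, b) == cyclic_mul_schoolbook(a, b))
```

Transforming several polynomials at once would turn the matrix-vector product into a matrix-matrix product, which is the kind of operation a matrix-multiply-and-add unit is built for.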

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Published elsewhere. ESORICS 2022
Keywords
Lattice-Based Cryptography, Polynomial Multiplication Over Rings, NTT, AI accelerator, Tensor Core, Kyber
Contact author(s)
wanlipeng @ iie ac cn
zhengfangyu @ iie ac cn
History
2022-08-16: last of 2 revisions
2022-07-06: received
Short URL
https://ia.cr/2022/881
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2022/881,
      author = {Lipeng Wan and Fangyu Zheng and Guang Fan and Rong Wei and Lili Gao and Jiankuo Dong and Jingqiang Lin and Yuewu Wang},
      title = {A Novel High-performance Implementation of {CRYSTALS}-Kyber with {AI} Accelerator},
      howpublished = {Cryptology {ePrint} Archive, Paper 2022/881},
      year = {2022},
      url = {https://eprint.iacr.org/2022/881}
}