A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator

Abstract

Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. Noting the extraordinary computing power of AI accelerators, in this paper we explore the feasibility of introducing AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning and neural networks, the biggest challenge is how to transform cryptographic workloads into their operations while ensuring the correctness of the results and delivering convincing performance gains. After investigating and analysing the workload of NVIDIA's AI accelerator, Tensor Core, we choose to use it to accelerate polynomial multiplication, usually the most time-consuming part of lattice-based cryptography. We adapt the computation to the matrix-multiply-and-add mode of Tensor Core and make a trade-off between precision and performance, so as to leverage it as a high-performance NTT box that performs NTT/INTT through the CUDA C++ WMMA APIs. We take CRYSTALS-Kyber, the candidate to be standardized by NIST, as a case study on an RTX 3080 with the Ampere Tensor Core. The empirical results show that the customized NTT of a polynomial vector ($n=256,k=4$) with our NTT box is around 6.47x faster than the state-of-the-art implementation on the same GPU platform. Compared with the AVX2 implementation submitted to NIST, our Kyber-1024 achieves speedups of 26x, 36x, and 35x for each phase.
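To make the idea of an NTT expressed as a matrix product more concrete, the following is a minimal Python sketch, not the paper's CUDA kernel. It assumes the textbook matrix form of Kyber's NTT on one half of the coefficients (modulus $q=3329$, root $\zeta=17$, as in the Kyber specification) and a simple 8-bit limb split so that each partial product stays well inside a 32-bit accumulator, as a low-precision matrix-multiply-and-add unit would require. The limb split and all function names (`ntt_matrix`, `ntt_half_limbs`, and so on) are illustrative assumptions; the precision/performance trade-off actually used in the paper may differ.

```python
# Illustrative sketch only: NTT as a small-integer matrix product.
# The limb-splitting scheme is an assumption, not the paper's method.
Q = 3329      # Kyber modulus
ZETA = 17     # primitive 256th root of unity mod Q (Kyber specification)
HALF = 128    # Kyber's NTT treats even and odd coefficients separately


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


def ntt_matrix():
    """W[i][j] = ZETA^((2*bitrev7(i)+1)*j) mod Q."""
    return [[pow(ZETA, (2 * bitrev7(i) + 1) * j, Q) for j in range(HALF)]
            for i in range(HALF)]


def matvec(mat, vec):
    """Exact integer matrix-vector product; stands in for one MMA pass."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def ntt_half_limbs(W, f):
    """Apply W to f mod Q using only products of 8-bit limbs.

    Each partial sum is at most 128 * 255 * 255 = 8,323,200 < 2**31,
    so a 32-bit accumulator would not overflow.
    """
    W_lo = [[w & 0xFF for w in row] for row in W]
    W_hi = [[w >> 8 for w in row] for row in W]
    f_lo = [x & 0xFF for x in f]
    f_hi = [x >> 8 for x in f]
    ll, lh = matvec(W_lo, f_lo), matvec(W_lo, f_hi)
    hl, hh = matvec(W_hi, f_lo), matvec(W_hi, f_hi)
    # Recombine: w*x = lo*lo + 2^8 (lo*hi + hi*lo) + 2^16 hi*hi
    return [(a + ((b + c) << 8) + (d << 16)) % Q
            for a, b, c, d in zip(ll, lh, hl, hh)]


def ntt_half_reference(W, f):
    """Direct modular matrix-vector product, used as the correctness check."""
    return [sum(w * x for w, x in zip(row, f)) % Q for row in W]


W = ntt_matrix()
f_even = [(7 * i + 3) % Q for i in range(HALF)]  # arbitrary test input
assert ntt_half_limbs(W, f_even) == ntt_half_reference(W, f_even)
```

The same matrix `W` would be applied to the odd-indexed coefficients as well. In this sketch the four limb products are independent, which is what would let them be issued as separate low-precision matrix multiplications before a final reduction modulo $q$.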

Category
Implementation
Publication info
Published elsewhere. ESORICS 2022
Keywords
Lattice-Based Cryptography, Polynomial Multiplication Over Rings, NTT, AI accelerator, Tensor Core, Kyber
Contact author(s)
wanlipeng @ iie ac cn
zhengfangyu @ iie ac cn
History
2022-08-16: last of 2 revisions
Short URL
https://ia.cr/2022/881

CC BY

BibTeX

@misc{cryptoeprint:2022/881,
author = {Lipeng Wan and Fangyu Zheng and Guang Fan and Rong Wei and Lili Gao and Jiankuo Dong and Jingqiang Lin and Yuewu Wang},
title = {A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator},
howpublished = {Cryptology ePrint Archive, Paper 2022/881},
year = {2022},
note = {\url{https://eprint.iacr.org/2022/881}},
url = {https://eprint.iacr.org/2022/881}
}
