Paper 2025/748

Symphony of Speeds: Harmonizing Classic McEliece Cryptography with GPU Innovation

Wen Wu, Nanjing University of Posts and Telecommunications
Jiankuo Dong, Nanjing University of Posts and Telecommunications
Zhen Xu, Nanjing University of Posts and Telecommunications
Zhenjiang Dong, Nanjing University of Posts and Telecommunications
Dung Duong, University of Wollongong
Fu Xiao, Nanjing University of Posts and Telecommunications
Jingqiang Lin, University of Science and Technology of China
Abstract

The Classic McEliece key encapsulation mechanism (KEM), a candidate in the fourth round of the post-quantum cryptography (PQC) standardization process run by the National Institute of Standards and Technology (NIST), stands out for its conservative design and robust security guarantees. Built on the code-based Niederreiter cryptosystem, Classic McEliece delivers high-performance encapsulation and decapsulation, making it well suited to a variety of applications. However, there has not been a systematic implementation of Classic McEliece on GPU platforms. This paper presents the first high-performance implementation of Classic McEliece on NVIDIA GPUs. First, we build the implementation on a "CPU-GPU" heterogeneous approach and a kernel fusion strategy, which significantly reduce global memory accesses and optimize memory access patterns. This results in encapsulation and decapsulation performance of 28,628,195 ops/s and 3,051,701 ops/s, respectively, for McEliece348864. Second, we optimize core operations such as the Additive Fast Fourier Transform (AFFT) and the Transpose AFFT (TAFFT). We introduce the concept of the (T)AFFT stepping chain and propose two universal schemes, the Memory Access Stepping Strategy (MASS) and the Layer-Fused Memory Access Stepping Strategy (LFMASS), which achieve speedups of 30.56% and 38.37%, respectively, compared to the native GPU-based McEliece6960119 implementation. Third, extensive experiments on the NVIDIA RTX4090 show significant performance gains: up to 344$\times$ higher encapsulation throughput and 125$\times$ higher decapsulation throughput than the official CPU-based AVX implementation, while also decisively outperforming existing ARM Cortex-M4 and FPGA implementations.
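The kernel fusion idea mentioned in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: it is plain Python that simulates global memory with a counter, and the names `GlobalMemory`, `stage_a`, `stage_b`, `run_unfused`, and `run_fused` are hypothetical, with placeholder arithmetic standing in for real Classic McEliece steps. It only shows why merging two per-element passes into one avoids writing and re-reading an intermediate buffer.

```python
"""Toy model of kernel fusion: count simulated global-memory traffic."""


class GlobalMemory:
    """Stand-in for GPU global memory that counts element reads and writes."""

    def __init__(self, **buffers):
        # Initial buffers are placed without being counted as kernel traffic.
        self.buffers = {name: list(data) for name, data in buffers.items()}
        self.reads = 0
        self.writes = 0

    def load(self, name):
        data = self.buffers[name]
        self.reads += len(data)
        return list(data)

    def store(self, name, data):
        data = list(data)
        self.writes += len(data)
        self.buffers[name] = data

    @property
    def traffic(self):
        return self.reads + self.writes


def stage_a(x):
    """Hypothetical first per-element stage (placeholder arithmetic)."""
    return x ^ 0x5A


def stage_b(x):
    """Hypothetical second per-element stage (placeholder arithmetic)."""
    return (x << 1) & 0xFF


def run_unfused(mem):
    """Two separate 'kernels': the intermediate goes through global memory."""
    mem.store("tmp", [stage_a(v) for v in mem.load("src")])
    mem.store("dst", [stage_b(v) for v in mem.load("tmp")])
    return mem.buffers["dst"]


def run_fused(mem):
    """One fused 'kernel': the intermediate stays in a local value."""
    mem.store("dst", [stage_b(stage_a(v)) for v in mem.load("src")])
    return mem.buffers["dst"]


def compare(values):
    """Run both variants on the same input and return their traffic counts."""
    unfused = GlobalMemory(src=values)
    fused = GlobalMemory(src=values)
    assert run_unfused(unfused) == run_fused(fused)
    return unfused.traffic, fused.traffic


if __name__ == "__main__":
    print(compare(list(range(8))))
```

In this model, an input of n elements costs 4n simulated element transfers in the unfused version (two loads and two stores) and 2n in the fused version. On a real GPU the saving depends on the actual kernels, caching, and access patterns, none of which this sketch captures.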

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
Post-quantum Cryptography, Classic McEliece, Additive FFT, GPU
Contact author(s)
2024040403 @ njupt edu cn
djiankuo @ njupt edu cn
1224045908 @ njupt edu cn
dongzhenjiang @ njupt edu cn
hduong @ uow edu au
xiaof @ njupt edu cn
linjq @ ustc edu cn
History
2025-05-22: revised
2025-04-27: received
Short URL
https://ia.cr/2025/748
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2025/748,
      author = {Wen Wu and Jiankuo Dong and Zhen Xu and Zhenjiang Dong and Dung Duong and Fu Xiao and Jingqiang Lin},
      title = {Symphony of Speeds: Harmonizing Classic {McEliece} Cryptography with {GPU} Innovation},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/748},
      year = {2025},
      url = {https://eprint.iacr.org/2025/748}
}