Paper 2024/1754

PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor

Zewen Ye, Zhejiang University, City University of Hong Kong
Junhao Huang, BNU-HKBU United International College
Tianshun Huang, Zhejiang University
Yudan Bai, Zhejiang University
Jinze Li, Zhejiang University
Hao Zhang, Zhejiang University
Guangyan Li, City University of Hong Kong
Donglong Chen, BNU-HKBU United International College
Ray C.C. Cheung, City University of Hong Kong
Kejie Huang, Zhejiang University
Abstract

Post-quantum cryptography (PQC) has evolved rapidly in response to the emergence of quantum computers. In 2022, the US National Institute of Standards and Technology (NIST) selected four algorithms for PQC standardization, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk. Both Falcon and Hawk are based on the NTRU lattice and offer compact signatures with fast signature generation and verification, making them suitable for deployment on resource-constrained Internet-of-Things (IoT) devices. Compared with the popular CRYSTALS-Dilithium and CRYSTALS-Kyber, NTRU-based schemes have received limited research attention because of their complex algorithms and operations. The performance of Falcon and Hawk remains constrained by the lack of parallel execution in crucial operations such as the Number Theoretic Transform (NTT) and Fast Fourier Transform (FFT), with data dependency being a significant bottleneck. This paper accelerates the NTRU-based schemes Falcon and Hawk through hardware/software co-design on a customized Single-Instruction-Multiple-Data (SIMD) processor, proposing new SIMD hardware units and instructions together with software optimizations. Our NTT optimization includes a novel layer merging technique for the SIMD architecture that reduces memory accesses, and it uses modular arithmetic algorithms (Signed Montgomery and Improved Plantard) matched to different modulus data widths. We also explore applying layer merging to accelerate fixed-point FFT at the SIMD instruction level, and we devise a dual-issue parser that reorganizes assembly code to maximize dual-issue utilization. A System-on-Chip (SoC) architecture is devised to make the processor practical in real-world scenarios. Evaluation on 28 nm technology and an FPGA platform shows that our design and optimizations increase the performance of Hawk signature generation and verification by more than 7 times.
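To make the Signed Montgomery reduction named in the abstract more concrete, the following is a minimal software sketch of the general textbook technique. It is not the paper's hardware unit or instruction. It assumes Falcon's modulus q = 12289 and a 16-bit Montgomery radix; the helper names (`to_signed16`, `montgomery_reduce`, `to_mont`, `mul_by_const`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: signed Montgomery reduction in plain Python.
# Assumptions (not taken from the paper): q = 12289 (Falcon's modulus),
# radix R = 2^16, and all helper names below are hypothetical.

Q = 12289
R_BITS = 16
R = 1 << R_BITS
QINV = pow(Q, -1, R)  # q^{-1} mod 2^16 (requires Python 3.8+)


def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed value in [-2^15, 2^15)."""
    x &= R - 1
    return x - R if x >= (R >> 1) else x


def montgomery_reduce(a: int) -> int:
    """Return r with r * R == a (mod Q) and |r| < Q, for |a| < 2^15 * Q."""
    m = to_signed16(a * QINV)    # m == a * q^{-1} (mod R), as a signed value
    return (a - m * Q) >> R_BITS  # a - m*Q is divisible by R, so this is exact


def to_mont(w: int) -> int:
    """Precompute a constant (e.g. a twiddle factor) in Montgomery form."""
    return (w * R) % Q


def mul_by_const(x: int, w_mont: int) -> int:
    """Multiply x by a constant stored in Montgomery form; result == x*w (mod Q)."""
    return montgomery_reduce(x * w_mont)


if __name__ == "__main__":
    w_mont = to_mont(7)
    print(mul_by_const(5, w_mont) % Q)
```

The result stays in a signed range rather than being normalized to [0, q), which is why signed variants are attractive in NTT butterflies: the conditional correction step can be deferred. Precomputing constants in Montgomery form, as `to_mont` does, cancels the extra factor of R^{-1} that each reduction introduces.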

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
Post-quantum Cryptography, NTRU, Falcon, Hawk, RISC-V, System on Chip
Contact author(s)
lucas zw ye @ zju edu cn
huangjunhao @ uic edu cn
z1458152445 @ 163 com
byd baiyudan @ zju edu cn
lijinze2233 @ zju edu cn
floyd haozhang @ zju edu cn
guangyali5-c @ my cityu edu hk
donglongchen @ uic edu cn
r cheung @ cityu edu hk
huangkejie @ zju edu cn
History
2024-10-30: approved
2024-10-28: received
Short URL
https://ia.cr/2024/1754
License
Creative Commons Attribution-NonCommercial
CC BY-NC

BibTeX

@misc{cryptoeprint:2024/1754,
      author = {Zewen Ye and Junhao Huang and Tianshun Huang and Yudan Bai and Jinze Li and Hao Zhang and Guangyan Li and Donglong Chen and Ray C.C. Cheung and Kejie Huang},
      title = {{PQNTRU}: Acceleration of {NTRU}-based Schemes via Customized Post-Quantum Processor},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1754},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1754}
}