Paper 2023/541

Algorithmic Views of Vectorized Polynomial Multipliers for NTRU and NTRU Prime (Long Paper)

Han-Ting Chen, National Taiwan University
Yi-Hua Chung, Academia Sinica
Vincent Hwang, Academia Sinica, Max Planck Institute for Security and Privacy
Chi-Ting Liu, National Taiwan University, Academia Sinica
Bo-Yin Yang, Academia Sinica

This paper explores the design space of vector-optimized polynomial multiplications in the lattice-based key-encapsulation mechanisms NTRU and NTRU Prime. Since NTRU and NTRU Prime do not support straightforward applications of number– theoretic transforms, the state-of-the-art vector code either resorted to Toom–Cook, or introduced various techniques for coefficient ring extensions. All these techniques lead to a large number of small-degree polynomial multiplications, which is the bottleneck in our experiments. For NTRU Prime, we show how to reduce the number of small-degree polynomial multiplications to nearly 1/4 times compared to the previous vectorized code with the same functionality. Our transformations are based on careful choices of FFTs, including Good–Thomas, Rader’s, Schönhage’s, and Bruun’s FFTs. For NTRU, we show how to deploy Toom-5 with 3-bit losses. Furthermore, we show that the Toeplitz matrix–vector product naturally translates into efficient implementations with vector-by-scalar multiplication instructions which do not appear in all prior vector-optimized implementations. We choose the ARM Cortex-A72 CPU which implements the Armv8-A architecture for experiments, because of its wide uses in smartphones, and also the Neon vector instruction set implementing vector-by-scalar multiplications that do not appear in most other vector instruction sets like Intel’s AVX2. Even for platforms without vector-by-scalar multiplications, we expect significant improvements compared to the state of the art, since our transformations reduce the number of multiplication instructions by a large margin. Compared to the state-of-the-art optimized implementations, we achieve 2.18× and 6.7× faster polynomial multiplications for NTRU and NTRU Prime, respectively. For full schemes, we additionally vectorize the polynomial inversions, sorting network, and encoding/decoding subroutines in NTRU and NTRU Prime. For ntruhps2048677, we achieve 7.67×, 2.48×, and 1.77× faster key generation, encapsulation, and decapsulation, respectively. For ntrulpr761, we achieve 3×, 2.87×, and 3.25× faster key generation, encapsulation, and decapsulation, respectively. For sntrup761, there are no previously optimized implementations and we significantly outperform the reference implementation.

Note: See for implementations. We updated the introduction and discussion sections.

Available format(s)
Publication info
NTRUNTRU PrimeNeonFFTToeplitz matrixToom–Cook
Contact author(s)
r10922073 @ csie ntu edu tw
yhchiara @ gmail com
vincentvbh7 @ gmail com
gting906 @ gmail com
by @ crypto tw
2023-08-08: last of 2 revisions
2023-04-16: received
See all versions
Short URL
No rights reserved


      author = {Han-Ting Chen and Yi-Hua Chung and Vincent Hwang and Chi-Ting Liu and Bo-Yin Yang},
      title = {Algorithmic Views of Vectorized Polynomial Multipliers for NTRU and NTRU Prime (Long Paper)},
      howpublished = {Cryptology ePrint Archive, Paper 2023/541},
      year = {2023},
      note = {\url{}},
      url = {}
Note: In order to protect the privacy of readers, does not use cookies or embedded third party content.