Paper 2023/1637

Algorithmic Views of Vectorized Polynomial Multipliers – NTRU

Han-Ting Chen, National Taiwan University
Yi-Hua Chung, Academia Sinica
Vincent Hwang, Academia Sinica, Max Planck Institute for Security and Privacy
Bo-Yin Yang, Academia Sinica

The lattice-based post-quantum cryptosystem NTRU is used by Google for protecting Google’s internal communication. In NTRU, polynomial multiplication is one of the bottlenecks. In this paper, we explore the interactions between polynomial multiplication, Toeplitz matrix-vector products, and vectorization with architectural insights. For a unital commutative ring $R$, a positive integer $n$, and an element $\zeta \in R$, we reveal the benefit of vector-by-scalar multiplication instructions while multiplying in $R[x] / \langle x^n - \zeta \rangle$. We aim at designing an algorithm exploiting no algebraic and number-theoretic properties of $n$ and $\zeta$. An obvious way is to multiply in $R[x]$ and reduce modulo $x^n - \zeta$. Since the product in $R[x]$ is a polynomial of degree at most $2n - 2$, one usually chooses a polynomial modulus $g$ such that (i) $\deg(g) \geq 2n - 1$, and (ii) there exists a well-studied fast polynomial multiplication algorithm $f$ for multiplying in $R[x] / \langle g \rangle$. We deviate from common approaches and point out a novel insight with dual modules and vector-by-scalar multiplications. Conceptually, we relate the module-theoretic dual of $R[x] / \langle x^n - \zeta \rangle$ and $R[x] / \langle g \rangle$ with Toeplitz matrix-vector products, and demonstrate the benefit of Toeplitz matrix-vector products with vector-by-scalar multiplication instructions. This greatly reduces register pressure and allows us to multiply with essentially no permutation instructions, which are commonly used in vectorized implementations. We implement the ideas for the NTRU parameter sets ntruhps2048677 and ntruhrss701 on a Cortex-A72 implementing the Armv8.0-A architecture with the single-instruction-multiple-data (SIMD) technology Neon. For polynomial multiplications, our implementation is 2.18× and 2.23× faster for ntruhps2048677 and ntruhrss701, respectively, than the state-of-the-art optimized implementation.
We also vectorize the polynomial inversions and sorting network by employing existing techniques and translating AVX2-optimized implementations into Neon. Compared to the state-of-the-art optimized implementation, our key generation, encapsulation, and decapsulation for ntruhps2048677 are 7.67×, 2.48×, and 1.77× faster, respectively. For ntruhrss701, our key generation, encapsulation, and decapsulation are 7.99×, 1.47×, and 1.56× faster, respectively.
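The relationship between multiplying in $R[x] / \langle x^n - \zeta \rangle$ and a Toeplitz matrix-vector product can be illustrated with a small scalar sketch. This is not the paper's Neon implementation: the function names, the choice $R = \mathbb{Z}/q\mathbb{Z}$ with $q = 2048$, and the toy inputs are assumptions made only for illustration. The first function multiplies in $R[x]$ and then reduces modulo $x^n - \zeta$; the second accumulates the same result column by column, where each step scales one column of the Toeplitz matrix by a single coefficient of the other operand, which is the access pattern that vector-by-scalar multiply-accumulate instructions serve.

```python
Q = 2048  # assumed coefficient modulus, chosen only for illustration


def mul_then_reduce(a, b, zeta, q=Q):
    """Multiply in R[x], then reduce modulo x^n - zeta (the 'obvious way')."""
    n = len(a)
    prod = [0] * (2 * n - 1)  # the product has degree at most 2n - 2
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % q
    # Use x^n = zeta to fold the upper coefficients onto the lower ones.
    out = []
    for k in range(n):
        high = prod[k + n] if k + n < 2 * n - 1 else 0
        out.append((prod[k] + zeta * high) % q)
    return out


def toeplitz_column(a, j, zeta, q=Q):
    """Column j of the n x n Toeplitz matrix of multiplication by a.

    Entry (i, j) is a[i - j] when i >= j and zeta * a[n + i - j] otherwise,
    so it depends only on i - j.
    """
    n = len(a)
    return [a[i - j] % q if i >= j else (zeta * a[n + i - j]) % q
            for i in range(n)]


def mul_toeplitz(a, b, zeta, q=Q):
    """Toeplitz matrix-vector product, accumulated as column * scalar."""
    acc = [0] * len(a)
    for j, bj in enumerate(b):
        col = toeplitz_column(a, j, zeta, q)
        # One vector-by-scalar multiply-accumulate per coefficient of b.
        acc = [(x + y * bj) % q for x, y in zip(acc, col)]
    return acc


print(mul_then_reduce([1, 2, 3], [4, 5, 6], 1))  # [31, 31, 28]
print(mul_toeplitz([1, 2, 3], [4, 5, 6], 1))     # [31, 31, 28]
```

Both functions agree for any $\zeta$ and use no property of $n$ or $\zeta$; the sketch only shows the column-times-scalar structure and says nothing about how the fast algorithm $f$ or the dual-module argument is realized.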

Note: Full version.

Publication info
Published elsewhere. Major revision. Indocrypt 2023
Keywords
Toeplitz matrix, NTRU, Vectorization, Dual Module
Contact author(s)
r10922073 @ csie ntu edu tw
yhchiara @ gmail com
vincentvbh7 @ gmail com
by @ crypto tw
2024-01-30: last of 2 revisions
2023-10-21: received
No rights reserved


      author = {Han-Ting Chen and Yi-Hua Chung and Vincent Hwang and Bo-Yin Yang},
      title = {Algorithmic Views of Vectorized Polynomial Multipliers – NTRU},
      howpublished = {Cryptology ePrint Archive, Paper 2023/1637},
      year = {2023},
      note = {\url{}},
      url = {}