## Cryptology ePrint Archive: Report 2020/1302

TMVP-based Multiplication for Polynomial Quotient Rings and Application to Saber on ARM Cortex-M4

İrem Keskinkurt Paksoy and Murat Cenk

Abstract: Lattice-based NIST PQC finalists need efficient multiplication in $\mathbb{Z}_q[x]/(f(x))$. Multiplication in this ring can be performed very efficiently via number theoretic transform (NTT) as done in CRYSTALS-Kyber if the parameters of the scheme allow it. If NTT is not supported, other multiplication algorithms must be employed. For example, if the modulus $q$ of the scheme is a power of two as in Saber and NTRU, then NTT can not be used directly. In this case, Karatsuba and Toom-Cook methods together with modular reduction are commonly used for multiplication in this ring. In this paper, we show that the Toeplitz matrix-vector product (TMVP) representation of modular polynomial multiplication yields better results than Karatsuba and Toom-Cook methods. We present three- and four-way TMVP formulas that we derive from three- and four-way Toom-Cook algorithms, respectively. We use the four-way TMVP formula to develop an algorithm for multiplication in the ring $\mathbb{Z}_{2^m}[x]/(x^{256}+1)$. We implement the proposed algorithm on the ARM Cortex-M4 microcontroller and apply it to Saber, which is one of the lattice-based finalists of the NIST PQC competition. We compare the results to previous implementations. The TMVP-based multiplication algorithm we propose is $20.83\%$ faster than the previous algorithm that uses a combination of Toom-Cook, Karatsuba, and schoolbook methods. Our algorithm also speeds up key generation, encapsulation, and decapsulation algorithms of all variants of Saber. The speedups vary between $4.3-39.8\%$. Moreover, our algorithm requires less memory than the others, except for the memory-optimized implementation of Saber.

Category / Keywords: implementation / Lattice-based cryptography, Post-quantum cryptography, ARM Cortex-M4, MLWR, Saber, Toeplitz, TMVP