We present a new modular reduction technique that is tailored for the special moduli required by the NTT. Based on this reduction, we speed up the NTT and propose faster, multi-purpose algorithms. We present two implementations of these algorithms: a portable C implementation and a high-speed implementation using assembly with AVX2 instructions. To demonstrate the improved efficiency in an application example, we benchmarked the algorithms in the context of the R-LWE key exchange protocol that has recently been proposed by Alkim, Ducas, Pöppelmann and Schwabe. In this case, our C and assembly implementations compute the full key exchange 1.49 and 1.13 times faster, respectively. These results are achieved with full protection against timing attacks.
Category / Keywords: implementation / Post-quantum cryptography, number theoretic transform (NTT), ring learning with errors (R-LWE), fast modular reduction, efficient implementation. Original Publication (with minor differences): CANS 2016 Date: received 23 May 2016, last revised 9 Nov 2016 Contact author: plonga at microsoft com Available format(s): PDF | BibTeX Citation Version: 20161110:001118 (All versions of this report) Short URL: ia.cr/2016/504 Discussion forum: Show discussion | Start new discussion