Cryptology ePrint Archive: Report 2016/323

A Family of Scalable Polynomial Multiplier Architectures for Ring-LWE Based Cryptosystems

Chaohui Du and Guoqiang Bai

Abstract: Many lattice based cryptosystems are based on the Ring learning with errors (Ring-LWE) problem. The most critical and computationally intensive operation of these Ring-LWE based cryptosystems is polynomial multiplication over rings. In this paper, we exploit the number theoretic transform (NTT) to build a family of scalable polynomial multiplier architectures, which provide designers with a trade-off choice of speed vs. area. Our polynomial multipliers are capable to calculate the product of two $n$-degree polynomials in about $(1.5n\lg n + 1.5n)/b$ clock cycles, where $b$ is the number of the butterfly operators. In addition, we exploit the cancellation lemma to reduce the required ROM storage. The experimental results on a Spartan-6 FPGA show that the proposed polynomial multiplier architectures achieve a speedup of 3 times on average and consume less Block RAMs and slices when compared with the compact design. Compared with the state of the art of high-speed design, the proposed hardware architectures save up to 46.64\% clock cycles and improve the utilization rate of the main data processing units by 42.27\%. Meanwhile, our designs can save up to 29.41\% block RAMs.

Category / Keywords: implementation /

Original Publication (with major differences): 2015 IEEE Trustcom/BigDataSE/ISPA

Date: received 22 Mar 2016

Contact author: duchhzhh at 163 com

Available format(s): PDF | BibTeX Citation

Note: The work in this paper is based on our previous work published at Trustcom-2015 which presents a preliminary version of the proposed polynomial multiplier architectures. The contribution of this work extends the previous work by providing more details and further optimizations. In this paper, we further optimize the polynomial multiplication scheme in algorithm level. Then we introduce several optimizations to reduce the hardware resources. Based on these techniques, we propose faster and tighter hardware architectures. The designs in this paper outperform our previous work by a factor of 1.21 on average. Meanwhile, the designs consume less hardware resources. We also provide implementation results using the parameter sets proposed in \cite{ChenTOCS2014}. They show that the proposed hardware architectures outperform the state of the art of high-speed design in the literature.

Version: 20160323:080524 (All versions of this report)

Short URL:

[ Cryptology ePrint archive ]