A Family of Scalable Polynomial Multiplier Architectures for Ring-LWE Based Cryptosystems

Chaohui Du; Guoqiang Bai

Paper 2016/323

A Family of Scalable Polynomial Multiplier Architectures for Ring-LWE Based Cryptosystems

Chaohui Du and Guoqiang Bai

Abstract

Many lattice based cryptosystems are based on the Ring learning with errors (Ring-LWE) problem. The most critical and computationally intensive operation of these Ring-LWE based cryptosystems is polynomial multiplication over rings. In this paper, we exploit the number theoretic transform (NTT) to build a family of scalable polynomial multiplier architectures, which provide designers with a trade-off choice of speed vs. area. Our polynomial multipliers are capable to calculate the product of two $n$ -degree polynomials in about $(1.5 n \lg n + 1.5 n) / b$ clock cycles, where is the number of the butterfly operators. In addition, we exploit the cancellation lemma to reduce the required ROM storage. The experimental results on a Spartan-6 FPGA show that the proposed polynomial multiplier architectures achieve a speedup of 3 times on average and consume less Block RAMs and slices when compared with the compact design. Compared with the state of the art of high-speed design, the proposed hardware architectures save up to 46.64\% clock cycles and improve the utilization rate of the main data processing units by 42.27\%. Meanwhile, our designs can save up to 29.41\% block RAMs.

Note: The work in this paper is based on our previous work published at Trustcom-2015 which presents a preliminary version of the proposed polynomial multiplier architectures. The contribution of this work extends the previous work by providing more details and further optimizations. In this paper, we further optimize the polynomial multiplication scheme in algorithm level. Then we introduce several optimizations to reduce the hardware resources. Based on these techniques, we propose faster and tighter hardware architectures. The designs in this paper outperform our previous work by a factor of 1.21 on average. Meanwhile, the designs consume less hardware resources. We also provide implementation results using the parameter sets proposed in \cite{ChenTOCS2014}. They show that the proposed hardware architectures outperform the state of the art of high-speed design in the literature.

Metadata

Available format(s): PDF
Category: Implementation
Publication info: Published elsewhere. Major revision. 2015 IEEE Trustcom/BigDataSE/ISPA
Contact author(s): duchhzhh @ 163 com
History: 2016-03-23: received
Short URL: https://ia.cr/2016/323
License: CC BY

BibTeX

@misc{cryptoeprint:2016/323,
      author = {Chaohui Du and Guoqiang Bai},
      title = {A Family of Scalable Polynomial Multiplier Architectures for Ring-{LWE} Based Cryptosystems},
      howpublished = {Cryptology {ePrint} Archive, Paper 2016/323},
      year = {2016},
      url = {https://eprint.iacr.org/2016/323}
}