Cryptology ePrint Archive: Report 2007/228
AN OPTIMIZED HARDWARE ARCHITECTURE OF MONTGOMERY MULTIPLICATION ALGORITHM
Miaoqing Huang and Kris Gaj and Soonhak Kwon and Tarek El-Ghazawi
Abstract: Montgomery multiplication is one of the fundamental operations used
in cryptographic algorithms, such as RSA and Elliptic Curve
Cryptosystems. At CHES 1999, Tenca and Koc introduced a
now-classical architecture for implementing Montgomery
multiplication in hardware. With parameters optimized for minimum
latency, this architecture performs a single Montgomery
multiplication in approximately 2n clock cycles, where n is the
size of operands in bits. In this paper we propose and discuss an
optimized hardware architecture performing the same operation in
approximately n clock cycles. Our architecture is based on
pre-computing partial results using two possible assumptions
regarding the most significant bit of the previous word, and is only
marginally more demanding in terms of the circuit area. The new
radix-2 architecture can be extended for the case of radix-4, while
preserving a factor of two speed-up over the corresponding radix-4
design by Tenca, Todorov, and Koc from CHES 2001. Our
architecture has been verified by modeling it in Verilog-HDL,
implementing it using Xilinx Virtex-II 6000 FPGA, and experimentally
testing it using SRC-6 reconfigurable computer.
Category / Keywords: public-key cryptography / Montgomery multiplication,MWR2MM Algorithm,Field Programmable Gate Arrays
Publication Info: Not published
Date: received 12 Jun 2007
Contact author: mqhuang at gwu edu
Available format(s): Postscript (PS) | Compressed Postscript (PS.GZ) | PDF | BibTeX Citation
Version: 20070619:194631 (All versions of this report)
Short URL: ia.cr/2007/228
Discussion forum: Show discussion | Start new discussion
[ Cryptology ePrint archive ]