Cryptology ePrint Archive: Report 2014/760

Montgomery Modular Multiplication on ARM-NEON Revisited

Hwajeong Seo, Zhe Liu, Johann Großschädl, Jongseok Choi, and Howon Kim

Abstract: Montgomery modular multiplication constitutes the "arithmetic foundation" of modern public-key cryptography with applications ranging from RSA, DSA and Diffie-Hellman over elliptic curve schemes to pairing-based cryptosystems. The increased prevalence of SIMD-type instructions in commodity processors (e.g. Intel SSE, ARM NEON) has initiated a massive body of research on vector-parallel implementations of Montgomery modular multiplication. In this paper, we introduce the Cascade Operand Scanning (COS) method to speed up multi-precision multiplication on SIMD architectures. We developed the COS technique with the goal of reducing Read-After-Write (RAW) dependencies in the propagation of carries, which also reduces the number of pipeline stalls (i.e. bubbles). The COS method operates on 32-bit words in a row-wise fashion (similar to the operand-scanning method) and does not require a "non-canonical" representation of operands with a reduced radix. We show that two COS computations can be "coarsely" integrated into an efficient vectorized variant of Montgomery multiplication, which we call the Coarsely Integrated Cascade Operand Scanning (CICOS) method. Due to our sophisticated instruction scheduling, the CICOS method reaches record-setting execution times for Montgomery modular multiplication on ARM-NEON platforms. Detailed benchmarking results obtained on ARM Cortex-A9 and Cortex-A15 processors show that the proposed CICOS method outperforms Bos et al.'s implementation from SAC 2013 by up to 57% (A9) and 40% (A15), respectively. Furthermore, our COS multiplication is faster than the latest version of GMP (6.0.0) by up to 55% (A9) and 52% (A15), respectively.
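
For readers unfamiliar with word-level Montgomery multiplication, the following is a minimal scalar sketch in Python. It illustrates the general idea of interleaving a row-wise (operand-scanning) multiplication on 32-bit words with a Montgomery reduction step, in the style of the classic coarsely integrated operand-scanning approach from the literature. It is not the COS or CICOS method of this paper, it uses no SIMD instructions, and the names `mont_mul`, `to_words` and `from_words` are hypothetical helpers chosen for illustration. It assumes an odd modulus and Python 3.8 or later (for `pow` with a negative exponent).

```python
W = 32                      # word size in bits, as in the abstract
MASK = (1 << W) - 1


def to_words(x, n):
    """Split integer x into n little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_words(ws):
    """Recombine little-endian 32-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(ws))


def mont_mul(a, b, m, n):
    """Return a * b * R^-1 mod m with R = 2^(32*n).

    Assumes m is odd, m < R, and 0 <= a, b < m.
    """
    A, B, M = to_words(a, n), to_words(b, n), to_words(m, n)
    m_prime = (-pow(m, -1, 1 << W)) & MASK   # -m^-1 mod 2^32
    t = [0] * (n + 2)                        # running accumulator
    for i in range(n):
        # Multiplication row: t += A * B[i], carries propagate word by word.
        carry = 0
        for j in range(n):
            s = t[j] + A[j] * B[i] + carry
            t[j] = s & MASK
            carry = s >> W
        s = t[n] + carry
        t[n] = s & MASK
        t[n + 1] = s >> W
        # Reduction row: add q * M so the lowest word becomes zero,
        # then shift the accumulator right by one word.
        q = (t[0] * m_prime) & MASK
        carry = (t[0] + q * M[0]) >> W
        for j in range(1, n):
            s = t[j] + q * M[j] + carry
            t[j - 1] = s & MASK
            carry = s >> W
        s = t[n] + carry
        t[n - 1] = s & MASK
        t[n] = t[n + 1] + (s >> W)
    r = from_words(t[:n + 1])
    return r - m if r >= m else r            # final conditional subtraction
```

The inner loops show the sequential carry chain (each step reads the carry written by the previous one), which is the kind of Read-After-Write dependency the abstract says the COS method is designed to reduce.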

Category / Keywords: implementation / Public-key cryptography, modular arithmetic, SIMD-level parallelism, vector instructions, ARM NEON

Original Publication (with minor differences): ICISC 2014

Date: received 29 Sep 2014, last revised 30 Oct 2014

Contact author: hwajeong84 at gmail com

Version: 20141031:014326
