## Cryptology ePrint Archive: Report 2020/481

Using z14 Fused-Multiply-Add Instructions to Accelerate Elliptic Curve Cryptography

James You and Qi Zhang and Curtis D'Alves and Bill O'Farrell and Christopher K. Anand

Abstract: Due to growing commercial applications like Blockchain, the performance of large-integer arithmetic is a focus of both academic and industrial research. IBM introduced a new integer fused multiply-add instruction in z14, called VMSL, to accelerate such workloads. Unlike floating-point fused multiply-add instructions, integer fused multiply-add instructions come in a variety of designs. VMSL multiplies two pairs of radix $2^{56}$ inputs, sums the two products together with an additional 128-bit input, and stores the resulting 128-bit value in a vector register. In this paper, we describe the unique features of VMSL and the ways in which it is inherently more efficient than alternative specifications, in particular by enabling multiple carry strategies. We then look at the issues we encountered implementing Montgomery Modular Multiplication for Elliptic Curve Cryptography on z14, including radix choice, mixed radices, instruction selection to trade instruction count for latency, and VMSL-specific optimizations for Montgomery-friendly moduli. The best choices resulted in a 20% increase in throughput.
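
The multiply-sum operation described in the abstract can be illustrated with a small Python model. This is only a sketch of the arithmetic as the abstract states it (two products of radix $2^{56}$ limbs added to a 128-bit input, with a 128-bit result); it does not model the real instruction's operand layout or encoding, and the names `vmsl_model`, `to_limbs`, and `multiply` are hypothetical helpers introduced here, not taken from the paper. The sketch also assumes a simple column-wise schoolbook product with carries deferred to the end, which is one possible carry strategy and not necessarily the one the authors chose.

```python
LIMB_BITS = 56                      # radix 2^56, as stated in the abstract
LIMB_MASK = (1 << LIMB_BITS) - 1
ACC_MASK = (1 << 128) - 1           # results are kept to 128 bits


def vmsl_model(a, b, acc):
    """Model of the arithmetic only: a0*b0 + a1*b1 + acc, truncated to 128 bits.

    a and b are pairs of limbs below 2^56; acc is a 128-bit value.
    """
    a0, a1 = a
    b0, b1 = b
    assert all(0 <= x <= LIMB_MASK for x in (a0, a1, b0, b1))
    assert 0 <= acc <= ACC_MASK
    return (a0 * b0 + a1 * b1 + acc) & ACC_MASK


def to_limbs(x, n):
    """Split a non-negative integer into n limbs of 56 bits, least significant first."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def multiply(x, y, n=4):
    """Schoolbook product of two n-limb integers using only vmsl_model.

    Each column sum is accumulated without intermediate carries. Every product
    is below 2^112, so a column with at most n products stays far below 2^128
    for small n, and carry propagation can be deferred to the final
    recombination.
    """
    xs, ys = to_limbs(x, n), to_limbs(y, n)
    columns = []
    for k in range(2 * n - 1):
        pairs = [(xs[i], ys[k - i]) for i in range(n) if 0 <= k - i < n]
        if len(pairs) % 2:
            pairs.append((0, 0))    # pad so products are consumed two at a time
        acc = 0
        for j in range(0, len(pairs), 2):
            (p0, q0), (p1, q1) = pairs[j], pairs[j + 1]
            acc = vmsl_model((p0, p1), (q0, q1), acc)
        columns.append(acc)
    # Deferred carry propagation: shift each column to its weight and add.
    return sum(c << (LIMB_BITS * k) for k, c in enumerate(columns))
```

With `n=4` the operands may be up to 224 bits wide, and `multiply(x, y)` should agree with Python's built-in `x * y`. The headroom between a 112-bit product and the 128-bit accumulator is what lets many products be summed before any carry handling is needed.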

Category / Keywords: implementation / elliptic curve cryptosystem, implementation, public-key cryptography, vector instructions, single instruction multiple data

Original Publication (in the same form): CASCON '19: 29th Annual International Conference on Computer Science and Software Engineering
DOI: 10.5555/3370272.3370302