At a lower level, this paper shows how to multiply two elements of a field of size 2^128 using just 9062 \approx 71 * 128 bit operations, and how to multiply two elements of a field of size 2^256 using just 22164 \approx 87 * 256 bit operations. This performance relies on a new representation of field elements and new FFT-based multiplication techniques.
This paper's constant-time software uses just 1.89 Core 2 cycles per byte to authenticate very long messages. On a Sandy Bridge it takes 1.43 cycles per byte, without using Intel's PCLMULQDQ polynomial-multiplication hardware. This is much faster than the speed records for constant-time implementations of GHASH without PCLMULQDQ (over 10 cycles/byte), even faster than Intel's best Sandy Bridge implementation of GHASH with PCLMULQDQ (1.79 cycles/byte), and almost as fast as state-of-the-art 128-bit prime-field MACs using Intel's integer-multiplication hardware (around 1 cycle/byte).
Category / Keywords: implementation / Performance, FFTs, Polynomial multiplication, Universal hashing, Message authentication Date: received 18 Sep 2014 Contact author: authorcontact-auth256 at box cr yp to Available format(s): PDF | BibTeX Citation Note: expanded version of sac 2014 paper Version: 20140919:211832 (All versions of this report) Short URL: ia.cr/2014/729 Discussion forum: Show discussion | Start new discussion