Comb to Pipeline: Fast Software Encryption Revisited

Version 20160119:132339 of this paper.

Paper 2016/047

Andrey Bogdanov and Martin M. Lauridsen and Elmar Tischhauser

Abstract

AES-NI, or Advanced Encryption Standard New Instructions, is an extension of the x86 architecture proposed by Intel in 2008. With a pipelined implementation utilizing AES-NI, parallelizable modes such as AES-CTR become extremely efficient. However, out of the four non-trivial NIST-recommended encryption modes, three are inherently sequential: CBC, CFB, and OFB. This significantly limits the advantage of using AES-NI. Similar observations apply to CMAC, CCM and a great many other modes. We address this issue by proposing the comb scheduler, a fast, low-overhead scheduling algorithm based on an efficient look-ahead strategy. With it, sequential modes profit from the AES-NI pipeline in real-world settings by filling the pipeline with multiple independent messages. We apply the comb scheduler to implementations of a wide range of modes on Haswell, a recent Intel microarchitecture and our main target platform. We observe a drastic speed-up by a factor of 5 for NIST's CBC, CFB, OFB and CMAC, which perform at around 0.88 cycles per byte (cpb). Surprisingly, contrary to the entire body of previous performance analysis, the throughput of the authenticated encryption (AE) mode CCM gets very close to that of GCM and OCB3, with about 1.64 cpb (vs. 1.63 cpb and 1.51 cpb, respectively), when message lengths are sampled according to a realistic distribution for Internet packets, despite Haswell's heavily improved binary field multiplication. This suggests CCM as an AE mode of choice, as it is NIST-recommended, does not have any weak-key issues like GCM, and is royalty-free as opposed to OCB3. Among the CAESAR contestants, the comb scheduler significantly speeds up CLOC/SILC, JAMBU, and POET. The mostly sequential, nonce-misuse resistant design of POET performs at 2.14 cpb and thereby becomes faster than the well-parallelizable COPA.
Although Haswell is the target platform, we also include performance figures for the more recent Skylake microarchitecture, which provides further optimizations to AES-NI instructions. Finally, this paper provides the first optimized AES-NI implementations for the novel AE modes OTR, CLOC/SILC, COBRA, POET, McOE-G, and Julius.
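
The idea of filling the pipeline with multiple independent messages can be illustrated with a small sketch. The code below is not the comb scheduler from the paper: it omits the look-ahead over message lengths and simply processes one block from every still-active message per step. The names `toy_block_encrypt`, `cbc_sequential` and `cbc_interleaved` are hypothetical, the block function is a keyed hash standing in for AES (it is not AES and not even a permutation), and all messages are assumed to be a multiple of 16 bytes long.

```python
import hashlib

BLOCK = 16  # block size in bytes, as for AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for one block cipher call. NOT AES; illustration only."""
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_sequential(key: bytes, iv: bytes, msg: bytes) -> bytes:
    """Plain CBC: each block cipher call depends on the previous output."""
    chain, out = iv, []
    for i in range(0, len(msg), BLOCK):
        chain = toy_block_encrypt(key, xor(chain, msg[i:i + BLOCK]))
        out.append(chain)
    return b"".join(out)


def cbc_interleaved(key: bytes, ivs: list, msgs: list) -> list:
    """CBC over several independent messages in lock-step.

    Within one step, the block cipher inputs belong to different messages
    and do not depend on each other, so a pipelined implementation could
    issue them back-to-back instead of waiting for each result.
    """
    chains = list(ivs)
    outs = [[] for _ in msgs]
    n_blocks = [len(m) // BLOCK for m in msgs]
    for step in range(max(n_blocks, default=0)):
        # Messages that still have a block left at this position.
        active = [j for j, n in enumerate(n_blocks) if step < n]
        inputs = [
            xor(chains[j], msgs[j][step * BLOCK:(step + 1) * BLOCK])
            for j in active
        ]
        # Independent calls: this batch is what would fill the pipeline.
        results = [toy_block_encrypt(key, x) for x in inputs]
        for j, c in zip(active, results):
            chains[j] = c
            outs[j].append(c)
    return [b"".join(o) for o in outs]
```

The interleaved variant produces the same ciphertexts as encrypting each message on its own; only the order of the block cipher calls changes. As shorter messages finish, the batch shrinks, which is the inefficiency a scheduler that looks ahead at message lengths would aim to reduce.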

Metadata

Available format(s): PDF
Category: Implementation
Publication info: A minor revision of an IACR publication in FSE 2015
DOI: 10.1007/978-3-662-48116-5_8
Keywords: AES-NI, pclmulqdq, Haswell, Skylake, authenticated encryption, CAESAR, CBC, OFB, CFB, CMAC, CCM, GCM, OCB3, OTR, CLOC, COBRA, JAMBU, SILC, McOE-G, COPA, POET, Julius
Contact author(s): mmeh @ dtu dk
History: 2016-01-19: received
Short URL: https://ia.cr/2016/047
License: CC BY