Cryptology ePrint Archive: Report 2016/047

Comb to Pipeline: Fast Software Encryption Revisited

Andrey Bogdanov and Martin M. Lauridsen and Elmar Tischhauser

Abstract: AES-NI, or Advanced Encryption Standard New Instructions, is an extension of the x86 architecture proposed by Intel in 2008. With a pipelined implementation utilizing AES-NI, parallelizable modes such as AES-CTR become extremely efficient. However, out of the four non-trivial NIST-recommended encryption modes, three are inherently sequential: CBC, CFB, and OFB. This significantly limits the benefit of AES-NI for these modes. Similar observations apply to CMAC, CCM, and many other modes. We address this issue by proposing the comb scheduler -- a fast, low-overhead scheduling algorithm based on an efficient look-ahead strategy -- with which sequential modes profit from the AES-NI pipeline in real-world settings by filling it with multiple, independent messages.

We apply the comb scheduler to implementations of a wide range of modes on Haswell, a recent Intel microarchitecture and our main target platform. We observe a speed-up by a factor of 5 for NIST's CBC, CFB, OFB and CMAC, which perform at around 0.88 cycles per byte (cpb). Surprisingly, contrary to the entire body of previous performance analyses, the throughput of the authenticated encryption (AE) mode CCM gets very close to that of GCM and OCB3, with about 1.64 cpb (vs. 1.63 cpb and 1.51 cpb, respectively), when message lengths are sampled according to a realistic distribution for Internet packets, despite Haswell's heavily improved binary field multiplication. This suggests CCM as an AE mode of choice, as it is NIST-recommended, does not have any weak-key issues like GCM, and is royalty-free as opposed to OCB3. Among the CAESAR contestants, the comb scheduler significantly speeds up CLOC/SILC, JAMBU, and POET; the mostly sequential, nonce-misuse resistant design of POET performs at 2.14 cpb and thereby becomes faster than the well-parallelizable COPA. Although Haswell is the target platform, we also include performance figures for the more recent Skylake microarchitecture, which provides further optimizations to AES-NI instructions. Finally, this paper provides the first optimized AES-NI implementations for the novel AE modes OTR, CLOC/SILC, COBRA, POET, McOE-G, and Julius.
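
The idea of filling the pipeline with multiple independent messages can be illustrated with a toy sketch. It is not the comb scheduler from the paper: the look-ahead strategy is not reproduced, and free lanes are simply refilled greedily. The block function `toy_block_encrypt` is a hypothetical stand-in built from SHA-256 (it is neither AES nor invertible), the names `cbc_encrypt_interleaved` and `lanes` are assumptions made for illustration, and message lengths are assumed to be multiples of the block size. The point is only that, within one round, the block calls belong to different messages and do not depend on each other, which is what would let a hardware pipeline overlap them.

```python
import hashlib

BLOCK = 16  # block size in bytes, as for AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one AES block call (NOT AES, not invertible)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key: bytes, iv: bytes, msg: bytes) -> bytes:
    """Reference: sequential CBC over a single message."""
    out, prev = bytearray(), iv
    for off in range(0, len(msg), BLOCK):
        prev = toy_block_encrypt(key, xor(prev, msg[off:off + BLOCK]))
        out += prev
    return bytes(out)


def cbc_encrypt_interleaved(key, ivs, msgs, lanes=4):
    """CBC over many messages, one block per active message per round.

    Each message is still chained sequentially, but the block calls
    within a round are mutually independent.
    """
    outputs = [bytearray() for _ in msgs]
    pending = list(reversed(range(len(msgs))))  # messages not yet started
    active = []  # each lane: [message index, byte offset, chaining value]
    while pending or active:
        # Greedily refill free lanes (a simplification, not the paper's look-ahead).
        while pending and len(active) < lanes:
            i = pending.pop()
            if msgs[i]:
                active.append([i, 0, ivs[i]])
        # One round: independent block calls, one per active message.
        for lane in active:
            i, off, prev = lane
            c = toy_block_encrypt(key, xor(prev, msgs[i][off:off + BLOCK]))
            outputs[i] += c
            lane[1], lane[2] = off + BLOCK, c
        # Retire lanes whose message is finished.
        active = [lane for lane in active if lane[1] < len(msgs[lane[0]])]
    return [bytes(o) for o in outputs]
```

Under these assumptions, the interleaved routine yields the same ciphertexts as running `cbc_encrypt` on each message separately; only the order in which block calls are issued changes.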

Category / Keywords: implementation / AES-NI, pclmulqdq, Haswell, Skylake, authenticated encryption, CAESAR, CBC, OFB, CFB, CMAC, CCM, GCM, OCB3, OTR, CLOC, COBRA, JAMBU, SILC, McOE-G, COPA, POET, Julius

Original Publication (with minor differences): IACR-FSE-2015
DOI: 10.1007/978-3-662-48116-5_8

Date: received 19 Jan 2016

Contact author: mmeh at dtu dk


Version: 20160119:132339

Short URL: ia.cr/2016/047
