Paper 2016/047

Comb to Pipeline: Fast Software Encryption Revisited

Andrey Bogdanov, Martin M. Lauridsen, and Elmar Tischhauser


AES-NI, or Advanced Encryption Standard New Instructions, is an extension of the x86 architecture proposed by Intel in 2008. With a pipelined implementation utilizing AES-NI, parallelizable modes such as AES-CTR become extremely efficient. However, out of the four non-trivial NIST-recommended encryption modes, three are inherently sequential: CBC, CFB, and OFB. This significantly limits the benefit of AES-NI for these modes. Similar observations apply to CMAC, CCM, and many other modes. We address this issue by proposing the comb scheduler, a fast, low-overhead scheduling algorithm based on an efficient look-ahead strategy, with which sequential modes profit from the AES-NI pipeline in real-world settings by filling it with multiple independent messages. We apply the comb scheduler to implementations of a wide range of modes on Haswell, a recent Intel microarchitecture and our main target platform. We observe a drastic speed-up by a factor of 5 for NIST's CBC, CFB, OFB, and CMAC, which perform at around 0.88 cycles per byte (cpb). Surprisingly, contrary to the entire body of previous performance analysis, the throughput of the authenticated encryption (AE) mode CCM gets very close to that of GCM and OCB3, at about 1.64 cpb (vs. 1.63 cpb and 1.51 cpb, respectively), when message lengths are sampled according to a realistic distribution for Internet packets, despite Haswell's heavily improved binary field multiplication. This suggests CCM as an AE mode of choice, as it is NIST-recommended, does not have any weak-key issues like GCM, and is royalty-free as opposed to OCB3. Among the CAESAR contestants, the comb scheduler significantly speeds up CLOC/SILC, JAMBU, and POET; the mostly sequential, nonce-misuse-resistant POET performs at 2.14 cpb and thereby becomes faster than the well-parallelizable COPA.
Although Haswell is the target platform, we also include performance figures for the more recent Skylake microarchitecture, which provides further optimizations to the AES-NI instructions. Finally, this paper provides the first optimized AES-NI implementations for the novel AE modes OTR, CLOC/SILC, COBRA, POET, McOE-G, and Julius.
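The idea of filling the pipeline with multiple independent messages can be illustrated with a small sketch. It is not the comb scheduler from the paper (the look-ahead strategy is not described in this abstract); it only shows the underlying multi-message interleaving for CBC, where each round issues one independent block call per active message. The names `toy_block_encrypt`, `cbc_sequential`, `cbc_interleaved`, and the `lanes` parameter are hypothetical, and the block function is a keyed hash standing in for AES, not a real cipher.

```python
import hashlib
from collections import deque

BLOCK = 16  # block size in bytes, matching AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for one AES call (assumption: a keyed hash, NOT a real cipher)."""
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_sequential(key, iv, blocks):
    """Plain CBC chaining: every call depends on the previous output."""
    out, chain = [], iv
    for block in blocks:
        chain = toy_block_encrypt(key, xor(chain, block))
        out.append(chain)
    return out


def cbc_interleaved(key, ivs, messages, lanes=4):
    """Encrypt several messages at once, keeping up to `lanes` chains active.

    Within one round the block calls belong to different messages, so they
    do not depend on each other and could overlap in a hardware pipeline.
    """
    out = [[] for _ in messages]
    chains = list(ivs)            # current chaining value per message
    pos = [0] * len(messages)     # next block index per message
    pending = deque(i for i, m in enumerate(messages) if m)  # skip empty messages
    active = []
    while pending or active:
        # Refill free lanes with messages that have not started yet.
        while pending and len(active) < lanes:
            active.append(pending.popleft())
        # One round: one independent block call per active message.
        for i in active:
            chains[i] = toy_block_encrypt(key, xor(chains[i], messages[i][pos[i]]))
            out[i].append(chains[i])
            pos[i] += 1
        # Drop finished messages so their lanes can be reused.
        active = [i for i in active if pos[i] < len(messages[i])]
    return out


if __name__ == "__main__":
    key = b"k" * BLOCK
    msgs = [[bytes([n]) * BLOCK for n in range(length)] for length in (3, 1, 5)]
    ivs = [bytes([i]) * BLOCK for i in range(len(msgs))]
    same = cbc_interleaved(key, ivs, msgs) == [
        cbc_sequential(key, iv, m) for iv, m in zip(ivs, msgs)
    ]
    print("interleaved output matches sequential CBC:", same)
```

In pure Python the interleaving brings no speed-up; the point is only that the per-message ciphertexts are unchanged while consecutive block calls become independent of each other, which is the property a pipelined AES-NI implementation can exploit.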

Publication info
A minor revision of an IACR publication in FSE 2015
Contact author(s)
mmeh @ dtu dk
2016-01-19: received
Creative Commons Attribution


@misc{cryptoeprint:2016/047,
      author = {Andrey Bogdanov and Martin M. Lauridsen and Elmar Tischhauser},
      title = {Comb to Pipeline: Fast Software Encryption Revisited},
      howpublished = {Cryptology ePrint Archive, Paper 2016/047},
      year = {2016},
      doi = {10.1007/978-3-662-48116-5_8},
      note = {\url{}},
      url = {}
}