Paper 2012/275

Implementing BLAKE with AVX, AVX2, and XOP

Samuel Neves and Jean-Philippe Aumasson

Abstract

In 2013 Intel will release the AVX2 instructions, which introduce 256-bit single-instruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-any permute and vectorized table-lookup. In this paper, we explore the potential of AVX2 to speed up the SHA-3 finalist BLAKE, and present the first working assembly implementations of BLAKE-256 and BLAKE-512 with AVX2. We then investigate the potential of the recent AVX and XOP instructions to accelerate BLAKE, and report new speed records on Sandy Bridge and Bulldozer microarchitectures (7.47 and 11.64 cycles per byte for BLAKE-256, 5.71 and 6.95 for BLAKE-512).
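To illustrate what "4-way SIMD computation of 64-bit add-rotate-xor" means for BLAKE-512, here is a minimal Python sketch. It is not the authors' assembly code. Each list of four 64-bit words models one 256-bit register, and the helpers `vadd`, `vxor`, and `vrotr` are hypothetical stand-ins for lane-wise vector instructions. The rotation counts 32, 25, 16, and 11 are those of the BLAKE-512 G function. The inputs `xs` and `ys` are assumed to already hold the message words XORed with the round constants, and the lane rotation needed between the column and diagonal steps is omitted.

```python
MASK64 = (1 << 64) - 1  # keep every word in the 64-bit range


def rotr64(x, n):
    """Rotate a 64-bit word right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def g_scalar(a, b, c, d, x, y):
    """One BLAKE-512 G application on four 64-bit words.

    x and y are assumed to be message words already XORed with constants.
    """
    a = (a + b + x) & MASK64
    d = rotr64(d ^ a, 32)
    c = (c + d) & MASK64
    b = rotr64(b ^ c, 25)
    a = (a + b + y) & MASK64
    d = rotr64(d ^ a, 16)
    c = (c + d) & MASK64
    b = rotr64(b ^ c, 11)
    return a, b, c, d


# Hypothetical lane-wise helpers: each models one vector instruction
# acting on four independent 64-bit lanes of a 256-bit register.
def vadd(u, v):
    return [(p + q) & MASK64 for p, q in zip(u, v)]


def vxor(u, v):
    return [p ^ q for p, q in zip(u, v)]


def vrotr(u, n):
    return [rotr64(p, n) for p in u]


def g_4way(row_a, row_b, row_c, row_d, xs, ys):
    """Apply G to four columns at once; each row is one modeled register."""
    row_a = vadd(vadd(row_a, row_b), xs)
    row_d = vrotr(vxor(row_d, row_a), 32)
    row_c = vadd(row_c, row_d)
    row_b = vrotr(vxor(row_b, row_c), 25)
    row_a = vadd(vadd(row_a, row_b), ys)
    row_d = vrotr(vxor(row_d, row_a), 16)
    row_c = vadd(row_c, row_d)
    row_b = vrotr(vxor(row_b, row_c), 11)
    return row_a, row_b, row_c, row_d
```

Because the four columns of the BLAKE state are independent within a column step, the vectorized version performs the same sequence of operations as the scalar one, applied once per register rather than once per word. The 8-way 32-bit case mentioned in the abstract follows the same pattern with narrower lanes and the BLAKE-256 rotation counts.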

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Published elsewhere. Extended version of the Third SHA-3 Conference paper
Keywords
hash functions, SIMD, implementation
Contact author(s)
jeanphilippe aumasson @ gmail com
History
2012-05-29: received
Short URL
https://ia.cr/2012/275
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2012/275,
      author = {Samuel Neves and Jean-Philippe Aumasson},
      title = {Implementing BLAKE with AVX, AVX2, and XOP},
      howpublished = {Cryptology ePrint Archive, Paper 2012/275},
      year = {2012},
      note = {\url{https://eprint.iacr.org/2012/275}},
      url = {https://eprint.iacr.org/2012/275}
}