Paper 2011/239

Efficient Software Implementations of Modular Exponentiation

Shay Gueron


RSA computations have a significant effect on the workloads of SSL/TLS servers, and therefore their software implementations on general purpose processors are an important target for optimization. We concentrate here on 512-bit modular exponentiation, used for 1024-bit RSA. We propose optimizations in two directions. At the primitives’ level, we study and improve the performance of an “Almost” Montgomery Multiplication. At the exponentiation level, we propose a method to reduce the cost of protecting the w-ary exponentiation algorithm against cache/timing side channel attacks. Together, these lead to an efficient software implementation of 512-bit modular exponentiation, which outperforms the currently fastest publicly available alternative. When measured on the latest x86-64 architecture, the 2nd Generation Intel® Core™ processor, our implementation is 43% faster than that of the current version of OpenSSL (1.0.0d).

Note: Fixing some problems with referencing figures/algorithms in the document.

Available format(s)
Publication info
Published elsewhere. Unknown where it was published
modular arithmeticmodular exponentiationMontgomery multiplicationRSA.
Contact author(s)
shay @ math haifa ac il
2011-06-28: revised
2011-05-18: received
See all versions
Short URL
Creative Commons Attribution


      author = {Shay Gueron},
      title = {Efficient Software Implementations of Modular Exponentiation},
      howpublished = {Cryptology ePrint Archive, Paper 2011/239},
      year = {2011},
      note = {\url{}},
      url = {}
Note: In order to protect the privacy of readers, does not use cookies or embedded third party content.