Paper 2023/1955

Barrett Multiplication for Dilithium on Embedded Devices

Vincent Hwang, Max Planck Institute for Security and Privacy
YoungBeom Kim, Kookmin University
Seog Chung Seo, Kookmin University
Abstract

We optimize the number-theoretic transforms (NTTs) in Dilithium, a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST), on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett multiplication by relating modular reductions to integer approximations, and demonstrated that Barrett multiplication is more favorable than Montgomery multiplication because its subtraction can be absorbed into the low multiplication. We first point out the benefit of Barrett multiplication when long and high multiplication instructions are unavailable, unusable, or slow. We then generalize the notion of integer approximations and improve the emulation of the high multiplications used in Barrett multiplication. Compared to the state-of-the-art assembly-optimized implementations on Cortex-M3, our constant-time NTT/iNTT are 1.38–1.51 times faster and our variable-time NTT/iNTT are 1.10–1.21 times faster. On 8-bit AVR, simply switching to the proposed Barrett-based approach makes our C implementations of NTT/iNTT 6.37–7.27 times faster than Montgomery-based C implementations. We additionally implement Barrett-based NTT/iNTT in assembly and obtain code that is 14.10–14.42 times faster. For the overall scheme, we provide speed-optimized implementations of the Dilithium parameter sets dilithium2 and dilithium3 on Cortex-M3, and stack-optimized implementations of all parameter sets on Cortex-M3 and 8-bit AVR. We briefly compare the performance of speed-optimized dilithium3. Compared to the state-of-the-art assembly implementation on Cortex-M3, our assembly implementation reduces the key generation, signature generation, and signature verification cycles by 2.30%, 23.29%, and 0.69%, respectively.
In the 8-bit AVR environment, our Barrett-based C implementation reduces the key generation, signature generation, and signature verification cycles by 45.09%, 56.80%, and 50.40%, respectively, and our assembly-optimized implementation reduces the cycles of each operation by 48.85%, 61.70%, and 55.08%, respectively.
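The abstract contrasts Barrett with Montgomery multiplication and mentions emulating high multiplications when long multiply instructions are unavailable or unusable. As a rough illustration only, the following C sketch shows a generic signed Barrett multiplication by a fixed constant b modulo Dilithium's q = 8380417, under stated assumptions: the precomputed constant is b' = round(b·2^32/q), the quotient estimate t = floor(a·b'/2^32) is built from 16×16→32-bit multiplications on 16-bit limbs (a textbook schoolbook emulation, not the paper's improved emulation), and the helper names (`mulhi_emulated`, `barrett_precompute`, `barrett_mul`, `ct_butterfly`, `barrett_selfcheck`) are hypothetical. The correction step a·b − t·q needs only low 32-bit products, so on a core with a multiply-and-subtract instruction the subtraction can be folded into the second low multiplication, which is one reading of the point attributed to Becker et al. above.

```c
#include <assert.h>
#include <stdint.h>

#define Q 8380417 /* Dilithium modulus q = 2^23 - 2^13 + 1 */

/* floor(a * b / 2^32) for signed 32-bit a, b, built only from 16x16 -> 32-bit
 * multiplications on 16-bit limbs (generic schoolbook emulation, hypothetical).
 * Assumes >> on negative values is an arithmetic shift (true on GCC/Clang). */
static int32_t mulhi_emulated(int32_t a, int32_t b)
{
    int32_t a1 = a >> 16;                /* signed high limb of a  */
    uint32_t a0 = (uint32_t)a & 0xFFFFu; /* unsigned low limb of a */
    int32_t b1 = b >> 16;
    uint32_t b0 = (uint32_t)b & 0xFFFFu;

    uint32_t lo = a0 * b0;               /* in [0, 2^32)           */
    int32_t mid1 = a1 * (int32_t)b0;     /* each fits in int32     */
    int32_t mid2 = (int32_t)a0 * b1;
    int32_t hi = a1 * b1;

    /* mid1 + mid2 needs 33 bits, so only the additions use a 64-bit
     * accumulator; lo >> 16 carries the low partial product upward. */
    int64_t mid = (int64_t)mid1 + mid2 + (lo >> 16);
    return hi + (int32_t)(mid >> 16);
}

/* b' = round(b * 2^32 / q) for a fixed twiddle b with |b| <= (q - 1) / 2,
 * so that b' fits in int32_t. Done once, offline. */
static int32_t barrett_precompute(int32_t b)
{
    assert(b >= -(Q - 1) / 2 && b <= (Q - 1) / 2);
    int64_t num = (int64_t)b * ((int64_t)1 << 32);
    /* Round to nearest; q is odd, so there are no exact ties. */
    int64_t bp = (num >= 0) ? (num + Q / 2) / Q : -((-num + Q / 2) / Q);
    return (int32_t)bp;
}

/* Barrett multiplication by a known constant b:
 *   t = floor(a * b' / 2^32), so a*b/q - t lies in [-1/4, 5/4) for |a| <= 2^31
 *   r = a*b - t*q, hence r = a*b (mod q) and -q/4 <= r < 5q/4 */
static int32_t barrett_mul(int32_t a, int32_t b, int32_t bprime)
{
    int32_t t = mulhi_emulated(a, bprime);
    /* The true value of a*b - t*q is small, so wrapping 32-bit unsigned
     * arithmetic (two low multiplications and a subtraction) recovers it. */
    uint32_t r = (uint32_t)a * (uint32_t)b - (uint32_t)t * (uint32_t)Q;
    return (int32_t)r; /* implementation-defined conversion; wraps on GCC/Clang */
}

/* Hypothetical Cooley-Tukey NTT butterfly (x, y) -> (x + zeta*y, x - zeta*y),
 * congruent mod q; assumes |x| is small enough that x +/- t fits in int32. */
static void ct_butterfly(int32_t *x, int32_t *y, int32_t zeta, int32_t zeta_prime)
{
    int32_t t = barrett_mul(*y, zeta, zeta_prime);
    *y = *x - t;
    *x = *x + t;
}

/* Self-check against 64-bit reference arithmetic on xorshift32 inputs,
 * including the extreme values INT32_MIN and INT32_MAX. Returns 1 on success. */
static int barrett_selfcheck(unsigned trials)
{
    uint32_t s = 0x12345678u;
    for (unsigned i = 0; i < trials; i++) {
        s ^= s << 13; s ^= s >> 17; s ^= s << 5;
        int32_t a = (i == 0) ? INT32_MIN : (i == 1) ? INT32_MAX : (int32_t)s;
        s ^= s << 13; s ^= s >> 17; s ^= s << 5;
        int32_t b = (int32_t)(s % Q) - (Q - 1) / 2;
        int32_t bp = barrett_precompute(b);

        if (mulhi_emulated(a, bp) != (int32_t)(((int64_t)a * bp) >> 32))
            return 0;                               /* high product is exact */
        int64_t r = barrett_mul(a, b, bp);
        if ((r - (int64_t)a * b) % Q != 0)
            return 0;                               /* r = a*b (mod q)       */
        if (4 * r < -Q || 4 * r >= 5 * (int64_t)Q)
            return 0;                               /* -q/4 <= r < 5q/4      */

        int32_t u0 = (int32_t)(s % Q), u = u0, v = a;
        ct_butterfly(&u, &v, b, bp);                /* (u0 + b*a, u0 - b*a)  */
        if ((u - u0 - (int64_t)a * b) % Q != 0 || (v - u0 + (int64_t)a * b) % Q != 0)
            return 0;
    }
    return 1;
}
```

Calling `barrett_selfcheck(100000)` should return 1. The design choice here is portability: the 64-bit accumulation and the limb split stand in for whatever constant-time multiply instructions a given Cortex-M3 or AVR core actually offers, and the paper's own emulation and assembly are not reproduced.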

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
Modular multiplication, Barrett multiplication, Lattice-based cryptography, Dilithium, Microcontroller, Cortex-M3, 8-bit AVR
Contact author(s)
vincentvbh7 @ gmail com
darania @ kookmin ac kr
scseo @ kookmin ac kr
History
2023-12-25: revised
2023-12-24: received
Short URL
https://ia.cr/2023/1955
License
No rights reserved
CC0

BibTeX

@misc{cryptoeprint:2023/1955,
      author = {Vincent Hwang and YoungBeom Kim and Seog Chung Seo},
      title = {Barrett Multiplication for Dilithium on Embedded Devices},
      howpublished = {Cryptology ePrint Archive, Paper 2023/1955},
      year = {2023},
      note = {\url{https://eprint.iacr.org/2023/1955}},
      url = {https://eprint.iacr.org/2023/1955}
}