PQC-AMX: Accelerating Saber and FrodoKEM on the Apple M1 and M3 SoCs

Décio Luiz Gazzoni Filho; Guilherme Brandão; Gora Adj; Arwa Alblooshi; Isaac A. Canales-Martínez; Jorge Chávez-Saab; Julio López

Paper 2024/195

Décio Luiz Gazzoni Filho, Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil; Department of Electrical Engineering, State University of Londrina, Londrina, Brazil

Guilherme Brandão, Independent Researcher, Londrina, Brazil

Gora Adj, Cryptography Research Centre, Technology Innovation Institute, Abu Dhabi, UAE

Arwa Alblooshi, Cryptography Research Centre, Technology Innovation Institute, Abu Dhabi, UAE

Isaac A. Canales-Martínez, Cryptography Research Centre, Technology Innovation Institute, Abu Dhabi, UAE

Jorge Chávez-Saab, Cryptography Research Centre, Technology Innovation Institute, Abu Dhabi, UAE

Julio López, Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil

Abstract

As CPU performance is unable to keep up with the dramatic growth of the past few decades, CPU architects are looking into domain-specific architectures to accelerate certain tasks. A recent trend is the introduction of matrix-multiplication accelerators to CPUs by manufacturers such as IBM, Intel and ARM, some of which have not launched commercially yet. Apple's systems-on-chip (SoCs) for its mobile phones, tablets and personal computers include a proprietary, undocumented CPU-coupled matrix-multiplication coprocessor called AMX. In this paper, we leverage AMX to accelerate the post-quantum lattice-based cryptosystems Saber and FrodoKEM, and benchmark their performance on Apple M1 and M3 SoCs. We propose a variant of the Toeplitz Matrix-Vector Product algorithm for polynomial multiplication, which sets new speed records for Saber using AMX (up to 13% for the main KEM operations, and 151% for matrix-vector multiplication of polynomials). For FrodoKEM, we set new speed records with our AMX implementation (up to 21% for the main KEM operations, and 124% for matrix multiplication, with even greater improvements for batching). These speedups are relative to our optimized NEON implementation, also presented here, which improves upon the state-of-the-art implementation for ARMv8 CPUs.
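To illustrate why a matrix-multiplication unit can help with polynomial arithmetic, the following minimal sketch shows the textbook idea behind a Toeplitz matrix-vector product (TMVP): multiplication in Z_q[x]/(x^n + 1), the ring used by Saber, can be rewritten as a Toeplitz matrix built from one operand times the coefficient vector of the other. This is not the variant proposed in the paper and does not use AMX; the function names, the toy size n = 4 and the operand values are assumptions chosen for illustration, while q = 2^13 matches Saber's modulus.

```python
def negacyclic_schoolbook(a, b, q):
    """Reference product of a and b in Z_q[x]/(x^n + 1), where x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # Wrap around with a sign flip because x^n = -1.
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def toeplitz_matrix(a, q):
    """Toeplitz matrix T such that T @ b is the negacyclic product a * b.

    Entry (i, j) depends only on i - j: a[i - j] on and below the
    diagonal, and -a[n + i - j] above it.
    """
    n = len(a)
    return [
        [a[i - j] % q if i >= j else (-a[n + i - j]) % q for j in range(n)]
        for i in range(n)
    ]


def toeplitz_mul(a, b, q):
    """Negacyclic product computed as a plain matrix-vector product."""
    n = len(a)
    t = toeplitz_matrix(a, q)
    return [sum(t[i][j] * b[j] for j in range(n)) % q for i in range(n)]


if __name__ == "__main__":
    q = 2**13                      # Saber's modulus
    a = [1, 2, 3, 4]               # hypothetical toy operands
    b = [5, 6, 7, 8]
    print(toeplitz_mul(a, b, q))
    print(toeplitz_mul(a, b, q) == negacyclic_schoolbook(a, b, q))
```

Once the product is phrased this way, the inner work is a dense matrix-vector (or, with several operands, matrix-matrix) product, which is the kind of operation a matrix coprocessor is designed to speed up.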

Metadata

Available format(s): PDF
Category: Implementation
Publication info: Preprint.
Keywords: Post-quantum cryptography AMX ARM NEON FrodoKEM Saber
Contact author(s): decio gazzoni @ ic unicamp br
brandaogbs @ gmail com
gora adj @ tii ae
arwa alblooshi @ tii ae
isaac canales @ tii ae
jorge saab @ tii ae
jlopez @ ic unicamp br
History: 2024-02-09: approved; 2024-02-09: received
Short URL: https://ia.cr/2024/195
License: CC BY

BibTeX

@misc{cryptoeprint:2024/195,
      author = {Décio Luiz Gazzoni Filho and Guilherme Brandão and Gora Adj and Arwa Alblooshi and Isaac A. Canales-Martínez and Jorge Chávez-Saab and Julio López},
      title = {{PQC}-{AMX}: Accelerating Saber and {FrodoKEM} on the Apple M1 and M3 {SoCs}},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/195},
      year = {2024},
      url = {https://eprint.iacr.org/2024/195}
}