TMVP-based Polynomial Convolution for Saber and Sable on GPU using CUDA-cores and Tensor-cores

Muhammad Asfand Hafeez; Wai-Kong Lee; Angshuman Karmakar; Seong Oun Hwang

Paper 2023/1541

Muhammad Asfand Hafeez, Gachon University

Wai-Kong Lee, Gachon University

Angshuman Karmakar, Indian Institute of Technology Kanpur

Seong Oun Hwang, Gachon University

Abstract

Recently proposed lattice-based cryptography algorithms can protect IoT communication against the threat of quantum computers, but they are computationally heavy. In particular, polynomial multiplication is one of the most time-consuming operations in lattice-based cryptography. The Number Theoretic Transform (NTT) is an ideal choice for an efficient implementation, but it imposes restrictions on the parameters, so not all lattice-based schemes can employ it directly. Hence, alternative techniques have been proposed to accelerate polynomial multiplication in lattice-based schemes that cannot use the NTT directly. In this paper, we propose a parallel version of the Toeplitz matrix-vector product (TMVP) to accelerate polynomial multiplication in post-quantum cryptography (PQC) algorithms and implement it on a graphics processing unit (GPU). This is the first time a parallel TMVP version has been proposed and evaluated on different GPU cores (i.e., CUDA-cores and Tensor-cores). The effectiveness of the proposed solution is validated on the Saber (a NIST post-quantum standardization finalist) and Sable (an improved version of Saber) schemes. Experimental results show that TMVP-based polynomial convolution using CUDA-cores fails to exhibit a significant improvement over the schoolbook CUDA-core method already proposed by Hafeez et al. 2023. However, when the TMVP technique is applied to Tensor-cores, it outperforms state-of-the-art implementations. The proposed Tensor-core approach outperforms the schoolbook Tensor-core method by up to 1.21× and the dot-product-instructions method (Lee et al. 2022) by up to 3.63×. The proposed TMVP Tensor-core method is also 13.76× faster than the TMVP CUDA-core method.
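To illustrate the identity that TMVP-based multiplication rests on, the following minimal Python sketch writes a product in Z_q[x]/(x^n + 1) as a Toeplitz matrix times a vector and checks it against the schoolbook method. It is not the authors' GPU kernel and uses no TMVP splitting formulas. The function names are hypothetical, and the parameters n = 256 and q = 2^13 are assumed here as Saber's ring parameters.

```python
import random

N = 256        # assumed ring degree (Saber's n)
Q = 1 << 13    # assumed modulus (Saber's q = 2^13)

def schoolbook_negacyclic(a, b, q=Q):
    """Reference product a*b mod (x^n + 1, q) using the schoolbook method."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1, so the wrapped term is subtracted
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c

def toeplitz_from_poly(a, q=Q):
    """Build the n x n Toeplitz matrix T with T[k][j] depending only on k - j:
    a[k-j] on and below the diagonal, -a[k-j+n] above it."""
    n = len(a)
    return [
        [a[k - j] % q if k >= j else (-a[k - j + n]) % q for j in range(n)]
        for k in range(n)
    ]

def tmvp_multiply(a, b, q=Q):
    """Compute a*b mod (x^n + 1, q) as the matrix-vector product T(a) * b."""
    T = toeplitz_from_poly(a, q)
    return [sum(t * x for t, x in zip(row, b)) % q for row in T]

if __name__ == "__main__":
    a = [random.randrange(Q) for _ in range(N)]
    b = [random.randrange(Q) for _ in range(N)]
    print(tmvp_multiply(a, b) == schoolbook_negacyclic(a, b))
```

Because each output coefficient is an independent row-by-vector dot product, this form maps naturally onto parallel hardware, and several such products can be batched into a matrix-matrix multiplication of the kind Tensor-cores accelerate.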

Metadata

Available format(s): PDF
Category: Implementation
Publication info: Preprint.
Keywords: Toeplitz Matrix-vector Product (TMVP), Cryptography, Tensor-cores, CUDA-cores, Post-quantum Cryptography, Lattice-based Cryptography, Matrix Multiplication
Contact author(s): muhammadasfandh @ gmail com
waikong lee @ gmail com
angshu99 @ gmail com
bardic @ naver com
History: 2023-10-09: approved; 2023-10-08: received
Short URL: https://ia.cr/2023/1541
License: CC BY

BibTeX

@misc{cryptoeprint:2023/1541,
      author = {Muhammad Asfand Hafeez and Wai-Kong Lee and Angshuman Karmakar and Seong Oun Hwang},
      title = {{TMVP}-based Polynomial Convolution for Saber and Sable on {GPU} using {CUDA}-cores and Tensor-cores},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1541},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1541}
}