Paper 2022/1621

cuXCMP: CUDA-Accelerated Private Comparison Based on Homomorphic Encryption

Hao Yang, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Shiyu Shen, School of Computer Science, Fudan University, Shanghai, China
Zhe Liu, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Yunlei Zhao, School of Computer Science, Fudan University, Shanghai, China; State Key Laboratory of Cryptology, Beijing
Abstract

Private comparison schemes built on homomorphic encryption are non-interactive, output expressive, and parallelizable, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which supports negative and floating-point inputs, is fully output expressive, and is more extensible and practical than XCMP (AsiaCCS 2018). We also introduce several memory-centric optimizations of the constant term extraction kernel tailored for CUDA-enabled GPUs. First, we make full use of shared memory and present compact GPU implementations of NTT and INTT that use a single block. Second, we fuse multiple kernels into one AKS kernel, which performs the automorphism and key switching operations, and reduce the grid dimension for better resource usage, data access rate, and synchronization. Third, we precisely measure the I/O latency and choose an appropriate number of CUDA streams to enable concurrent execution of independent operations, yielding a constant term extraction kernel, CTX, with perfect latency hiding. Combining these approaches, we bring the overall execution time to an optimal level, and the speedup ratio increases with the number of comparisons. For one comparison, we speed up AKS by 23.71×, CTX by 15.58×, and the scheme by 1.83× (resp., 18.29×, 11.75×, and 1.42×) compared to the C (resp., AVX512) baselines. For 32 comparisons, our CTX and scheme implementations outperform the C (resp., AVX512) baselines by 112.00× and 1.99× (resp., 81.53× and 1.51×).
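The abstract does not spell out how constant term extraction works. As a rough illustration only, the sketch below shows the standard trace-style approach on plaintext polynomials in Z_q[X]/(X^N + 1), assuming N is a power of two: repeatedly adding the image under the automorphism X -> X^(N/2^t + 1) cancels every non-constant coefficient and scales the constant term by N. The function names, the toy modulus, and the absence of any encryption or key switching are assumptions made for this example; it is not the paper's CUDA kernel.

```python
def automorphism(poly, k, q):
    """Apply X -> X^k in Z_q[X]/(X^N + 1), where N = len(poly) and k is odd."""
    n = len(poly)
    out = [0] * n
    for i, c in enumerate(poly):
        e = (i * k) % (2 * n)              # X has multiplicative order 2N
        if e < n:
            out[e] = (out[e] + c) % q
        else:
            out[e - n] = (out[e - n] - c) % q   # reduce with X^N = -1
    return out


def extract_constant_term(poly, q):
    """Return a polynomial whose only nonzero coefficient is N * poly[0] mod q.

    Uses log2(N) automorphism-and-add steps with k = N+1, N/2+1, ..., 3.
    Assumes len(poly) is a power of two.
    """
    n = len(poly)
    acc = [c % q for c in poly]
    step = n
    while step >= 2:
        rotated = automorphism(acc, step + 1, q)
        acc = [(a + b) % q for a, b in zip(acc, rotated)]
        step //= 2
    return acc


# Toy example: N = 4, q = 97, constant term 3.
print(extract_constant_term([3, 1, 2, 5], 97))  # [12, 0, 0, 0], i.e. 4 * 3
```

In a homomorphic setting each automorphism would be followed by a key switch, which is presumably why the two operations are fused into one AKS kernel; the factor N would then typically be removed by an inverse scaling, a detail this sketch leaves out.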

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Preprint.
Keywords
Private comparison, Homomorphic encryption, GPU optimization, Number theoretic transform, Key switching
Contact author(s)
crypto @ d4rk dev
shenshiyu21 @ m fudan edu cn
zhe liu @ nuaa edu cn
ylzhao @ fudan edu cn
History
2022-11-21: approved
2022-11-21: received
Short URL
https://ia.cr/2022/1621
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2022/1621,
      author = {Hao Yang and Shiyu Shen and Zhe Liu and Yunlei Zhao},
      title = {{cuXCMP}: {CUDA}-Accelerated Private Comparison Based on Homomorphic Encryption},
      howpublished = {Cryptology {ePrint} Archive, Paper 2022/1621},
      year = {2022},
      url = {https://eprint.iacr.org/2022/1621}
}