Paper 2024/1976
HI-CKKS: Is High-Throughput Neglected? Reimagining CKKS Efficiency with Parallelism
Abstract
The proliferation of data outsourcing and cloud services has heightened privacy risks. CKKS, one of the most prominent homomorphic encryption schemes, allows computation on encrypted data and thus serves as a critical privacy safeguard. However, performance remains a central bottleneck that hinders widespread adoption, and existing optimization efforts often prioritize latency reduction over throughput. This paper presents HI-CKKS, a throughput-oriented High-performance Implementation of CKKS homomorphic encryption that addresses these challenges. HI-CKKS introduces a batch-supporting asynchronous execution scheme that mitigates the frequent data interactions and long waiting delays between hosts and servers in service-oriented scenarios. We analyze the fundamental (I)NTT primitive, which is critical in CKKS, and develop a hierarchical, hybrid high-throughput implementation. It comprises efficient instruction-set implementations of the arithmetic modules, unified kernel fusion, and hybrid memory optimization strategies, which significantly improve memory access efficiency and the performance of (I)NTT operations. Additionally, we propose a multi-dimensional parallel homomorphic multiplication scheme aimed at maximizing throughput and improving the performance of (I)NTT and homomorphic multiplication. We deploy our implementation on the RTX 4090 and conduct a thorough throughput evaluation of HI-CKKS, which allows us to pinpoint the most effective parallel parameter settings. Compared to the CPU implementation, our system achieves throughput increases of $175.08\times$, $191.27\times$, and $679.57\times$ for NTT, INTT, and HMult, respectively. Compared to the latest GPU-based works, our throughput still shows a significant improvement, ranging from $1.54\times$ to $693.17\times$.
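For readers unfamiliar with the (I)NTT primitive named in the abstract, the following is a minimal textbook sketch of a forward and inverse number theoretic transform over $\mathbb{Z}_q$, used here to multiply two polynomials by pointwise multiplication. It is not the HI-CKKS implementation: the modulus `Q = 17`, the length `N = 8`, and all function names are illustrative assumptions, the transform is the naive $O(n^2)$ form rather than a fast butterfly network, and it computes a cyclic convolution (mod $X^N - 1$), whereas CKKS works with the negacyclic variant (mod $X^N + 1$).

```python
# Illustrative sketch only: naive O(n^2) NTT over Z_q.
# Q, N, and every function name below are assumptions for this example,
# not parameters or APIs from HI-CKKS.

Q = 17  # small prime with Q = 1 (mod N)
N = 8   # transform length (a power of two)


def find_primitive_root_of_unity(n, q):
    """Return an element of order exactly n in Z_q (q prime, n a power of two, n | q - 1)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        # For n a power of two, w has order exactly n iff w^(n/2) = -1 (mod q).
        if pow(w, n // 2, q) == q - 1:
            return w
    raise ValueError("no primitive n-th root of unity found")


def ntt(a, w, q):
    """Forward transform: A[i] = sum_j a[j] * w^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]


def intt(A, w, q):
    """Inverse transform: a[i] = n^(-1) * sum_j A[j] * w^(-i*j) mod q."""
    n = len(A)
    w_inv = pow(w, -1, q)  # modular inverse (Python 3.8+)
    n_inv = pow(n, -1, q)
    return [(n_inv * sum(A[j] * pow(w_inv, i * j, q) for j in range(n))) % q
            for i in range(n)]


def cyclic_convolution(a, b, w, q):
    """Multiply two polynomials mod (X^n - 1, q) via pointwise products in the NTT domain."""
    A, B = ntt(a, w, q), ntt(b, w, q)
    return intt([(x * y) % q for x, y in zip(A, B)], w, q)
```

A typical use is `w = find_primitive_root_of_unity(N, Q)` followed by `cyclic_convolution(a, b, w, Q)` on two length-`N` coefficient lists. Because each output coefficient of the transform is independent of the others, this kind of computation lends itself to the batched, parallel execution the abstract targets.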
Metadata
- Category
- Implementation
- Publication info
- Preprint.
- Keywords
- CKKS, Homomorphic Multiplication, Number Theoretic Transform (NTT), Parallel Processing, GPU
- Contact author(s)
- 2022040501 @ njupt edu cn
- djiankuo @ foxmail com
- 1535575390 @ qq com
- History
- 2024-12-12: approved
- 2024-12-06: received
- Short URL
- https://ia.cr/2024/1976
- License
- CC BY-NC-SA
BibTeX
@misc{cryptoeprint:2024/1976,
      author = {Fuyuan Chen and Jiankuo Dong and Xiaoyu Hu and Zhenjiang Dong and Wangchen Dai and Jingqiang Lin and Fu Xiao},
      title = {{HI}-{CKKS}: Is High-Throughput Neglected? Reimagining {CKKS} Efficiency with Parallelism},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1976},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1976}
}