Paper 2024/136
Secure Transformer Inference Made Non-interactive
Abstract
Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, incurring heavy communication and many rounds of interaction between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The client engages in just one round of communication with the server over the entire inference process: it submits an encrypted input and receives an encrypted result. NEXUS introduces several novel primitives, including SIMD ciphertext compression/decompression, SIMD slot folding, and secure Argmax, which enable it to significantly outperform the state of the art in communication while maintaining comparable runtime. Specifically, it reduces bandwidth consumption by 372.5× compared to BOLT (Oakland '24) and by 53.6× compared to Bumblebee (NDSS '25). Furthermore, its non-interactive property allows for optimal hardware acceleration, with the GPU version achieving a 42.3× speedup in runtime. This enables NEXUS to run inference on a BERT-based model in just 37.3 seconds while consuming only 164 MB of bandwidth.
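The abstract names SIMD slot folding without describing it. A common pattern for aggregating values packed into the SIMD slots of a homomorphic ciphertext is rotate-and-add: with n slots, log2(n) cyclic rotations and additions leave every slot holding the sum of all slots. The sketch below simulates that general pattern on plaintext integer lists only; no encryption is involved, the helper names `rotate` and `fold_sum` are hypothetical, and this is an assumption about the underlying idea rather than NEXUS's actual construction.

```python
from typing import List


def rotate(slots: List[int], k: int) -> List[int]:
    """Cyclic left rotation of a slot vector (hypothetical stand-in for an HE slot rotation)."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def fold_sum(slots: List[int]) -> List[int]:
    """Rotate-and-add folding (plaintext simulation, not NEXUS's exact primitive).

    After log2(n) rotate-and-add steps, every slot holds the sum of all n slots.
    """
    n = len(slots)
    assert n > 0 and n & (n - 1) == 0, "slot count must be a power of two"
    acc = list(slots)
    step = n // 2
    while step >= 1:
        # Add a rotated copy: each slot absorbs the partial sum `step` positions away.
        rotated = rotate(acc, step)
        acc = [a + b for a, b in zip(acc, rotated)]
        step //= 2
    return acc


if __name__ == "__main__":
    print(fold_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # every slot holds 36
```

The appeal of this pattern in a non-interactive setting is that it uses only rotations and additions, which a server can evaluate on a ciphertext without a round trip to the client; whether NEXUS folds slots this way is not stated in the abstract.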
Metadata
- Category
- Cryptographic protocols
- Publication info
- Published elsewhere. Major revision. Network and Distributed System Security (NDSS) Symposium
- Keywords
- Secure Inference, LLM, Homomorphic Encryption
- Contact author(s)
kevinzh @ zju edu cn
yangxinpeng @ zju edu cn
lipeng he @ uwaterloo ca
chenkejia @ zju edu cn
fionser @ gmail com
asternight @ zju edu cn
xiaoyanghou @ zju edu cn
jian liu @ zju edu cn
kuiren @ zju edu cn
yangxh @ zju edu cn
- History
- 2024-09-16: last of 3 revisions
- 2024-01-31: received
- Short URL
- https://ia.cr/2024/136
- License
- CC BY
BibTeX
@misc{cryptoeprint:2024/136,
  author = {Jiawen Zhang and Xinpeng Yang and Lipeng He and Kejia Chen and Wen-jie Lu and Yinghao Wang and Xiaoyang Hou and Jian Liu and Kui Ren and Xiaohu Yang},
  title = {Secure Transformer Inference Made Non-interactive},
  howpublished = {Cryptology {ePrint} Archive, Paper 2024/136},
  year = {2024},
  url = {https://eprint.iacr.org/2024/136}
}