Paper 2024/136

Secure Transformer Inference Made Non-interactive

Jiawen Zhang, Zhejiang University
Xinpeng Yang, Zhejiang University
Lipeng He, University of Waterloo
Kejia Chen, Zhejiang University
Wen-jie Lu, Zhejiang University
Yinghao Wang, Zhejiang University
Xiaoyang Hou, Zhejiang University
Jian Liu, Zhejiang University
Kui Ren, Zhejiang University
Xiaohu Yang, Zhejiang University
Abstract

Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The protocol requires the client to engage in just one round of communication with the server during the whole inference process: submitting an encrypted input and receiving an encrypted result. NEXUS introduces several novel primitives, including SIMD ciphertext compression/decompression, SIMD slot folding, and secure Argmax, which enable it to significantly surpass the state-of-the-art in communication while maintaining comparable runtime. Specifically, it reduces bandwidth consumption by 372.5$\times$ compared to BOLT (Oakland '24) and 53.6$\times$ compared to Bumblebee (NDSS '25). Furthermore, its non-interactive property allows for optimal hardware acceleration, with the GPU version achieving a 42.3$\times$ speedup in runtime. This enables NEXUS to run inference on a BERT-based model in just 37.3 seconds, consuming only 164 MB of bandwidth.
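The one-round message pattern described in the abstract (encrypted input in, encrypted result out) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: it uses a toy Paillier scheme as a stand-in, since the abstract does not say which homomorphic encryption scheme NEXUS uses. It evaluates only a single integer linear layer, and the names `keygen`, `encrypt`, `decrypt`, `server_linear`, and `run_once` are hypothetical. It does not implement SIMD ciphertext compression, slot folding, or secure Argmax, and the key size is far too small to be secure.

```python
import math
import random

# Toy Paillier modulus built from two Mersenne primes.
# Assumption for illustration only: far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1


def keygen():
    """Return (public modulus n, secret key)."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    return n, (phi, pow(phi, -1, n))


def encrypt(n, m):
    """Encrypt a signed integer m under public modulus n (generator g = n + 1)."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """Decrypt c and map the result back to a signed integer."""
    phi, mu = sk
    m = (pow(c, phi, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def server_linear(n, enc_x, W, b):
    """Server side: compute W @ x + b on ciphertexts, without the secret key."""
    n2 = n * n
    out = []
    for row, bias in zip(W, b):
        acc = (1 + (bias % n) * n) % n2  # trivial encryption of the bias
        for c, w in zip(enc_x, row):
            # Ciphertext exponentiation multiplies the plaintext by w;
            # ciphertext multiplication adds plaintexts.
            acc = acc * pow(c, w % n, n2) % n2
        out.append(acc)
    return out


def run_once(x, W, b):
    """One round: the client sends ciphertexts, the server replies with ciphertexts."""
    n, sk = keygen()
    request = [encrypt(n, v) for v in x]          # client -> server
    response = server_linear(n, request, W, b)    # server -> client
    return [decrypt(n, sk, c) for c in response]  # client decrypts locally


if __name__ == "__main__":
    print(run_once([3, -1, 4], [[1, 2, 0], [-1, 0, 5]], [10, -7]))
```

The client never interacts with the server between sending `request` and receiving `response`, which is the property the abstract emphasizes. A full transformer also needs non-linear layers, which an additively homomorphic toy like this cannot evaluate.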

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Published elsewhere. Major revision. Network and Distributed System Security (NDSS) Symposium
Keywords
Secure Inference, LLM, Homomorphic Encryption
Contact author(s)
kevinzh @ zju edu cn
yangxinpeng @ zju edu cn
lipeng he @ uwaterloo ca
chenkejia @ zju edu cn
fionser @ gmail com
asternight @ zju edu cn
xiaoyanghou @ zju edu cn
jian liu @ zju edu cn
kuiren @ zju edu cn
yangxh @ zju edu cn
History
2024-09-16: last of 3 revisions
2024-01-31: received
Short URL
https://ia.cr/2024/136
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2024/136,
      author = {Jiawen Zhang and Xinpeng Yang and Lipeng He and Kejia Chen and Wen-jie Lu and Yinghao Wang and Xiaoyang Hou and Jian Liu and Kui Ren and Xiaohu Yang},
      title = {Secure Transformer Inference Made Non-interactive},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/136},
      year = {2024},
      url = {https://eprint.iacr.org/2024/136}
}