Paper 2024/136
Secure Transformer Inference Made Non-interactive
Abstract
Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference, with which the client performs only one round of communication with the server throughout the evaluation process: it submits an encrypted input and awaits the encrypted result. Our contributions are three-fold. First, we propose an amortized-friendly matrix multiplication algorithm, which achieves a 1.6-3.3$\times$ speedup and reduces communication overhead by 60\% compared to state-of-the-art techniques. Second, we present a novel Argmax algorithm that reduces the computational complexity from $O(m)$ in Phoenix (CCS'22) to $O(\log m)$, achieving a 55.6$\times$ speedup (where $m$ is the number of labels; $m = 30522$ in BERT-base). Third, we provide an end-to-end implementation and evaluation. NEXUS outperforms BOLT (Oakland'24) by over an order of magnitude and is 1.8$\times$ faster and 2.4$\times$ cheaper than Bumblebee (NDSS'25). We also provide a GPU-accelerated version of our work, improving inference speed by 42.3$\times$ and reducing the financial cost by 17.2$\times$, to a per-token price of only \$0.05.
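The $O(\log m)$ claim follows the familiar fold-in-half pattern over packed vectors. The NumPy sketch below is not the paper's ciphertext-level algorithm (which operates on encrypted SIMD slots); it only illustrates, in plaintext, how repeated rotate-and-maximum rounds reduce $m$ candidates in $\log_2 m$ steps instead of $m$ sequential comparisons, and how an argmax can then be recovered with one parallel comparison against the maximum. The function names, the power-of-two length, and the unique-maximum assumption are ours, for illustration only.

```python
import numpy as np

def folded_max(scores: np.ndarray) -> float:
    """Global maximum in O(log m) vectorized rounds.

    Each round takes the elementwise maximum of the vector and a copy
    rotated by the current stride, halving the stride every time --
    a plaintext analogue of folding the slots of a packed ciphertext.
    """
    m = len(scores)
    assert m and (m & (m - 1)) == 0, "sketch assumes a power-of-two length (pad with -inf otherwise)"
    v = scores.astype(float)
    step = m // 2
    while step >= 1:
        v = np.maximum(v, np.roll(v, -step))  # one rotate-and-max round
        step //= 2
    return float(v[0])  # every slot now holds the global maximum

def folded_argmax(scores: np.ndarray) -> int:
    """Recover the index with one parallel equality test against the max."""
    best = folded_max(scores)
    onehot = (scores == best).astype(int)        # assumes a unique maximum
    return int(onehot @ np.arange(len(scores)))  # inner product with the index vector

if __name__ == "__main__":
    logits = np.random.randn(32768)              # e.g. a padded BERT-base vocabulary
    assert folded_argmax(logits) == int(np.argmax(logits))
```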
Metadata
- Category: Cryptographic protocols
- Publication info: Published elsewhere. Major revision. Network and Distributed System Security (NDSS) Symposium
- Keywords: Secure Inference, LLM, Homomorphic Encryption
- Contact author(s):
  - kevinzh@zju.edu.cn
  - jian.liu@zju.edu.cn
  - lipeng.he@uwaterloo.ca
  - yangxinpeng@zju.edu.cn
  - fionser@gmail.com
  - asternight@zju.edu.cn
  - chenkejia@zju.edu.cn
  - xiaoyanghou@zju.edu.cn
  - kuiren@zju.edu.cn
  - yangxh@zju.edu.cn
- History
  - 2024-09-02: last of 2 revisions
  - 2024-01-31: received
- Short URL: https://ia.cr/2024/136
- License: CC BY
BibTeX
@misc{cryptoeprint:2024/136,
      author = {Jiawen Zhang and Jian Liu and Lipeng He and Xinpeng Yang and Wen-jie Lu and Yinghao Wang and Kejia Chen and Xiaoyang Hou and Kui Ren and Xiaohu Yang},
      title = {Secure Transformer Inference Made Non-interactive},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/136},
      year = {2024},
      url = {https://eprint.iacr.org/2024/136}
}