Paper 2024/136

Secure Transformer Inference Made Non-interactive

Jiawen Zhang, Zhejiang University
Jian Liu, Zhejiang University
Lipeng He, University of Waterloo
Xinpeng Yang, Zhejiang University
Wen-jie Lu, Zhejiang University
Yinghao Wang, Zhejiang University
Kejia Chen, Zhejiang University
Xiaoyang Hou, Zhejiang University
Kui Ren, Zhejiang University
Xiaohu Yang, Zhejiang University
Abstract

Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving a substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference, with which the client performs only one round of communication with the server throughout the evaluation: it submits an encrypted input and awaits the encrypted result. Our contributions are threefold. First, we propose an amortization-friendly matrix multiplication algorithm, which achieves a 1.6-3.3$\times$ speedup and saves 60\% of the communication overhead compared to SOTA techniques. Second, we present a novel Argmax algorithm that reduces the computational complexity from $O(m)$ in Phoenix (CCS'22) to $O(\log m)$, achieving a 55.6$\times$ speedup ($m$ is the number of labels; $m=30522$ in BERT-base). Third, we provide an end-to-end implementation and evaluation. NEXUS outperforms BOLT (Oakland'24) by over an order of magnitude and is 1.8$\times$ faster and 2.4$\times$ cheaper than Bumblebee (NDSS'25). We also provide a GPU-accelerated version of our work, improving the inference speed by 42.3$\times$ and reducing the financial cost by 17.2$\times$, to a per-token price of only \$0.05.
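The $O(\log m)$ Argmax claim rests on a halving idea: each round compares the surviving candidates pairwise and keeps the winners, so $m$ labels need only $\lceil\log_2 m\rceil$ rounds. The sketch below illustrates this in plaintext Python; it is an illustration of the complexity argument only, not the paper's homomorphic implementation (which performs the comparisons under encryption, e.g. via ciphertext rotations), and the function name is hypothetical.

```python
def log_depth_argmax(scores):
    """Plaintext sketch of a log-depth Argmax: each round pits the first
    half of the surviving candidates against the second half and keeps
    the pairwise winners, so m labels take ceil(log2(m)) rounds instead
    of the m-1 sequential comparisons of a linear scan."""
    # Track (index, score) pairs so the winning label's index survives.
    candidates = list(enumerate(scores))
    while len(candidates) > 1:
        half = (len(candidates) + 1) // 2
        lo, hi = candidates[:half], candidates[half:]
        # Pairwise max between the halves; an unpaired entry passes through.
        merged = [max(a, b, key=lambda t: t[1]) for a, b in zip(lo, hi)]
        merged.extend(lo[len(hi):])
        candidates = merged
    return candidates[0][0]
```

For $m = 30522$ (BERT-base) this takes only 15 rounds, which is the source of the claimed asymptotic speedup over a linear scan.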

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Published elsewhere. Major revision. Network and Distributed System Security (NDSS) Symposium
Keywords
Secure Inference, LLM, Homomorphic Encryption
Contact author(s)
kevinzh @ zju edu cn
jian liu @ zju edu cn
lipeng he @ uwaterloo ca
yangxinpeng @ zju edu cn
fionser @ gmail com
asternight @ zju edu cn
chenkejia @ zju edu cn
xiaoyanghou @ zju edu cn
kuiren @ zju edu cn
yangxh @ zju edu cn
History
2024-09-02: last of 2 revisions
2024-01-31: received
Short URL
https://ia.cr/2024/136
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2024/136,
      author = {Jiawen Zhang and Jian Liu and Lipeng He and Xinpeng Yang and Wen-jie Lu and Yinghao Wang and Kejia Chen and Xiaoyang Hou and Kui Ren and Xiaohu Yang},
      title = {Secure Transformer Inference Made Non-interactive},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/136},
      year = {2024},
      url = {https://eprint.iacr.org/2024/136}
}