Paper 2024/1881
THOR: Secure Transformer Inference with Homomorphic Encryption
Abstract
As language models are increasingly deployed in cloud environments, privacy has become a significant concern. To address this, we design THOR, a secure inference framework for transformer models on encrypted data. Specifically, we first propose new fast matrix multiplication algorithms based on diagonal-major order encoding and extend them to parallel matrix computation through a compact ciphertext packing technique. Second, we design efficient protocols for secure computation of four non-linear functions (softmax, LayerNorm, GELU, and Tanh) by integrating advanced underlying approximation methods with tailored optimizations. Our matrix multiplication algorithms reduce the number of key-switching operations in the linear layers of the attention block of the BERT-base model by up to 14.5x, compared to the state-of-the-art HE-based secure inference protocol (Park et al., Preprint). Combined with cryptographic optimizations, our experimental results demonstrate that THOR provides secure inference for the BERT-base model with a latency of 10.43 minutes on a single GPU, while maintaining comparable inference accuracy on the MRPC dataset.
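For readers unfamiliar with diagonal-order encodings, the classic diagonal (Halevi-Shoup style) matrix-vector product illustrates the general idea that such encodings build on: a packed ciphertext supports only slot-wise arithmetic and cyclic rotations, so the matrix is stored by generalized diagonals. The sketch below works in plaintext for clarity; it is an illustrative assumption about the underlying technique, not THOR's actual packing or algorithm.

```python
# Plaintext sketch of a diagonal-order matrix-vector product.
# Packed HE ciphertexts support slot-wise add/multiply and cyclic
# rotation, so A*v is computed from A's generalized diagonals.
# Illustrative only; THOR's encoding and algorithms differ.

def diagonals(A):
    """Return the n generalized diagonals of an n x n matrix.
    Diagonal k holds A[i][(i + k) % n] in slot i."""
    n = len(A)
    return [[A[i][(i + k) % n] for i in range(n)] for k in range(n)]

def rotate(v, k):
    """Cyclic left rotation by k slots (models a ciphertext rotation)."""
    return v[k:] + v[:k]

def matvec_diag(A, v):
    """Compute A @ v using only slot-wise products and rotations."""
    n = len(v)
    result = [0] * n
    for k, d in enumerate(diagonals(A)):
        r = rotate(v, k)
        result = [result[i] + d[i] * r[i] for i in range(n)]
    return result

A = [[1, 2], [3, 4]]
v = [5, 6]
print(matvec_diag(A, v))  # [1*5 + 2*6, 3*5 + 4*6] = [17, 39]
```

Each of the n diagonals contributes one rotation and one slot-wise multiplication, which is why reducing rotation (key-switching) counts is the central cost metric for HE linear layers.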
Metadata
- Category: Cryptographic protocols
- Publication info: Preprint.
- Keywords: Homomorphic encryption, transformer
- Contact author(s):
  - moonjungho@hanyang.ac.kr
  - aydw0507@yonsei.ac.kr
  - Xiaoqian Jiang @ uth tmc edu
  - miran@hanyang.ac.kr
- History:
  - 2024-11-19: received
  - 2024-11-22: approved
- Short URL: https://ia.cr/2024/1881
- License: CC BY
BibTeX

    @misc{cryptoeprint:2024/1881,
      author = {Jungho Moon and Dongwoo Yoo and Xiaoqian Jiang and Miran Kim},
      title = {{THOR}: Secure Transformer Inference with Homomorphic Encryption},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1881},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1881}
    }