Paper 2023/1678

BumbleBee: Secure Two-party Inference Framework for Large Transformers

Wen-jie Lu, Ant Group, Zhejiang University
Zhicong Huang, Ant Group
Zhen Gu, Alibaba Group (China)
Jingyu Li, Ant Group, Zhejiang University
Jian Liu, Zhejiang University
Cheng Hong, Ant Group
Kui Ren, Zhejiang University
Tao Wei, Ant Group
WenGuang Chen, Ant Group
Abstract

Large transformer-based models have achieved state-of-the-art performance on many real-world tasks, such as natural language processing and computer vision. However, with the increasing sensitivity of the data and tasks they handle, privacy has become a major concern during model deployment. In this work, we focus on private inference in the two-party setting, where one party holds private inputs and the other holds the model. We introduce BumbleBee, a fast and communication-friendly two-party private transformer inference system. Our contributions are three-fold. First, we propose optimized protocols for matrix multiplication that reduce communication costs by 80%–90% compared to previous techniques. Second, we develop a methodology for constructing efficient protocols tailored to the non-linear activation functions used in transformer models. The proposed activation protocols are significantly faster and reduce communication costs by 80%–95% compared with two prior methods. Third, we perform extensive benchmarks on five transformer models. BumbleBee demonstrates its capability by evaluating the LLaMA-7B model, generating one token in approximately 14 minutes using CPUs. Our results further show that BumbleBee outperforms Iron (NeurIPS22) by over an order of magnitude and is three times faster than BOLT (Oakland24) with one-tenth the communication.
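To make the two-party setting concrete, the sketch below shows the textbook baseline that protocols like BumbleBee's improve upon: the two parties hold additive secret shares of a matrix X (activations) and a matrix Y (weights) over the ring Z_{2^64}, and multiply them using a Beaver-style matrix triple. This is an illustrative baseline only; BumbleBee's own matrix-multiplication protocol is a different, lattice-based construction, and that difference is what yields the 80%–90% communication savings claimed above. All names and the trusted-dealer triple generation here are assumptions for the sketch.

```python
# Textbook two-party matrix multiplication over Z_{2^64} with additive
# secret sharing and a Beaver-style matrix triple (both parties simulated
# locally). The opened matrices E and F are the per-multiplication
# communication this baseline pays.
import numpy as np

rng = np.random.default_rng(0)

def rand_mat(shape):
    # Uniform over Z_{2^64}; numpy uint64 arithmetic wraps mod 2^64 for free.
    return rng.integers(0, 2**64, size=shape, dtype=np.uint64)

def share(x):
    # Additive sharing: x = x0 + x1 (mod 2^64).
    x0 = rand_mat(x.shape)
    return x0, x - x0

def beaver_matmul(x0, x1, y0, y1):
    # A trusted dealer samples a triple (U, V, W) with W = U @ V and hands
    # each party one share of it (done here in the clear for illustration).
    u, v = rand_mat(x0.shape), rand_mat(y0.shape)
    u0, u1 = share(u)
    v0, v1 = share(v)
    w0, w1 = share(u @ v)
    # The parties open the masked inputs E = X - U and F = Y - V.
    e = (x0 - u0) + (x1 - u1)
    f = (y0 - v0) + (y1 - v1)
    # Locally compute shares of X @ Y = E@F + E@V + U@F + W.
    z0 = e @ f + e @ v0 + u0 @ f + w0  # party 0 adds the public E @ F term
    z1 = e @ v1 + u1 @ f + w1
    return z0, z1

X, Y = rand_mat((4, 8)), rand_mat((8, 3))
x0, x1 = share(X)
y0, y1 = share(Y)
z0, z1 = beaver_matmul(x0, x1, y0, y1)
assert np.array_equal(z0 + z1, X @ Y)  # shares recombine to the product
```

Likewise, the activation-function protocols are built around approximations of non-linearities such as GELU that are MPC-friendly, i.e., they need only additions, multiplications, and a few secure comparisons. The snippet below fits a piecewise low-degree polynomial approximation in the clear; the breakpoints (-5 and 3), the degree, and the least-squares fit are illustrative assumptions, not the paper's segments or coefficients.

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

# GELU is ~0 for very negative inputs and ~x for large positive inputs,
# so only the middle segment needs a polynomial (degree 6, least squares).
xs = np.linspace(-5.0, 3.0, 10_000)
coeffs = np.polyfit(xs, gelu(xs), deg=6)

def gelu_piecewise(x):
    # Two comparisons select the segment; the rest is polynomial evaluation.
    return np.where(x < -5.0, 0.0,
           np.where(x > 3.0, x, np.polyval(coeffs, x)))

grid = np.linspace(-8.0, 8.0, 2_001)
print(np.max(np.abs(gelu_piecewise(grid) - gelu(grid))))  # max abs error
```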

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Published elsewhere. Minor revision. Network and Distributed System Security (NDSS) Symposium
Keywords
secure neural inference, secure two-party computation, privacy-preserving machine learning
Contact author(s)
fionser @ gmail com
ljy404490 @ antgroup com
vince hc @ antgroup com
History
2024-07-08: last of 2 revisions
2023-10-30: received
Short URL
https://ia.cr/2023/1678
License
Creative Commons Attribution-NonCommercial
CC BY-NC

BibTeX

@misc{cryptoeprint:2023/1678,
      author = {Wen-jie Lu and Zhicong Huang and Zhen Gu and Jingyu Li and Jian Liu and Cheng Hong and Kui Ren and Tao Wei and WenGuang Chen},
      title = {{BumbleBee}: Secure Two-party Inference Framework for Large Transformers},
      howpublished = {Cryptology ePrint Archive, Paper 2023/1678},
      year = {2023},
      note = {\url{https://eprint.iacr.org/2023/1678}},
      url = {https://eprint.iacr.org/2023/1678}
}