Paper 2025/653
Fission: Distributed Privacy-Preserving Large Language Model Inference
Abstract
The increased popularity of large language models (LLMs) raises serious privacy concerns, as users' private queries are sent to untrusted servers. Many cryptographic techniques, such as secure multiparty computation (MPC), have been proposed to provide privacy by enabling the evaluation of LLMs directly on private data. However, cryptographic techniques have been deemed impractical as they introduce large communication and computation overheads. On the other hand, many obfuscation techniques have been proposed, such as split inference, where part of the model is evaluated on edge devices to hide the input data from untrusted servers, but these methods provide only limited privacy guarantees. We propose Fission, a privacy-preserving framework that improves latency while providing strong privacy guarantees. Fission utilizes an MPC network for linear computations, while nonlinearities are computed on a separate evaluator network that receives shuffled values in the clear and returns the nonlinear functions evaluated at these values to the MPC network. As a result, each evaluator only gets access to parts of the shuffled data, while the model weights remain private. We evaluate Fission on a wide set of LLMs and compare it against prior works. Fission achieves up to eight times faster inference and eight times lower bandwidth than prior works while retaining high accuracy. Finally, we construct an attack on obfuscation techniques from related works that shows significant information leakage, and we demonstrate how Fission enhances privacy.
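To make the division of labor described above concrete, the following is a minimal Python sketch of the idea, not the paper's actual protocol: an MPC network holds additive secret shares, jointly shuffles them before opening, and hands each evaluator only a slice of the permuted cleartext for nonlinear evaluation. The field modulus, fixed-point scale, 3-party / 2-evaluator topology, the local permutation standing in for a joint oblivious shuffle, and the choice of GeLU are all illustrative assumptions.

```python
# Toy sketch (assumptions throughout): MPC network does linear algebra on
# additive secret shares; evaluators compute nonlinearities on shuffled
# cleartext slices, so no single evaluator sees the whole ordered vector.
from math import erf, sqrt
import secrets
import numpy as np

PRIME = 2**61 - 1   # toy prime field modulus (assumption)
SCALE = 2**16       # fixed-point scale for encoding reals (assumption)

def encode(xs):
    return [round(x * SCALE) % PRIME for x in xs]

def decode(xs):
    return [((x - PRIME) if x > PRIME // 2 else x) / SCALE for x in xs]

def share(xs, n_parties=3):
    """Additively secret-share a field vector among n_parties."""
    rand = [[secrets.randbelow(PRIME) for _ in xs] for _ in range(n_parties - 1)]
    last = [(x - sum(col)) % PRIME for x, col in zip(xs, zip(*rand))]
    return rand + [last]

def reconstruct(shares):
    return [sum(col) % PRIME for col in zip(*shares)]

def gelu(x):
    # Nonlinearity each evaluator computes in the clear on its slice.
    return 0.5 * x * (1.0 + erf(x / sqrt(2.0)))

def shuffled_nonlinear(shares, n_evaluators=2):
    """MPC network shuffles the shared vector, opens the shuffled values,
    and gives each evaluator only a slice of the permuted cleartext."""
    m = len(shares[0])
    perm = np.random.permutation(m)      # stand-in for a joint oblivious shuffle
    opened = reconstruct(shares)         # opened only AFTER shuffling
    shuffled = [opened[i] for i in perm]
    chunks = np.array_split(np.array(shuffled, dtype=object), n_evaluators)
    results = []
    for chunk in chunks:                 # each evaluator works independently
        results.extend(encode([gelu(v) for v in decode(list(chunk))]))
    unshuffled = [0] * m
    for dst, src in enumerate(perm):     # MPC network undoes the permutation
        unshuffled[src] = results[dst]
    return share(unshuffled)             # re-shared for the next linear layer

if __name__ == "__main__":
    x = [-1.5, 0.25, 2.0, -0.75]
    out = reconstruct(shuffled_nonlinear(share(encode(x))))
    print(decode(out))   # ~ [gelu(v) for v in x]
```

In this sketch the privacy comes from the shuffle and the partitioning: each evaluator sees cleartext values, but only an unordered fraction of them; the paper's protocol additionally keeps the model weights secret-shared inside the MPC network.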
Metadata
- Available format(s)
- PDF
- Category
- Implementation
- Publication info
- Preprint.
- Keywords
- Applied Cryptography, Large Language Models, Machine Learning, Multiparty Computation
- Contact author(s)
- memo @ nillion com
  dimitris @ nillion com
  manuel santos @ nillion com
  jose cabrero @ nillion com
  miguel @ nillion com
  shubho @ gmail com
- History
- 2025-04-13: approved
- 2025-04-09: received
- Short URL
- https://ia.cr/2025/653
- License
- CC BY
BibTeX
@misc{cryptoeprint:2025/653,
      author = {Mehmet Ugurbil and Dimitris Mouris and Manuel B. Santos and José Cabrero-Holgueras and Miguel de Vega and Shubho Sengupta},
      title = {Fission: Distributed Privacy-Preserving Large Language Model Inference},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/653},
      year = {2025},
      url = {https://eprint.iacr.org/2025/653}
}