Paper 2023/1345

Experimenting with Zero-Knowledge Proofs of Training

Sanjam Garg, UC Berkeley
Aarushi Goel, NTT Research
Somesh Jha, University of Wisconsin–Madison
Saeed Mahloujifar, Meta
Mohammad Mahmoody, University of Virginia
Guru-Vamsi Policharla, UC Berkeley
Mingyuan Wang, UC Berkeley
Abstract

How can a model owner prove they trained their model according to the correct specification? More importantly, how can they do so while preserving the privacy of the underlying dataset and the final model? We study this problem and formulate the notion of zero-knowledge proof of training (zkPoT), which formalizes rigorous security guarantees that should be achieved by a privacy-preserving proof of training. While it is theoretically possible to design zkPoT for any model using generic zero-knowledge proof systems, this approach results in extremely impractical proof generation times. Towards designing a practical solution, we propose combining techniques from the MPC-in-the-head and zkSNARK literature to strike an appropriate trade-off between proof size and proof computation time. We instantiate this idea and propose a concretely efficient, novel zkPoT protocol for logistic regression. Crucially, our protocol is streaming-friendly and does not require RAM proportional to the size of the circuit being trained; hence, it can be adapted to the requirements of available hardware. We expect the techniques developed in this paper to also be useful for designing efficient zkPoT protocols for other, more sophisticated ML models. We implemented and benchmarked prover/verifier runtimes and proof sizes for training a logistic regression model using mini-batch gradient descent on a 4~GB dataset of 262,144 records with 1024 features. We divide our protocol into three phases: (1) a data-independent offline phase, (2) a data-dependent phase that is independent of the model, and (3) an online phase that depends on both the data and the model. The total proof size (across all three phases) is less than $10\%$ of the dataset size ($<350$~MB). In the online phase, the prover and verifier times are under 10 minutes and half a minute, respectively, whereas in the data-dependent phase, they are close to one hour and a few seconds, respectively.
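To make the statement being proven concrete, the following is a minimal Python/NumPy sketch of the underlying training computation: mini-batch gradient descent for logistic regression. This is not the paper's implementation; the function name, hyperparameters, and toy data are illustrative assumptions. A zkPoT for this computation would prove that the published weights result from running exactly these updates on a committed dataset, without revealing the dataset or the weights.

# Sketch of the training computation a zkPoT for logistic regression
# attests to. Hyperparameters and dimensions are illustrative, not
# the paper's 262,144-record / 1024-feature benchmark.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, batch_size=64, lr=0.1, epochs=5, seed=0):
    # Plain mini-batch gradient descent; the prover's claim is that the
    # returned weights w are the honest output of this loop on (X, y).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the logistic loss on the current mini-batch.
            grad = Xb.T @ (sigmoid(Xb @ w) - yb) / len(idx)
            w -= lr * grad
    return w

# Toy usage on synthetic data (stand-in for the committed private dataset).
X = np.random.default_rng(1).normal(size=(1024, 16))
y = (X @ np.ones(16) > 0).astype(float)
w = train_logistic_regression(X, y)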

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Published elsewhere. Major revision. ACM CCS 2023
Keywords
ML
SNARKs
MPC-in-the-head
Proof of Training
Logistic Regression
Zero-knowledge proof
Contact author(s)
sanjamg @ berkeley edu
aarushi goel @ ntt-research com
jha @ cs wisc edu
saeedm @ meta com
mohammad @ virginia edu
guruvamsip @ berkeley edu
mingyuan @ berkeley edu
History
2023-09-11: approved
2023-09-08: received
Short URL
https://ia.cr/2023/1345
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2023/1345,
      author = {Sanjam Garg and Aarushi Goel and Somesh Jha and Saeed Mahloujifar and Mohammad Mahmoody and Guru-Vamsi Policharla and Mingyuan Wang},
      title = {Experimenting with Zero-Knowledge Proofs of Training},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1345},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1345}
}