End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets

Gauri Gupta; Krithika Ramesh; Anwesh Bhattacharya; Divya Gupta; Rahul Sharma; Nishanth Chandran; Rijurekha Sen

Paper 2023/1010

Gauri Gupta, Massachusetts Institute of Technology

Krithika Ramesh, Johns Hopkins University

Anwesh Bhattacharya, Microsoft Research (India)

Divya Gupta, Microsoft Research (India)

Rahul Sharma, Microsoft Research (India)

Nishanth Chandran, Microsoft Research (India)

Rijurekha Sen, Indian Institute of Technology Delhi

Abstract

Privacy-preserving machine learning (PPML) promises to train machine learning (ML) models by combining data spread across multiple data silos. Theoretically, secure multiparty computation (MPC) allows multiple data owners to train models on their joint data without revealing the data to each other. However, prior implementations of secure training using MPC have three limitations: they have only been evaluated on CNNs, and LSTMs have been ignored; fixed-point approximations have reduced training accuracy compared to training in floating point; and due to the significant latency overheads of secure training via MPC, its relevance for practical tasks with streaming data remains unclear. The motivation of this work is to report our experience of addressing the practical problem of secure training and inference of models for urban sensing problems, e.g., traffic congestion estimation or air pollution monitoring in large cities, where data can be contributed by rival fleet companies, while balancing the privacy-accuracy trade-offs using MPC-based techniques. Our first contribution is to design a custom ML model for this task that can be efficiently trained with MPC within a desirable latency. In particular, we design a GCN-LSTM and securely train it on time-series sensor data for accurate forecasting, within 7 minutes per epoch. As our second contribution, we build an end-to-end system of private training and inference that provably matches the training accuracy of cleartext ML training. This work is the first to securely train a model with LSTM cells. Third, this trained model is kept secret-shared between the fleet companies and allows clients to make sensitive queries to this model while carefully handling potentially invalid queries. Our custom protocols allow clients to query predictions from privately trained models in milliseconds, all the while maintaining accuracy and cryptographic security.
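For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of generic two-party additive secret sharing with a fixed-point encoding. It is not the protocol from the paper; the ring size, the precision `FRAC_BITS`, the function names, and the sensor readings are all assumptions chosen for illustration. It shows how two parties can each hold a random-looking share of a value, add shared values locally, and reconstruct only the result.

```python
import secrets

# Illustrative parameters (assumptions, not taken from the paper).
RING_BITS = 64
MODULUS = 1 << RING_BITS
FRAC_BITS = 16  # hypothetical fixed-point precision


def encode(x: float) -> int:
    """Map a real number to a fixed-point element of the ring Z_{2^64}."""
    return round(x * (1 << FRAC_BITS)) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real number (upper half is negative)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / (1 << FRAC_BITS)


def share(v: int) -> tuple[int, int]:
    """Split v into two additive shares; each share alone is uniformly random."""
    s0 = secrets.randbelow(MODULUS)
    s1 = (v - s0) % MODULUS
    return s0, s1


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares into the underlying ring element."""
    return (s0 + s1) % MODULUS


def add_shares(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two shared values; each party adds its own shares locally."""
    return ((a[0] + b[0]) % MODULUS, (a[1] + b[1]) % MODULUS)


# Two hypothetical fleets each secret-share one made-up sensor reading.
reading_a = share(encode(41.5))
reading_b = share(encode(38.25))

# The sum is computed on shares; only the final result is reconstructed.
total = decode(reconstruct(*add_shares(reading_a, reading_b)))
```

Addition needs no communication between the parties, while multiplication and nonlinear functions require interactive protocols. The rounding in `encode` is the kind of fixed-point approximation the abstract identifies as a source of accuracy loss in earlier work.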

Metadata

Available format(s): PDF
Category: Applications
Publication info: Published elsewhere. Privacy Enhancing Technologies Symposium 2023
Keywords: MPC pollution machine learning training
Contact author(s): gauri @ mit edu
kramesh3 @ jh edu
t-anweshb @ microsoft com
divya gupta @ microsoft com
rahsha @ microsoft com
nichandr @ microsoft com
riju @ cse iitd ac in
History: 2023-07-04: last of 2 revisions; 2023-06-29: received
Short URL: https://ia.cr/2023/1010
License: CC BY

BibTeX

@misc{cryptoeprint:2023/1010,
      author = {Gauri Gupta and Krithika Ramesh and Anwesh Bhattacharya and Divya Gupta and Rahul Sharma and Nishanth Chandran and Rijurekha Sen},
      title = {End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1010},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1010}
}