Paper 2021/966

Soteria: Privacy-Preserving Machine Learning for Apache Spark

Cláudia Brito, Pedro Ferreira, Bernardo Portela, Rui Oliveira, and João Paulo

Abstract

Privacy and security are prime obstacles to the wider adoption of machine learning services offered by cloud computing providers. In particular, entrusting users' sensitive data to a third-party infrastructure, vulnerable to both external and internal malicious attackers, prevents many companies from leveraging the scalability and flexibility offered by cloud services. We propose Soteria, a system for distributed privacy-preserving machine learning that combines the Apache Spark system and its machine learning library (MLlib) with the confidentiality features provided by Trusted Execution Environments (e.g., Intel SGX). Soteria supports two main designs, each offering specific guarantees in terms of security and performance. The first encapsulates most of the computation done by Apache Spark inside a secure enclave, thus offering stronger security. The second fine-tunes which Spark operations must run inside the secure enclave, reducing the trusted computing base, and consequently the performance overhead, at the cost of an increased attack surface. An extensive evaluation of Soteria, with classification, regression, dimensionality reduction, and clustering algorithms, shows that our system outperforms state-of-the-art solutions, reducing their performance overhead by up to 41%. Moreover, we show that privacy-preserving machine learning is achievable while providing strong security guarantees.

Metadata
Available format(s)
PDF
Publication info
Preprint. Minor revision.
Keywords
Privacy-preserving Machine Learning, Apache Spark, Confidential Computing, Intel SGX
Contact author(s)
claudia v brito @ inesctec pt
joao t paulo @ inesctec pt
History
2022-02-14: revised
2021-07-22: received
Short URL
https://ia.cr/2021/966
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2021/966,
      author = {Cláudia Brito and Pedro Ferreira and Bernardo Portela and Rui Oliveira and João Paulo},
      title = {Soteria: Privacy-Preserving Machine Learning for Apache Spark},
      howpublished = {Cryptology ePrint Archive, Paper 2021/966},
      year = {2021},
      note = {\url{https://eprint.iacr.org/2021/966}},
      url = {https://eprint.iacr.org/2021/966}
}