Paper 2018/289

Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue

Phillipp Schoppmann, Lennart Vogelsang, Adrià Gascón, and Borja Balle

Abstract

Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to $k$-nearest neighbors ($k$-NN) classification. Due to its non-parametric nature, $k$-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.
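
As a rough illustration of the underlying task only, the sketch below runs $k$-NN document classification in plaintext on sparse vectors. It contains no MPC and no differential privacy, and it is not the protocol from the paper. The choice of cosine similarity over raw term counts is an assumption made for this example, and the names `term_vector`, `cosine_similarity`, and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Hypothetical helper: sparse term-count vector as a {term: count} dict."""
    return dict(Counter(text.lower().split()))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts.

    Assumption for this sketch: the abstract does not say which similarity
    measure or term weighting the paper uses.
    """
    # Iterate over the smaller vector so the dot product cost depends on
    # the number of nonzero entries, not on the vocabulary size.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, labeled_docs, k=3):
    """Return the majority label among the k documents most similar to query.

    labeled_docs is a list of (sparse_vector, label) pairs. Every query is
    compared against every stored document, which is the non-parametric
    cost the abstract refers to.
    """
    ranked = sorted(
        labeled_docs,
        key=lambda doc: cosine_similarity(query, doc[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        (term_vector("secure multi party computation protocol"), "crypto"),
        (term_vector("secure protocol for computation"), "crypto"),
        (term_vector("football match score goal"), "sports"),
        (term_vector("goal scored in the match"), "sports"),
    ]
    query = term_vector("a protocol for secure computation")
    print(knn_classify(query, training, k=3))
```

Storing only nonzero entries is what makes sparsity useful here: each comparison touches just the terms that actually occur in the shorter document.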

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Published elsewhere. Proceedings on Privacy-Enhancing Technologies 2020 (2)
DOI
10.2478/popets-2020-0024
Keywords
text analysis, document similarity, multi-party computation, differential privacy
Contact author(s)
schoppmann @ informatik hu-berlin de
History
2020-04-18: revised
2018-03-28: received
Short URL
https://ia.cr/2018/289
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2018/289,
      author = {Phillipp Schoppmann and Lennart Vogelsang and Adrià Gascón and Borja Balle},
      title = {Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue},
      howpublished = {Cryptology {ePrint} Archive, Paper 2018/289},
      year = {2018},
      doi = {10.2478/popets-2020-0024},
      url = {https://eprint.iacr.org/2018/289}
}