Paper 2019/1158

Practical Privacy-Preserving K-means Clustering

Payman Mohassel, Mike Rosulek, and Ni Trieu

Abstract

Clustering is a common technique for data analysis, which aims to partition data into similar groups. When the data comes from different sources, it is highly desirable to maintain the privacy of each database. In this work, we study a popular clustering algorithm (K-means) and adapt it to the privacy-preserving context. Specifically, to construct our privacy-preserving clustering algorithm, we first propose an efficient batched Euclidean squared distance computation protocol in the adaptive amortizing setting, when one needs to compute the distance from the same point to other points. This protocol can also serve as a key building block in many real-world applications such as Bio-metric Identification. Furthermore, we construct a customized garbled circuit for computing the minimum value among shared values. We implement and evaluate our protocols to demonstrate their practicality and show that they are able to train datasets that are much larger and faster than in the previous work. The numerical results also show that the proposed protocol achieve almost the same accuracy compared to a K-means plain-text clustering algorithm.

Metadata
Available format(s)
PDF
Publication info
Published elsewhere. Minor revision. PETS 2020
Contact author(s)
trieun @ oregonstate edu
History
2020-06-16: last of 5 revisions
2019-10-07: received
See all versions
Short URL
https://ia.cr/2019/1158
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2019/1158,
      author = {Payman Mohassel and Mike Rosulek and Ni Trieu},
      title = {Practical Privacy-Preserving K-means Clustering},
      howpublished = {Cryptology {ePrint} Archive, Paper 2019/1158},
      year = {2019},
      url = {https://eprint.iacr.org/2019/1158}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.