Cryptology ePrint Archive: Report 2019/1158

Practical Privacy-Preserving K-means Clustering

Payman Mohassel and Mike Rosulek and Ni Trieu

Abstract: Clustering is a common technique for data analysis, which aims to partition data into similar groups. When the data comes from different sources, it is highly desirable to maintain the privacy of each database. In this work, we study a popular clustering algorithm (K-means) and adapt it to the privacy-preserving context.

Our main contributions are to propose: i) communication-efficient protocols for secure two-party multiplication, and ii) batched Euclidean squared distance in the adaptive amortizing setting, when one needs to compute the distance from the same point to other points. These protocols are the key building blocks in many real-world applications such as Bio-metric Identification. Furthermore, we construct a customized garbled circuit for computing the minimum value among shared values.

We implement and evaluate our protocols to demonstrate their practicality and show that they are able to train data-sets that are much larger than in the previous work. For example, our scheme can partition the data-sets of size 100,000 into 5 groups under one hour. The numerical results also show that the proposed protocol reaches a ratio of 91.68% accuracy compared to a K-means plain-text clustering algorithm.

Category / Keywords:

Date: received 5 Oct 2019

Contact author: trieun at oregonstate edu

Available format(s): PDF | BibTeX Citation

Version: 20191007:082525 (All versions of this report)

Short URL:

[ Cryptology ePrint archive ]