Paper 2023/1523

On the Privacy of Sublinear-Communication Jaccard Index Estimation via Min-hash Sketching

Seung Geol Choi, United States Naval Academy
Dana Dachman-Soled, University of Maryland, College Park
Mingyu Liang, University of Maryland, College Park, George Washington University
Linsheng Liu, George Washington University
Arkady Yerukhimovich, George Washington University
Abstract

The min-hash sketch is a well-known technique for low-communication approximation of the Jaccard index between two input sets. Moreover, there is a folklore belief that min-hash sketch based protocols protect the privacy of the inputs. In this paper, we investigate this folklore to quantify the privacy of the min-hash sketch. We begin our investigation by considering the privacy of min-hash in a centralized setting where the hash functions are chosen by the min-hash functionality and are unknown to the participants. We show that in this case the min-hash output satisfies the standard definition of differential privacy (DP) without any additional noise. This immediately yields a privacy-preserving sublinear-communication semi-honest 2-PC protocol based on FHE where the hash function is evaluated homomorphically. To improve the efficiency of this protocol, we next consider an implementation in the random oracle model. Here, the protocol participants jointly sample public prefixes for domain separation of the random oracle, and locally evaluate the resulting hash functions on their input sets. Unfortunately, we show that in this public hash function setting, the min-hash output is no longer DP. We therefore consider the notion of distributional differential privacy (DDP) introduced by Bassily et al.~(FOCS 2013). We show that if the honest party's set has sufficiently high min-entropy then the output of the min-hash functionality achieves DDP, again without any added noise. This yields a more efficient semi-honest two-party protocol in the random oracle model, where parties first locally hash their input sets and then perform a 2PC for comparison. By proving that our protocols satisfy DP and DDP respectively, our results formally confirm and qualify the folklore belief that min-hash based protocols protect the privacy of their inputs.

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Preprint.
Keywords
Differential PrivacyMPCSublinear communicationsketchingMin-hashJaccard index
Contact author(s)
choi @ usna edu
danadach @ umd edu
mliang5 @ gmu edu
lls @ gwu edu
arkady @ email gwu edu
History
2024-02-15: last of 2 revisions
2023-10-06: received
See all versions
Short URL
https://ia.cr/2023/1523
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2023/1523,
      author = {Seung Geol Choi and Dana Dachman-Soled and Mingyu Liang and Linsheng Liu and Arkady Yerukhimovich},
      title = {On the Privacy of Sublinear-Communication Jaccard Index Estimation via Min-hash Sketching},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1523},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1523}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.