Paper 2024/1091

MatcHEd: Privacy-Preserving Set Similarity based on MinHash

Rostin Shokri, University of Delaware
Charles Gouert, University of Delaware
Nektarios Georgios Tsoutsos, University of Delaware
Abstract

Fully homomorphic encryption (FHE) enables arbitrary computation on encrypted data, but certain applications remain prohibitively expensive in the encrypted domain. As a case in point, comparing two encrypted sets of data is extremely computationally expensive due to the large number of comparison operators required. In this work, we propose a novel methodology for encrypted set similarity inspired by the MinHash algorithm and the CGGI FHE scheme. Doing comparisons in FHE requires comparators and multiplexers or an expensive approximation, which further increases the latency, especially when the goal is to compare two sets of data. The MinHash algorithm can significantly reduce the number of comparisons required by employing a special Carter-Wegman (CW) hash function as a key building block. However, the modulus operation in the CW hash becomes another key bottleneck because the encrypted sub-circuits required to perform the modular reduction are very large and inefficient in an FHE setting. Towards that end, we introduce an efficient bitwise FHE-friendly digest function (FFD) to employ as the cornerstone of our proposed encrypted set-similarity. In a Boolean FHE scheme like CGGI, the bitwise operations can be implemented efficiently with Boolean gates, which allows for faster evaluation times relative to standard Carter-Wegman constructions. Overall, our approach drastically reduces the number of comparisons required relative to the baseline approach of directly computing the Jaccard similarity coefficients, and is inherently parallelizable, allowing for efficient encrypted computation on multi-CPU and GPU-based cloud servers. We validate our approach by performing a privacy-preserving plagiarism detection across encrypted documents.

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Preprint.
Keywords
Secure computationFully homomorphic encryptionMinHashPrivacy-preserving set similarity
Contact author(s)
rostinsh @ udel edu
cgouert @ udel edu
tsoutsos @ udel edu
History
2024-07-05: approved
2024-07-04: received
See all versions
Short URL
https://ia.cr/2024/1091
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2024/1091,
      author = {Rostin Shokri and Charles Gouert and Nektarios Georgios Tsoutsos},
      title = {{MatcHEd}: Privacy-Preserving Set Similarity based on {MinHash}},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1091},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1091}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.