Paper 2023/1366

Compact Frequency Estimators in Adversarial Environments

Sam A. Markelon, University of Florida
Mia Filić, ETH Zurich
Thomas Shrimpton, University of Florida
Abstract

Count-Min Sketch (CMS) and HeavyKeeper (HK) are two realizations of a compact frequency estimator (CFE). These are a class of probabilistic data structures that maintain a compact summary of (typically) high-volume streaming data, and provides approximately correct estimates of the number of times any particular element has appeared. CFEs are often the base structure in systems looking for the highest-frequency elements (i.e., top-$K$ elements, heavy hitters, elephant flows). Traditionally, probabilistic guarantees on the accuracy of frequency estimates are proved under the implicit assumption that stream elements do not depend upon the internal randomness of the structure. Said another way, they are proved in the presence of data streams that are created by non-adaptive adversaries. Yet in many practical use-cases, this assumption is not well-matched with reality; especially, in applications where malicious actors are incentivized to manipulate the data stream. We show that the CMS and HK structures can be forced to make significant estimation errors, by concrete attacks that exploit adaptivity. We analyze these attacks analytically and experimentally, with tight agreement between the two. Sadly, these negative results seem unavoidable for (at least) sketch-based CFEs with parameters that are reasonable in practice. On the positive side, we give a new CFE (Count-Keeper) that can be seen as a composition of the CMS and HK structures. Count-Keeper estimates are typically more accurate (by at least a factor of two) than CMS for ``honest" streams; our attacks against CMS and HK are less effective (and more resource intensive) when used against Count-Keeper; and Count-Keeper has a native ability to flag estimates that are suspicious, which neither CMS or HK (or any other CFE, to our knowledge) admits.

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Published elsewhere. Major revision. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS '23)
DOI
10.1145/3576915.3623216
Keywords
probabilistic data structureCount-min sketchHeavyKeeperCount-Keeper
Contact author(s)
smarkelon @ ufl edu
mia filic @ inf ethz ch
teshrim @ ufl edu
History
2023-09-25: revised
2023-09-12: received
See all versions
Short URL
https://ia.cr/2023/1366
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2023/1366,
      author = {Sam A. Markelon and Mia Filić and Thomas Shrimpton},
      title = {Compact Frequency Estimators in Adversarial Environments},
      howpublished = {Cryptology ePrint Archive, Paper 2023/1366},
      year = {2023},
      doi = {10.1145/3576915.3623216},
      note = {\url{https://eprint.iacr.org/2023/1366}},
      url = {https://eprint.iacr.org/2023/1366}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.