Paper 2023/1366
Compact Frequency Estimators in Adversarial Environments
Abstract
Count-Min Sketch (CMS) and HeavyKeeper (HK) are two realizations of a compact frequency estimator (CFE). These are a class of probabilistic data structures that maintain a compact summary of (typically) high-volume streaming data, and provides approximately correct estimates of the number of times any particular element has appeared. CFEs are often the base structure in systems looking for the highest-frequency elements (i.e., top-$K$ elements, heavy hitters, elephant flows). Traditionally, probabilistic guarantees on the accuracy of frequency estimates are proved under the implicit assumption that stream elements do not depend upon the internal randomness of the structure. Said another way, they are proved in the presence of data streams that are created by non-adaptive adversaries. Yet in many practical use-cases, this assumption is not well-matched with reality; especially, in applications where malicious actors are incentivized to manipulate the data stream. We show that the CMS and HK structures can be forced to make significant estimation errors, by concrete attacks that exploit adaptivity. We analyze these attacks analytically and experimentally, with tight agreement between the two. Sadly, these negative results seem unavoidable for (at least) sketch-based CFEs with parameters that are reasonable in practice. On the positive side, we give a new CFE (Count-Keeper) that can be seen as a composition of the CMS and HK structures. Count-Keeper estimates are typically more accurate (by at least a factor of two) than CMS for ``honest" streams; our attacks against CMS and HK are less effective (and more resource intensive) when used against Count-Keeper; and Count-Keeper has a native ability to flag estimates that are suspicious, which neither CMS or HK (or any other CFE, to our knowledge) admits.
Metadata
- Available format(s)
- Category
- Applications
- Publication info
- Published elsewhere. Major revision. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS '23)
- DOI
- 10.1145/3576915.3623216
- Keywords
- probabilistic data structureCount-min sketchHeavyKeeperCount-Keeper
- Contact author(s)
-
smarkelon @ ufl edu
mia filic @ inf ethz ch
teshrim @ ufl edu - History
- 2023-09-25: revised
- 2023-09-12: received
- See all versions
- Short URL
- https://ia.cr/2023/1366
- License
-
CC BY
BibTeX
@misc{cryptoeprint:2023/1366, author = {Sam A. Markelon and Mia Filić and Thomas Shrimpton}, title = {Compact Frequency Estimators in Adversarial Environments}, howpublished = {Cryptology {ePrint} Archive, Paper 2023/1366}, year = {2023}, doi = {10.1145/3576915.3623216}, url = {https://eprint.iacr.org/2023/1366} }