Paper 2025/532

Chunking Attacks on File Backup Services using Content-Defined Chunking

Boris Alexeev
Colin Percival, Tarsnap Backup Inc.
Yan X Zhang, San Jose State University
Abstract

Systems such as file backup services often use content-defined chunking (CDC) algorithms, especially those based on rolling hash techniques, to split files into chunks in a way that allows for data deduplication. These chunking algorithms often depend on per-user parameters in an attempt to avoid leaking information about the data being stored. We present attacks to extract these chunking parameters and discuss protocol-agnostic attacks and the loss of security once the parameters are breached (including when these parameters are not set up at all, which is often available as an option). Our parameter-extraction attacks themselves are protocol-specific, but their ideas generalize to many potential CDC schemes.
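To make the setting concrete, here is a minimal sketch of rolling-hash CDC with a per-user parameter. It is a generic illustration, not the specific scheme analyzed in the paper and not Tarsnap's algorithm. The polynomial rolling hash, the `derive_base` key-to-base mapping, the window size, mask width, and min/max chunk sizes are all hypothetical choices. A chunk boundary is declared wherever the hash of the last `WINDOW` bytes has its low `mask_bits` bits equal to zero. Because the base is derived from a secret `user_key`, boundary positions are meant to be unpredictable to anyone without the key. The attacks described above target exactly this kind of secret parameter.

```python
import hashlib
import random

WINDOW = 48                 # bytes in the rolling window (illustrative choice)
MODULUS = (1 << 61) - 1     # Mersenne prime modulus for the polynomial hash


def derive_base(user_key: bytes) -> int:
    """Hypothetical: map a secret per-user key to a polynomial base in [2, MODULUS-1]."""
    digest = hashlib.sha256(user_key).digest()
    return int.from_bytes(digest[:8], "big") % (MODULUS - 2) + 2


def chunk(data: bytes, user_key: bytes, mask_bits: int = 12,
          min_size: int = 512, max_size: int = 16384) -> list[bytes]:
    """Split data where the rolling hash of the last WINDOW bytes has its low
    mask_bits bits all zero, subject to min_size/max_size limits."""
    base = derive_base(user_key)
    mask = (1 << mask_bits) - 1
    # base**(WINDOW-1) mod MODULUS: weight of the byte about to leave the window
    drop = pow(base, WINDOW - 1, MODULUS)
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Slide the window: remove the outgoing byte once the window is full...
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MODULUS
        # ...then shift in the incoming byte.
        h = (h * base + byte) % MODULUS
        size = i - start + 1
        # Since min_size >= WINDOW, the test depends only on the last WINDOW bytes.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.getrandbits(8) for _ in range(200_000))
    a = chunk(data, b"user-A secret")
    b = chunk(data, b"user-B secret")
    # Same data, different keys: boundaries (and thus chunk counts) generally differ.
    print(len(a), len(b))
```

Deduplication works because identical content under the same key yields identical chunks. The key only changes *where* boundaries fall, which is why learning it (or running with no key at all) can expose information about stored data.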

Note: Typo fixes.

Metadata
Available format(s)
PDF
Category
Attacks and cryptanalysis
Publication info
Preprint.
Keywords
Content-defined chunking, CDC, Rolling hash, Backup
Contact author(s)
cperciva @ tarsnap com
History
2025-03-24: revised
2025-03-21: received
Short URL
https://ia.cr/2025/532
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2025/532,
      author = {Boris Alexeev and Colin Percival and Yan X Zhang},
      title = {Chunking Attacks on File Backup Services using Content-Defined Chunking},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/532},
      year = {2025},
      url = {https://eprint.iacr.org/2025/532}
}