Paper 2024/759

Watermarking Language Models for Many Adaptive Users

Aloni Cohen, University of Chicago
Alexander Hoover, University of Chicago
Gabe Schoenbach, University of Chicago
Abstract

We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting, which arises whenever a user queries a language model more than once, as even benign users do. Moreover, with a single exception (Christ and Gunn, 2024), prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes (and prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust). Importantly, our scheme provides both zero-bit and multi-user assurances at the same time: it detects shorter snippets just as well as the original scheme and traces longer excerpts to individuals. The main technical component is a construction of message-embedding watermarks from zero-bit watermarks. Ours is the first generic reduction between watermarking schemes for language models. A challenge for such reductions is the lack of a unified abstraction for robustness, i.e., the property that marked text remains detectable even after edits. We introduce a new unifying abstraction called AEB-robustness, which provides that the watermark is detectable whenever the edited text "approximates enough blocks" of model-generated output.
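The abstract's central technical step, building a message-embedding watermark out of a zero-bit one, can be illustrated with a toy sketch. This is a hypothetical illustration, not the authors' construction: it assumes one independent zero-bit key per (bit position, bit value) pair, generates the i-th block of text under the key selected by the i-th message bit, and decodes each bit by asking which of the two keys detects that block. The "zero-bit scheme" is a simple green-list stand-in over integer tokens, which is neither undetectable nor adaptively robust in the paper's sense. All names (`is_green`, `zero_bit_generate`, `zero_bit_detect`, `keygen`, `embed`, `extract`) and parameters (`BLOCK_LEN`, `VOCAB`, `THRESHOLD`) are illustrative assumptions.

```python
import hashlib
import random
from typing import List, Optional, Tuple

# All parameters below are hypothetical, chosen only for this toy demo.
BLOCK_LEN = 128   # tokens per watermarked block
VOCAB = 1000      # size of the toy vocabulary (tokens are ints 0..VOCAB-1)
THRESHOLD = 0.75  # fraction of "green" tokens required to declare detection

Key = bytes
Block = List[int]


def is_green(key: Key, token: int) -> bool:
    """Keyed pseudorandom predicate marking roughly half the vocabulary green."""
    digest = hashlib.sha256(key + token.to_bytes(4, "big")).digest()
    return digest[0] % 2 == 0


def zero_bit_generate(key: Key, rng: random.Random) -> Block:
    """Toy zero-bit 'model': draws uniform tokens but emits only green ones.

    NOT undetectable; a stand-in for a real zero-bit scheme's marking algorithm.
    """
    block: Block = []
    while len(block) < BLOCK_LEN:
        token = rng.randrange(VOCAB)
        if is_green(key, token):
            block.append(token)
    return block


def zero_bit_detect(key: Key, block: Block) -> bool:
    """Zero-bit detection: are enough tokens green under this key?"""
    hits = sum(is_green(key, t) for t in block)
    return hits >= THRESHOLD * len(block)


def keygen(msg_len: int, rng: random.Random) -> List[Tuple[Key, Key]]:
    """One independent zero-bit key per (bit position, bit value) pair."""
    return [(rng.randbytes(16), rng.randbytes(16)) for _ in range(msg_len)]


def embed(keys: List[Tuple[Key, Key]], bits: List[int],
          rng: random.Random) -> List[Block]:
    """Block i is generated under the zero-bit key keys[i][bits[i]]."""
    return [zero_bit_generate(keys[i][b], rng) for i, b in enumerate(bits)]


def extract(keys: List[Tuple[Key, Key]],
            blocks: List[Block]) -> List[Optional[int]]:
    """Recover bit i by asking which of the two zero-bit keys detects block i.

    Returns None at a position where neither (or both) keys fire.
    """
    bits: List[Optional[int]] = []
    for (k0, k1), block in zip(keys, blocks):
        d0, d1 = zero_bit_detect(k0, block), zero_bit_detect(k1, block)
        if d0 and not d1:
            bits.append(0)
        elif d1 and not d0:
            bits.append(1)
        else:
            bits.append(None)
    return bits


if __name__ == "__main__":
    rng = random.Random(2024)
    keys = keygen(8, rng)
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    blocks = embed(keys, message, rng)
    # Simulate light editing: overwrite the last 16 tokens of each block.
    edited = [b[:-16] + [rng.randrange(VOCAB) for _ in range(16)] for b in blocks]
    print(extract(keys, blocks), extract(keys, edited))
```

In this sketch the blocks stay aligned and are handed to the decoder one at a time. A real decoder must cope with edited text whose block boundaries are unknown; the sketch sidesteps that by assumption.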

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Preprint.
Keywords
watermarking, language models, generative AI, fingerprinting codes
Contact author(s)
aloni @ g uchicago edu
alexhoover @ uchicago edu
gschoenbach @ uchicago edu
History
2024-06-28: revised
2024-05-17: received
Short URL
https://ia.cr/2024/759
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2024/759,
      author = {Aloni Cohen and Alexander Hoover and Gabe Schoenbach},
      title = {Watermarking Language Models for Many Adaptive Users},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/759},
      year = {2024},
      url = {https://eprint.iacr.org/2024/759}
}