Paper 2024/759

Enhancing Watermarked Language Models to Identify Users

Aloni Cohen, University of Chicago
Alexander Hoover, University of Chicago
Gabe Schoenbach, University of Chicago
Abstract

A zero-bit watermarked language model produces text that is indistinguishable from that of the underlying model, but which can be detected as machine-generated using a secret key. Unfortunately, merely detecting that (say) AI-generated spam is watermarked may not prevent future abuses. If we could additionally trace the text to a spammer's API token or account, we could then cut off their access or pursue legal action.

We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users. We construct multi-user watermarking schemes from undetectable zero-bit watermarking schemes. Importantly, our schemes provide both zero-bit and multi-user assurances at the same time: they detect shorter snippets just as well as the original scheme, and they trace longer excerpts to individuals. Along the way, we give a generic construction of a watermarking scheme that embeds long messages into generated text. Ours are the first black-box reductions between watermarking schemes for language models.

A major challenge for black-box reductions is the lack of a unified abstraction for robustness: the guarantee that marked text remains detectable even after edits. Existing works give incomparable robustness guarantees, based on bespoke requirements on the language model's outputs and the users' edits. To overcome this challenge, we introduce a new abstraction called AEB-robustness. AEB-robustness guarantees that the watermark is detectable whenever the edited text "approximates enough blocks" of model-generated output; specifying the robustness condition amounts to defining approximates, enough, and blocks.

Using our new abstraction, we relate the robustness properties of our message-embedding and multi-user schemes to those of the underlying zero-bit scheme, in a black-box way. Whereas prior works only guarantee robustness for a single text generated in response to a single prompt, our schemes are robust against adaptive prompting, a stronger and more natural adversarial model.
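The abstract describes two generic constructions: embedding a long message by selecting among zero-bit keys, and tracing users by issuing each user a fingerprinting-code codeword as their message. The Python sketch below illustrates these two ideas at a toy level. All interfaces here (ZeroBitScheme, MessageEmbeddingScheme, trace_user, per-response block boundaries, exact-match tracing) are illustrative assumptions, not the paper's actual constructions or API.

# Illustrative sketch only: a toy message-embedding watermark built from an
# assumed zero-bit interface, plus toy user tracing. All names and interfaces
# here are hypothetical.

from dataclasses import dataclass
from typing import Optional, Protocol


class ZeroBitScheme(Protocol):
    def keygen(self) -> bytes: ...
    def generate(self, key: bytes, prompt: str) -> str: ...
    def detect(self, key: bytes, text: str) -> bool: ...


@dataclass
class MessageEmbeddingScheme:
    """Embed an m-bit message: bit i selects which of two independent
    zero-bit keys is used to mark the i-th block of generated text."""

    zb: ZeroBitScheme
    m: int  # message length in bits

    def keygen(self) -> list[tuple[bytes, bytes]]:
        return [(self.zb.keygen(), self.zb.keygen()) for _ in range(self.m)]

    def generate(self, keys, prompt: str, message: list[int]) -> str:
        # Mark block i under keys[i][message[i]]; treating each response
        # chunk as one block is an assumption of this sketch.
        blocks = [self.zb.generate(keys[i][bit], prompt)
                  for i, bit in enumerate(message)]
        return " ".join(blocks)

    def decode(self, keys, text: str) -> list[Optional[int]]:
        # Try both keys at each position; None marks an unreadable
        # position (e.g., a block that was edited away).
        word = []
        for k0, k1 in keys:
            b0, b1 = self.zb.detect(k0, text), self.zb.detect(k1, text)
            word.append(0 if b0 and not b1 else 1 if b1 and not b0 else None)
        return word


def trace_user(scheme, keys, codebook, text):
    # Toy multi-user tracing: each user is issued a codeword from a
    # fingerprinting code as their message; decode the text and match it
    # against the codebook. A real fingerprinting code's tracing algorithm
    # also identifies members of a colluding group; exact matching does not.
    word = scheme.decode(keys, text)
    for user, codeword in codebook.items():
        if all(w is None or w == c for w, c in zip(word, codeword)):
            return user
    return None

In this toy version, a short snippet containing even one block can still be flagged by running the zero-bit detector under every key, while longer excerpts reveal enough of the codeword to trace a user. The paper's actual schemes achieve the analogous guarantees with concrete AEB-robustness conditions.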

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Preprint.
Keywords
watermarking, language models, generative AI, fingerprinting codes
Contact author(s)
aloni@g.uchicago.edu
alexhoover@uchicago.edu
gschoenbach@uchicago.edu
History
2024-05-20: approved
2024-05-17: received
Short URL
https://ia.cr/2024/759
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2024/759,
      author = {Aloni Cohen and Alexander Hoover and Gabe Schoenbach},
      title = {Enhancing Watermarked Language Models to Identify Users},
      howpublished = {Cryptology ePrint Archive, Paper 2024/759},
      year = {2024},
      note = {\url{https://eprint.iacr.org/2024/759}},
      url = {https://eprint.iacr.org/2024/759}
}