Inaccessible Entropy for Watermarking Generative Agents

Daniel Alabi; Lav R. Varshney

Paper 2025/256

Daniel Alabi, Columbia University

Lav R. Varshney, University of Illinois Urbana-Champaign

Abstract

In this work, we construct distortion-free and unforgeable watermarks for language models and generative agents. The watermarked output cannot be forged by an adversary, nor can the adversary remove the watermark without significantly degrading the quality of the model output. The watermarked output is also distortion-free: the watermarking algorithm does not noticeably change the quality of the model output, and without the public detection key, no efficient adversary can distinguish watermarked output from unwatermarked output. The core of the watermarking schemes involves embedding a message and a publicly verifiable digital signature in the generated model output. The message and signature can be extracted during the detection phase and verified by any authorized entity that has the public key. We show that, under the standard cryptographic assumption that one-way functions exist, we can construct distortion-free and unforgeable watermarking schemes. Our framework relies on analyzing the inaccessible entropy of the watermarking schemes, using computational entropy notions derived from the existence of one-way functions.
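
The sign, embed, extract, and verify flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' construction: it assumes a Lamport one-time signature built on SHA-256 (a textbook signature scheme that follows from one-way functions), and it treats the watermark as a plain byte payload rather than modeling how bits are hidden in generated tokens. All function names (keygen, sign, verify, pack, unpack, detect) are hypothetical and chosen for illustration only.

```python
import hashlib
import secrets

HASH_BITS = 256  # SHA-256 digest length in bits
CHUNK = 32       # bytes per secret / per signature element


def H(data: bytes) -> bytes:
    """Hash function standing in for a one-way function (an assumption)."""
    return hashlib.sha256(data).digest()


def keygen():
    """Lamport one-time key pair: 256 pairs of secrets and their hashes."""
    sk = [(secrets.token_bytes(CHUNK), secrets.token_bytes(CHUNK))
          for _ in range(HASH_BITS)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk


def digest_bits(message: bytes):
    """Bits of SHA-256(message), most significant bit first."""
    d = H(message)
    return [(d[i // 8] >> (7 - i % 8)) & 1 for i in range(HASH_BITS)]


def sign(sk, message: bytes):
    """Reveal one secret per digest bit. The key must be used only once."""
    return [sk[i][bit] for i, bit in enumerate(digest_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    """Check each revealed secret against the public hash for that bit."""
    if len(signature) != HASH_BITS:
        return False
    bits = digest_bits(message)
    return all(H(s) == pk[i][b] for i, (b, s) in enumerate(zip(bits, signature)))


def pack(message: bytes, signature) -> bytes:
    """Payload to embed: 2-byte message length, message, then signature."""
    return len(message).to_bytes(2, "big") + message + b"".join(signature)


def unpack(payload: bytes):
    """Inverse of pack: recover the message and the signature elements."""
    n = int.from_bytes(payload[:2], "big")
    message = payload[2:2 + n]
    body = payload[2 + n:]
    signature = [body[i:i + CHUNK] for i in range(0, len(body), CHUNK)]
    return message, signature


def detect(pk, payload: bytes):
    """Return the embedded message if its signature verifies, else None."""
    message, signature = unpack(payload)
    return message if verify(pk, message, signature) else None
```

In this sketch, anyone holding pk can run detect on an extracted payload, while producing a payload that verifies requires the secret key. Lamport keys are one-time: signing two different messages with the same key reveals enough secrets to enable forgeries, so a practical scheme would need a many-time signature.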

Metadata

Available format(s): PDF
Category: Cryptographic protocols
Publication info: Preprint.
Keywords: Watermarking, Error-Correcting Codes, Digital Signatures
Contact author(s): alabid @ illinois edu
varshney @ illinois edu
History: 2025-02-18: approved; 2025-02-17: received
Short URL: https://ia.cr/2025/256
License: CC BY

BibTeX

@misc{cryptoeprint:2025/256,
      author = {Daniel Alabi and Lav R. Varshney},
      title = {Inaccessible Entropy for Watermarking Generative Agents},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/256},
      year = {2025},
      url = {https://eprint.iacr.org/2025/256}
}