Paper 2025/256
Inaccessible Entropy for Watermarking Generative Agents
Abstract
In this work, we construct distortion-free and unforgeable watermarks for language models and generative agents. The watermarked output cannot be forged by an adversary, nor can it be removed without significantly degrading model output quality. The watermarked output is also distortion-free: the watermarking algorithm does not noticeably change the quality of the model output, and without the public detection key, no efficient adversary can distinguish watermarked output from output that is not watermarked. The core of the watermarking schemes involves embedding a message and a publicly verifiable digital signature in the generated model output. The message and signature can be extracted during the detection phase and verified by any authorized entity that has the public key. We show that, under the standard cryptographic assumption that one-way functions exist, we can construct distortion-free and unforgeable watermark schemes. Our framework relies on analyzing the inaccessible entropy of the watermarking schemes, based on computational entropy notions derived from the existence of one-way functions.
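The sign-then-embed and extract-then-verify flow described in the abstract can be illustrated with a toy sketch. This is not the paper's construction: it swaps in Lamport one-time signatures (a textbook scheme built from a one-way function, modeled here by SHA-256) and a parity-based token channel that is neither distortion-free nor robust. All function names (`keygen`, `watermark`, `detect`, and so on) and the integer-token "model output" are hypothetical and chosen only for illustration.

```python
import hashlib
import secrets

HASH_BITS = 256  # SHA-256 digest length in bits


def _h(data: bytes) -> bytes:
    """One-way function stand-in (assumption: SHA-256 behaves as one)."""
    return hashlib.sha256(data).digest()


def _bits(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def keygen():
    """Lamport key pair: two secret preimages per digest bit; the public key is their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(HASH_BITS)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def sign(sk, message: bytes) -> list[bytes]:
    """Reveal one preimage per bit of H(message). A key must sign only one message."""
    return [sk[i][bit] for i, bit in enumerate(_bits(_h(message)))]


def verify(pk, message: bytes, signature: list[bytes]) -> bool:
    """Check each revealed preimage against the public hash selected by the digest bit."""
    digest_bits = _bits(_h(message))
    if len(signature) != HASH_BITS:
        return False
    return all(_h(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(signature, digest_bits)))


def embed(payload: bytes) -> list[int]:
    """Toy channel (NOT distortion-free): the parity of each token id carries one payload bit."""
    return [2 * secrets.randbelow(1000) + bit for bit in _bits(payload)]


def extract(tokens: list[int]) -> bytes:
    """Read the parity bits back and repack them into bytes."""
    bits = [t & 1 for t in tokens]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))


def watermark(sk, message: bytes) -> list[int]:
    """Embed a length-prefixed message followed by its signature."""
    signature = b"".join(sign(sk, message))
    payload = len(message).to_bytes(2, "big") + message + signature
    return embed(payload)


def detect(pk, tokens: list[int]):
    """Extract message and signature, then verify using only the public key."""
    payload = extract(tokens)
    n = int.from_bytes(payload[:2], "big")
    message, sig_bytes = payload[2:2 + n], payload[2 + n:]
    signature = [sig_bytes[i:i + 32] for i in range(0, len(sig_bytes), 32)]
    return verify(pk, message, signature), message
```

In this sketch, `detect(pk, watermark(sk, b"model-v1"))` recovers the message and accepts it, while flipping the parity of a token that carries a message bit, or signing with a different secret key, causes verification to fail. Achieving the distortion-freeness and removal resistance claimed in the abstract requires the paper's actual construction, which this toy channel does not attempt.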
Metadata
- Available format(s)
- PDF
- Category
- Cryptographic protocols
- Publication info
- Preprint.
- Keywords
- Watermarking, Error-Correcting Codes, Digital Signatures
- Contact author(s)
- alabid @ illinois edu
- varshney @ illinois edu
- History
- 2025-02-18: approved
- 2025-02-17: received
- Short URL
- https://ia.cr/2025/256
- License
- CC BY
BibTeX
@misc{cryptoeprint:2025/256,
  author = {Daniel Alabi and Lav R. Varshney},
  title = {Inaccessible Entropy for Watermarking Generative Agents},
  howpublished = {Cryptology {ePrint} Archive, Paper 2025/256},
  year = {2025},
  url = {https://eprint.iacr.org/2025/256}
}