Publicly-Detectable Watermarking for Language Models

Jaiden Fairoze; Sanjam Garg; Somesh Jha; Saeed Mahloujifar; Mohammad Mahmoody; Mingyuan Wang

Paper 2023/1661

Publicly-Detectable Watermarking for Language Models

Jaiden Fairoze, University of California, Berkeley

Sanjam Garg

, University of California, Berkeley

Somesh Jha, University of Wisconsin–Madison

Saeed Mahloujifar

, Fundamental Artificial Intelligence Research at Meta

Mohammad Mahmoody

, University of Virginia

Mingyuan Wang

, New York University Shanghai

Abstract

We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.

Metadata

Available format(s): PDF
Category: Applications
Publication info: Published by the IACR in CIC 2024
Keywords: public-detectability watermarking large language models cryptographic protocols provable security machine learning
Contact author(s): fairoze @ berkeley edu
sanjamg @ berkeley edu
jha @ cs wisc edu
saeedm @ meta com
mohammad @ virginia edu
mingyuan wang @ nyu edu
History: 2025-01-04: last of 4 revisions; 2023-10-26: received; See all versions
Short URL: https://ia.cr/2023/1661
License: CC BY

BibTeX

@misc{cryptoeprint:2023/1661,
      author = {Jaiden Fairoze and Sanjam Garg and Somesh Jha and Saeed Mahloujifar and Mohammad Mahmoody and Mingyuan Wang},
      title = {Publicly-Detectable Watermarking for Language Models},
      howpublished = {Cryptology {ePrint} Archive, Paper 2023/1661},
      year = {2023},
      url = {https://eprint.iacr.org/2023/1661}
}