Paper 2023/1661

Publicly Detectable Watermarking for Language Models

Jaiden Fairoze, University of California, Berkeley
Sanjam Garg, University of California, Berkeley
Somesh Jha, University of Wisconsin–Madison
Saeed Mahloujifar, FAIR, Meta
Mohammad Mahmoody, University of Virginia
Mingyuan Wang, University of California, Berkeley

We construct the first provable watermarking scheme for language models with public detectability or verifiability: we use a private key for watermarking and a public key for watermark detection. Our protocol is the first watermarking scheme that does not embed a statistical signal in generated text. Rather, we directly embed a publicly-verifiable cryptographic signature using a form of rejection sampling. We show that our construction meets strong formal security guarantees and preserves many desirable properties found in schemes in the private-key watermarking setting. In particular, our watermarking scheme retains distortion-freeness and model agnosticity. We implement our scheme and make empirical measurements over open models in the 7B parameter range. Our experiments suggest that our watermarking scheme meets our formal claims while preserving text quality.
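The core idea, signing with a private key and then steering sampling so the text itself carries the signature, can be illustrated with a small sketch. This is not the paper's construction. It assumes a toy textbook RSA key (far too small to be secure), a uniform word sampler standing in for a language model, and a hypothetical encoding in which each block of three words carries one signature bit through the low bit of its SHA-256 hash. All names and parameters (`watermark`, `detect`, `MSG_WORDS`, `BLOCK`, `VOCAB`) are illustrative assumptions.

```python
import hashlib
import random

# Toy RSA key (textbook-sized, NOT secure): for illustration only.
P, Q, E = 61, 53, 17
N = P * Q                          # public modulus
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)
SIG_BITS = N.bit_length()          # bits needed to carry one signature
MSG_WORDS = 8                      # freely sampled words that get signed
BLOCK = 3                          # words used to carry one signature bit
VOCAB = ["the", "a", "cat", "dog", "runs", "sleeps", "fast", "slowly"]


def digest_mod_n(words):
    """Hash a word sequence to an integer in [0, N)."""
    data = " ".join(words).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def block_bit(words):
    """Bit carried by a block: low bit of the first SHA-256 byte."""
    return hashlib.sha256(" ".join(words).encode()).digest()[0] & 1


def sample_words(rng, k):
    """Stand-in for a language model: uniform sampling from VOCAB."""
    return [rng.choice(VOCAB) for _ in range(k)]


def watermark(rng):
    """Generate text whose tail encodes a signature on its prefix."""
    prefix = sample_words(rng, MSG_WORDS)
    sig = pow(digest_mod_n(prefix), D, N)   # sign with the private key
    out = list(prefix)
    for i in range(SIG_BITS):
        want = (sig >> i) & 1
        block = sample_words(rng, BLOCK)
        while block_bit(block) != want:     # rejection sampling
            block = sample_words(rng, BLOCK)
        out.extend(block)
    return " ".join(out)


def detect(text):
    """Verify using only the public values (N, E)."""
    words = text.split()
    if len(words) < MSG_WORDS + SIG_BITS * BLOCK:
        return False
    prefix, rest = words[:MSG_WORDS], words[MSG_WORDS:]
    sig = 0
    for i in range(SIG_BITS):
        sig |= block_bit(rest[i * BLOCK:(i + 1) * BLOCK]) << i
    return sig < N and pow(sig, E, N) == digest_mod_n(prefix)


text = watermark(random.Random(0))
print(detect(text))
```

Each block is still drawn from the sampler's own distribution and merely resampled until its hash bit matches, so the sketch adds no separate statistical bias on top of the sampler. Detection needs only the public pair `(N, E)`, while producing a valid signature requires the private exponent `D`.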

Keywords
public-detectability, watermarking, large language models, cryptographic protocols, provable security, machine learning
Contact author(s)
fairoze@berkeley.edu
sanjamg@berkeley.edu
jha@cs.wisc.edu
saeedm@meta.com
mohammad@virginia.edu
mingyuan@berkeley.edu
History
2023-10-26: approved
2023-10-26: received
Creative Commons Attribution


BibTeX
@misc{cryptoeprint:2023/1661,
      author = {Jaiden Fairoze and Sanjam Garg and Somesh Jha and Saeed Mahloujifar and Mohammad Mahmoody and Mingyuan Wang},
      title = {Publicly Detectable Watermarking for Language Models},
      howpublished = {Cryptology ePrint Archive, Paper 2023/1661},
      year = {2023},
      note = {\url{}},
      url = {}
}