Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. Current watermarks operate by changing the next-token predictions output by an LLM. The updated (i.e., watermarked) predictions depend on random side information produced, for example, by hashing previously generated tokens. LLM watermarking is particularly challenging in low-entropy generation tasks -- such as coding -- where next-token predictions are near-deterministic. In this paper, we propose an optimization framework for watermark design. Our goal is to understand how to most effectively use random side information in order to maximize the likelihood of watermark detection and minimize the distortion of generated text. Our analysis informs the design of two new watermarks: HeavyWater and SimplexWater. Both watermarks are tunable, gracefully trading off between detection accuracy and text distortion. They can also be applied to any LLM and are agnostic to side information generation. We examine the performance of HeavyWater and SimplexWater through several benchmarks, demonstrating that they can achieve high watermark detection accuracy with minimal compromise of text generation quality, particularly in the low-entropy regime. Our theoretical analysis also reveals surprising new connections between LLM watermarking and coding theory. The code implementation can be found at https://github.com/DorTsur/HeavyWater_SimplexWater.
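To make the hashing-based side information concrete, here is a minimal illustrative sketch in the style of green/red-list watermarks: the last few generated token ids are hashed to derive a per-step seed, which pseudo-randomly selects a "green" subset of the vocabulary whose tokens the sampler then favors. This is a generic example of side-information generation, not the HeavyWater or SimplexWater construction; the function names and parameters (`side_info`, `green_list`, `window`, `fraction`) are hypothetical.

```python
import hashlib
import random

def side_info(prev_tokens, window=4):
    """Hash the last `window` token ids to derive a per-step random seed.

    Illustrative only: one common way to produce the random side
    information mentioned above. The proposed watermarks are agnostic
    to how this seed is generated.
    """
    context = ",".join(str(t) for t in prev_tokens[-window:])
    digest = hashlib.sha256(context.encode()).hexdigest()
    return int(digest, 16) % (2**31)

def green_list(seed, vocab_size, fraction=0.5):
    """Pseudo-randomly select a seed-dependent 'green' vocabulary subset."""
    rng = random.Random(seed)
    k = int(vocab_size * fraction)
    return set(rng.sample(range(vocab_size), k))

# A detector with access to the same hash can recompute each step's
# green list and test whether generated tokens land in it more often
# than the ~`fraction` rate expected for unwatermarked text.
seed = side_info([101, 2054, 2003])
greens = green_list(seed, vocab_size=50000)
```

Because both steps are deterministic given the preceding tokens, the detector needs no access to the LLM itself, only to the hash scheme and the generated text.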