Prior work has shown that text generation APIs can be stolen through imitation attacks, resulting in intellectual property (IP) violations. To protect the IP of text generation APIs, a recent work introduced a watermarking algorithm and used a null-hypothesis test for post-hoc ownership verification of imitation models. However, we find that those watermarks can be detected via sufficient statistics of the frequencies of candidate watermarking words. To address this drawback, we propose a novel Conditional wATERmarking framework (CATER) for protecting the IP of text generation APIs. We propose an optimization method that chooses watermarking rules to minimize the distortion of overall word distributions while maximizing the change in conditional word selections. Theoretically, we prove that even the savviest attacker, one who knows how CATER works, cannot feasibly reveal the watermarks in use from a large pool of potential word pairs by statistical inspection. Empirically, we observe that high-order conditions lead to exponential growth in the number of suspicious (unused) watermarks, making our crafted watermarks stealthier. In addition, CATER can effectively identify IP infringement under architectural mismatch and cross-domain imitation attacks, with negligible impairment to the generation quality of victim APIs. We envision our work as a milestone for stealthily protecting the IP of text generation APIs.
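As a rough illustration of what a null-hypothesis test for post-hoc ownership verification can look like, the following minimal sketch computes a one-sided binomial p-value. It is not the procedure used by CATER or by the prior work: the function names, the null probability `p_null = 0.5`, and the significance level `alpha = 0.05` are all assumptions made for this example. Here `trials` stands for the number of positions in a suspect model's output where a watermark rule could apply, and `hits` for how many of those positions contain the watermarked word.

```python
from math import comb


def binomial_tail_p_value(hits: int, trials: int, p_null: float = 0.5) -> float:
    """Return P(X >= hits) for X ~ Binomial(trials, p_null).

    p_null is the assumed chance that a non-imitating model picks the
    watermarked word at an eligible position (0.5 is an illustrative
    assumption, not a value from the paper).
    """
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def is_suspect(hits: int, trials: int, p_null: float = 0.5, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis (no imitation) when the p-value is below alpha."""
    return binomial_tail_p_value(hits, trials, p_null) < alpha


if __name__ == "__main__":
    # Hypothetical counts: 9 watermarked choices out of 10 eligible positions.
    print(binomial_tail_p_value(9, 10))
    print(is_suspect(9, 10))
```

A small p-value means the suspect model uses the watermarked words more often than the assumed null behavior would explain, which is the kind of evidence an API owner would use to argue that the model was trained on watermarked outputs.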