With the rapid development of natural language processing, recent advances in information hiding have focused on covertly embedding secret information into text. Existing algorithms either modify a given cover text or directly generate a text containing the secret information. However, they are not reversible: the original text, which carries no secret information, cannot be perfectly recovered unless a large amount of side information is shared in advance. To tackle this problem, we propose a general framework for embedding secret information into a given cover text such that both the embedded information and the original cover text can be perfectly retrieved from the marked text. The main idea is to use a masked language model to generate a marked text in which the words at some positions can be collected to reconstruct the cover text, while the words at the remaining positions can be processed to extract the secret information. Our results show that the secret information can be successfully embedded, and that both it and the original cover text can be successfully extracted. Meanwhile, the marked text carrying secret information has good fluency and semantic quality, indicating that the proposed method has satisfactory security, as verified by experimental results. Furthermore, the data hider and the data receiver do not need to share the language model, which significantly reduces the side information and gives the method good potential in applications.
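The position-based idea described above can be illustrated with a minimal toy sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: a fixed candidate list (`CANDIDATES`) stands in for the masked language model, one word is inserted after every `step` cover words, and each inserted word encodes one bit through the parity of its length (`word_bit`). The names `embed`, `extract`, `pick_filler`, and `step` are hypothetical.

```python
# Toy sketch of reversible text embedding by position (illustrative assumptions only).
# A real system would ask a masked language model for fluent candidate words;
# here a fixed list stands in for the model.
CANDIDATES = ["very", "quite", "just", "truly"]


def word_bit(word):
    """Assumed bit rule: parity of the word length (even -> 0, odd -> 1)."""
    return len(word) & 1


def pick_filler(bit):
    """Return the first stand-in candidate whose bit rule matches `bit`."""
    for candidate in CANDIDATES:
        if word_bit(candidate) == bit:
            return candidate
    raise ValueError("no candidate encodes bit %d" % bit)


def embed(cover_words, bits, step=2):
    """Insert one bit-carrying word after every `step` cover words."""
    marked = []
    remaining = list(bits)
    for count, word in enumerate(cover_words, start=1):
        marked.append(word)
        if count % step == 0 and remaining:
            marked.append(pick_filler(remaining.pop(0)))
    if remaining:
        raise ValueError("cover text too short for the payload")
    return marked


def extract(marked_words, n_bits, step=2):
    """Recover (cover_words, bits) using only the position rule and bit rule."""
    cover, bits = [], []
    run = 0  # cover words seen since the last inserted word
    for word in marked_words:
        if run == step and len(bits) < n_bits:
            bits.append(word_bit(word))  # inserted position: read the bit
            run = 0
        else:
            cover.append(word)  # cover position: keep the word
            run += 1
    return cover, bits


cover = "the cat sat on the mat today".split()
secret = [1, 0, 1]
marked = embed(cover, secret)
recovered_cover, recovered_bits = extract(marked, len(secret))
```

Note that `extract` never calls the candidate generator: in this sketch the receiver needs only the position rule, the bit rule, and the payload length, which mirrors the claim that the language model itself need not be shared.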