A meaningful text can be hidden inside another text of the same length that is completely different yet still coherent and plausible. For example, a tweet containing a harsh political critique could be embedded in a tweet that celebrates the same political leader, or an ordinary product review could conceal a secret manuscript. This uncanny state of affairs is now possible thanks to Large Language Models, and in this paper we present Calgacus, a simple and efficient protocol to achieve it. We show that even modest 8-billion-parameter open-source LLMs are sufficient to obtain high-quality results, and that a message as long as this abstract can be encoded and decoded locally on a laptop in seconds. The existence of such a protocol demonstrates a radical decoupling of text from authorial intent, further eroding a trust in written communication that has already been shaken by the rise of LLM chatbots. We illustrate this with a concrete scenario: a company could covertly deploy an unfiltered LLM by encoding its answers within the compliant responses of a safe model. This possibility raises urgent questions for AI safety and challenges our understanding of what it means for a Large Language Model to know something.
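To make the underlying idea concrete, the following is a minimal sketch of naive LLM token-choice steganography; it is not the Calgacus protocol itself, only a generic illustration under stated assumptions. It assumes the Hugging Face transformers library and a small causal language model (gpt2 is used purely as a stand-in); the encode/decode helpers and the example prompt are hypothetical names chosen for illustration. One secret bit is hidden per generated token by choosing between the model's two most likely continuations, and the decoder recovers the bits by replaying the same model deterministically.

```python
# Hedged sketch: generic token-choice steganography, NOT the paper's protocol.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in model; the paper works with larger open-source LLMs
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def encode(bits, prompt="The weather today is"):
    # Hide one bit per generated token: bit 0 picks the most likely next
    # token, bit 1 picks the second most likely one.
    ids = tok(prompt, return_tensors="pt").input_ids
    for b in bits:
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        top2 = torch.topk(logits, 2).indices
        ids = torch.cat([ids, top2[b].view(1, 1)], dim=1)
    return tok.decode(ids[0])

def decode(text, n_bits, prompt="The weather today is"):
    # Replay the same model on the same prefix and check, at each step,
    # which of the two candidate tokens actually appears in the cover text.
    # Assumes the tokenization of the cover text round-trips exactly -- a
    # known fragility of this naive scheme.
    full = tok(text, return_tensors="pt").input_ids
    ids = tok(prompt, return_tensors="pt").input_ids
    start = ids.shape[1]
    bits = []
    for i in range(n_bits):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        top2 = torch.topk(logits, 2).indices
        chosen = full[0, start + i]
        bits.append(int((top2 == chosen).nonzero().item()))
        ids = torch.cat([ids, chosen.view(1, 1)], dim=1)
    return bits

if __name__ == "__main__":
    secret = [1, 0, 1, 1, 0, 0, 1, 0]   # eight secret bits
    cover = encode(secret)
    print(cover)                         # plausible-looking cover text
    print(decode(cover, len(secret)))    # recovers the original bits
```

A scheme this naive yields far less natural cover text and far lower capacity than the abstract claims for Calgacus; it is included only to show why deterministic, local access to the same model is what makes encoding and decoding on a laptop feasible at all.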