Deciphering historical substitution ciphers is a challenging problem. Previously studied problems include detecting the cipher type, detecting the plaintext language, and recovering the substitution key for segmented ciphers. However, attacking unsegmented, space-free ciphers remains difficult. Segmentation (i.e., finding the substitution units) is the first step towards cracking those ciphers. In this work, we propose the first automatic methods to segment those ciphers using Byte Pair Encoding (BPE) and unigram language models. Our methods achieve an average segmentation error of 2\% on 100 randomly generated monoalphabetic ciphers and 27\% on 3 real homophonic ciphers. We also propose a method for solving non-deterministic ciphers with existing keys using a lattice and a pretrained language model. Our method leads to the full solution of the IA cipher, a real historical cipher that had not been fully solved before this work.
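
To make the segmentation idea concrete, here is a minimal sketch of generic BPE applied to a space-free symbol string. It is not the implementation behind the reported results: the function name `bpe_segment`, the `num_merges` parameter, and the stopping rule (stop when no adjacent pair occurs at least twice) are assumptions made for illustration only.

```python
from collections import Counter


def bpe_segment(text: str, num_merges: int) -> list[str]:
    """Greedy BPE sketch (hypothetical helper, not the paper's code).

    Starts from single characters and repeatedly merges the most
    frequent adjacent pair of tokens, up to num_merges times.
    """
    tokens = list(text)
    for _ in range(num_merges):
        # Count every adjacent token pair in the current segmentation.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        # Assumed stopping rule: a pair seen only once is not merged.
        if count < 2:
            break
        merged = []
        i = 0
        while i < len(tokens):
            # Merge left-to-right, skipping both halves of a merged pair.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


if __name__ == "__main__":
    # Toy space-free "cipher" in which the unit 12 repeats.
    print(bpe_segment("121212", 1))
```

The resulting tokens are candidate substitution units. In practice the number of merges would have to be tuned, and the sketch says nothing about how the unigram language model variant would score alternative segmentations.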