We propose the neural string edit distance model for string-pair classification and sequence generation, based on learnable string edit distance. We modify the original expectation-maximization algorithm for learned edit distance into a differentiable loss function, allowing us to integrate it into a neural network that provides a contextual representation of the input. We test the method on cognate detection, transliteration, and grapheme-to-phoneme conversion. We show that we can trade off between performance and interpretability within a single framework. Using contextual representations, which are difficult to interpret, we can match the performance of state-of-the-art string-pair classification models. Using static embeddings and a minor modification of the loss function, we can force interpretability, at the expense of a drop in accuracy.
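To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a differentiable forward-algorithm loss for learnable string edit distance: the dynamic program marginalizes over all edit sequences with log-sum-exp instead of a hard minimum, so the negative log-likelihood can be backpropagated into a neural scorer. The operation log-probabilities (insert, delete, substitute) are assumed to come from some network over the two strings, e.g. from static or contextual symbol embeddings; all names here are illustrative.

```python
# Minimal sketch, assuming operation scores are produced by a neural scorer.
import torch


def edit_distance_nll(ins_logp: torch.Tensor,
                      del_logp: torch.Tensor,
                      sub_logp: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of a string pair under a stochastic edit model.

    ins_logp: (T,)   log-probability of inserting target symbol t_j
    del_logp: (S,)   log-probability of deleting source symbol s_i
    sub_logp: (S, T) log-probability of rewriting s_i as t_j
    """
    S, T = sub_logp.shape
    # alpha[i][j] = log-probability of generating t[:j] from s[:i]
    alpha = [[None] * (T + 1) for _ in range(S + 1)]
    alpha[0][0] = torch.zeros(())
    for i in range(S + 1):
        for j in range(T + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if j > 0:            # insert t_{j-1}
                candidates.append(alpha[i][j - 1] + ins_logp[j - 1])
            if i > 0:            # delete s_{i-1}
                candidates.append(alpha[i - 1][j] + del_logp[i - 1])
            if i > 0 and j > 0:  # substitute s_{i-1} -> t_{j-1}
                candidates.append(alpha[i - 1][j - 1] + sub_logp[i - 1, j - 1])
            # Soft (log-sum-exp) aggregation keeps the recursion differentiable.
            alpha[i][j] = torch.logsumexp(torch.stack(candidates), dim=0)
    # Gradients flow back to whatever network produced the operation scores.
    return -alpha[S][T]
```

In this reading, minimizing the negative log-likelihood on matching pairs trains the edit model end to end; a hard-minimum (Viterbi-style) variant of the same recursion would recover a conventional, more interpretable edit alignment.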