Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaning-bearing units in a sequence of characters. Rather than segmenting the sequence, our model discovers continuous representations of the "objects" in the sequence, using Slot Attention, a recently proposed architecture for object discovery in images. We train our model on different languages and evaluate the quality of the obtained representations with probing classifiers. Our experiments show promising results regarding the ability of our units to capture meaning at a higher level of abstraction.
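To make the mechanism more concrete, here is a minimal NumPy sketch of the Slot Attention update (Locatello et al., 2020) applied to a sequence of character embeddings. It is an illustration under stated assumptions, not the authors' implementation. The weights are random and untrained. LayerNorm has no learned affine parameters and the projections have no biases. The GRU cell is hand-written. The class name `SlotAttention`, the toy embedding table, and the hyperparameters (4 slots, 16 dimensions, 3 iterations) are all hypothetical. The sketch shows how a fixed number of slots compete for characters through a softmax over slots. Each slot ends up as a continuous vector summarizing the characters it attends to, rather than as a hard segment boundary.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean, unit variance (no learned scale/shift here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SlotAttention:
    """Minimal Slot Attention with random, untrained weights (illustrative only)."""

    def __init__(self, num_slots, dim, iters=3, hidden=64, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_slots, self.dim, self.iters = num_slots, dim, iters

        def init(*shape):
            return self.rng.normal(0.0, dim ** -0.5, size=shape)

        # Query / key / value projections.
        self.W_q, self.W_k, self.W_v = init(dim, dim), init(dim, dim), init(dim, dim)
        # GRU parameters: update gate z, reset gate r, candidate state h.
        self.W_z, self.U_z = init(dim, dim), init(dim, dim)
        self.W_r, self.U_r = init(dim, dim), init(dim, dim)
        self.W_h, self.U_h = init(dim, dim), init(dim, dim)
        # Residual MLP applied after the GRU update.
        self.W1, self.W2 = init(dim, hidden), init(hidden, dim)
        # Gaussian slot initialisation (mu and sigma are learned in the real model).
        self.mu, self.log_sigma = np.zeros(dim), np.zeros(dim)

    def gru(self, x, h):
        z = sigmoid(x @ self.W_z + h @ self.U_z)
        r = sigmoid(x @ self.W_r + h @ self.U_r)
        h_tilde = np.tanh(x @ self.W_h + (r * h) @ self.U_h)
        return (1.0 - z) * h + z * h_tilde

    def __call__(self, inputs, eps=1e-8):
        # inputs: (n_chars, dim) character embeddings for one sequence.
        inputs = layer_norm(inputs)
        k, v = inputs @ self.W_k, inputs @ self.W_v
        noise = self.rng.normal(size=(self.num_slots, self.dim))
        slots = self.mu + np.exp(self.log_sigma) * noise
        for _ in range(self.iters):
            prev = slots
            q = layer_norm(slots) @ self.W_q
            logits = k @ q.T / np.sqrt(self.dim)  # (n_chars, num_slots)
            # Softmax over the slot axis: slots compete to explain each character.
            attn = softmax(logits, axis=1)
            # Normalize over characters so each slot takes a weighted mean of values.
            weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
            updates = weights.T @ v  # (num_slots, dim)
            slots = self.gru(updates, prev)
            slots = slots + np.maximum(layer_norm(slots) @ self.W1, 0.0) @ self.W2
        return slots, attn

# Usage: a toy character sequence with a random (hypothetical) embedding table.
rng = np.random.default_rng(1)
chars = "thecatsat"
vocab = {c: i for i, c in enumerate(sorted(set(chars)))}
emb = rng.normal(size=(len(vocab), 16))
x = emb[[vocab[c] for c in chars]]  # (9, 16)
sa = SlotAttention(num_slots=4, dim=16)
slots, attn = sa(x)
print(slots.shape, attn.shape)  # (4, 16) (9, 4)
```

The softmax is taken over slots rather than over characters. This is the design choice that makes slots act like competing "objects": each character's attention mass is split among the slots. The result is a soft, continuous grouping of the sequence instead of a discrete segmentation.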