Powerful sentence encoders trained for multiple languages are on the rise. These systems are capable of embedding a wide range of linguistic properties into vector representations. While explicit probing tasks can be used to verify the presence of specific linguistic properties, it is unclear whether the vector representations can be manipulated to indirectly steer such properties. We investigate the use of a geometric mapping in embedding space to transform linguistic properties, without any tuning of the pre-trained sentence encoder or decoder. We validate our approach on three linguistic properties using a pre-trained multilingual autoencoder and analyze the results in both monolingual and cross-lingual settings.
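To make the idea of a geometric mapping in embedding space concrete, the following is a minimal sketch, not the method of this work. The abstract does not specify the form of the mapping, so the sketch assumes the simplest candidate: a translation by the difference between the mean embedding of sentences with one property value and the mean embedding of sentences with another. The function names (`mean_vector`, `fit_offset`, `apply_offset`) and the toy three-dimensional vectors are hypothetical; in practice the vectors would come from the frozen pre-trained encoder, and the shifted vector would be passed to the frozen decoder.

```python
from statistics import fmean


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def fit_offset(source_embeddings, target_embeddings):
    """Offset that moves the source centroid onto the target centroid.

    Assumption: the property change is modelled as a pure translation.
    """
    source_mean = mean_vector(source_embeddings)
    target_mean = mean_vector(target_embeddings)
    return [t - s for s, t in zip(source_mean, target_mean)]


def apply_offset(embedding, offset):
    """Shift one embedding by the learned offset; the encoder and decoder stay untouched."""
    return [value + delta for value, delta in zip(embedding, offset)]


# Toy stand-ins for encoder outputs (hypothetical data, not real embeddings).
property_a = [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0], [2.0, 4.0, 1.0]]
property_b = [[1.0, 0.5, 2.0], [2.0, 0.5, 0.0], [3.0, 3.5, 1.0]]

offset = fit_offset(property_a, property_b)
shifted = apply_offset([0.5, 2.0, 1.0], offset)
```

Because only the vector is modified, the same offset could in principle be applied to embeddings of sentences in a different language than the one used to fit it, which is the kind of cross-lingual setting the abstract mentions. Whether such a transfer succeeds is an empirical question that this sketch does not address.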