A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders

Powerful sentence encoders trained for multiple languages are on the rise. These systems are capable of embedding a wide range of linguistic properties into vector representations. While explicit probing tasks can be used to verify the presence of specific linguistic properties, it is unclear whether the vector representations can be manipulated to indirectly steer such properties. We investigate the use of a geometric mapping in embedding space to transform linguistic properties, without any tuning of the pre-trained sentence encoder or decoder. We validate our approach on three linguistic properties using a pre-trained multilingual autoencoder and analyze the results in both monolingual and cross-lingual settings.
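To make the idea of a geometric mapping in embedding space concrete, here is a minimal sketch. The abstract does not say which mapping is used, so the mean-difference offset below is an assumption chosen for illustration, not the authors' method. The function names `fit_offset` and `apply_offset` are hypothetical, and random vectors stand in for the output of a frozen sentence encoder.

```python
import numpy as np

def fit_offset(source_vecs, target_vecs):
    """Estimate a translation vector between two groups of embeddings.

    Assumption: the property change is modeled as the difference between
    the mean embedding of sentences with and without the property.
    """
    return target_vecs.mean(axis=0) - source_vecs.mean(axis=0)

def apply_offset(vecs, offset, scale=1.0):
    """Shift embeddings along the offset; encoder and decoder stay untouched."""
    return vecs + scale * offset

# Toy data standing in for frozen encoder outputs (hypothetical, 8 dimensions).
rng = np.random.default_rng(0)
source_vecs = rng.normal(size=(100, 8))   # sentences without the property
true_shift = np.full(8, 0.5)              # synthetic "property direction"
target_vecs = source_vecs + true_shift    # same sentences with the property

offset = fit_offset(source_vecs, target_vecs)
shifted = apply_offset(source_vecs[:1], offset)
```

In a real pipeline, the shifted vector would be passed to the pre-trained decoder to produce the transformed sentence. With a multilingual encoder, an offset fitted on one language could be applied to embeddings from another, which is one way to read the cross-lingual setting.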

Autoencoder

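An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. Its goal is to learn a representation (encoding) of a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input, hence the name. Several variants of the basic model exist, each aiming to force the learned representation of the input to take on useful properties. Autoencoders are applied effectively to many problems, from facial recognition to capturing the semantic meaning of words.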
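The encode-then-reconstruct loop described above can be sketched without a neural network. The example below is an illustrative stand-in: it uses the closed-form linear case (a PCA-style projection computed with SVD) rather than a trained network, and the function names are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Return the data mean and a (d, k) projection with orthonormal columns.

    Assumption: a linear encoder/decoder pair with tied weights, obtained
    from the top-k right singular vectors of the centered data.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def encode(X, mean, W):
    """Map inputs to the reduced k-dimensional code."""
    return (X - mean) @ W

def decode(Z, mean, W):
    """Reconstruct inputs from the reduced code."""
    return Z @ W.T + mean

# Toy data: 5-dimensional points that lie on a 2-dimensional subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5))

mean, W = fit_linear_autoencoder(X, k=2)
codes = encode(X, mean, W)
X_rec = decode(codes, mean, W)
```

Because the toy data has only two underlying degrees of freedom, a two-dimensional code is enough to reconstruct it. Real autoencoders replace the linear maps with trained nonlinear networks.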
