We propose a neural network model that acquires a grounded representation of robot actions and their linguistic descriptions. Responding properly to varied linguistic expressions, including polysemous words, is an important ability for robots that interact with people through dialogue. Previous studies have shown that, by using pre-trained word embeddings, robots can handle words that do not appear in the paired action-description dataset. However, word embeddings trained under the distributional hypothesis are not grounded, because they are derived purely from a text corpus. In this letter, we transform pre-trained word embeddings into embodied ones by using the robot's sensory-motor experiences. We extend a bidirectional translation model for actions and descriptions by incorporating non-linear layers that retrofit the word embeddings. By training the retrofit layer and the bidirectional translation model alternately, the proposed model adapts the pre-trained word embeddings to a paired action-description dataset. Our results demonstrate that the embeddings of synonyms form a semantic cluster that reflects the robot's experiences (actions and environments). These embeddings allow the robot to properly generate actions from unseen words, that is, words not paired with actions in the dataset.
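The alternating training scheme described above can be illustrated with a minimal sketch. It is not the model from this letter. It assumes a single `tanh` retrofit layer, replaces the bidirectional translation model with a linear map `V`, and uses random toy data. All names (`retrofit`, `update_translator`, `update_retrofit`), dimensions, and learning rates are hypothetical. It shows only one plausible reading of "alternately": each phase updates one component while the other is frozen, and the pre-trained embeddings `E` are never modified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 words with 4-D "pre-trained" embeddings,
# each paired with a 3-D action feature vector.
E = rng.normal(size=(6, 4))   # pre-trained embeddings, never updated
A = rng.normal(size=(6, 3))   # paired action features

# Retrofit layer r(e) = tanh(e W + b): one non-linear layer (assumed form).
W = rng.normal(scale=0.1, size=(4, 4))
b = np.zeros(4)
# Linear stand-in for the translation model (an assumption, not the paper's model).
V = rng.normal(scale=0.1, size=(4, 3))


def retrofit(E, W, b):
    """Map pre-trained embeddings to retrofitted ones of the same size."""
    return np.tanh(E @ W + b)


def loss(W, b, V):
    """Mean squared error between predicted and paired action features."""
    return float(np.mean((retrofit(E, W, b) @ V - A) ** 2))


def update_translator(W, b, V, lr):
    """One gradient step on V with the retrofit layer frozen."""
    H = retrofit(E, W, b)
    d_pred = 2.0 * (H @ V - A) / A.size
    return V - lr * (H.T @ d_pred)


def update_retrofit(W, b, V, lr):
    """One gradient step on (W, b) with the translator frozen."""
    H = retrofit(E, W, b)
    d_pred = 2.0 * (H @ V - A) / A.size
    d_z = (d_pred @ V.T) * (1.0 - H ** 2)   # back-propagate through tanh
    return W - lr * (E.T @ d_z), b - lr * d_z.sum(axis=0)


initial_loss = loss(W, b, V)
for _ in range(500):
    V = update_translator(W, b, V, lr=0.05)    # phase 1: retrofit frozen
    W, b = update_retrofit(W, b, V, lr=0.05)   # phase 2: translator frozen
final_loss = loss(W, b, V)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because `E` stays fixed and only the retrofit parameters change, a word that has no paired action can still be passed through `retrofit` at test time. This is the property the abstract relies on for unseen words.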