Although virtual agents are increasingly situated in environments where natural language is the most effective mode of interaction with humans, these exchanges are rarely used as an opportunity for learning. Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding: semantic parsers built on top of fixed object categories are precise but inflexible, while end-to-end models are maximally expressive but fickle and opaque. Our goal is to develop a system that balances the strengths of each approach so that users can teach agents new instructions that generalize broadly from a single example. We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model that can affect the meaning of the label in context. Starting from a core programming language that operates over abstructions, users can define increasingly complex mappings from natural language to actions. We show that with this method a user population is able to build a semantic parser for an open-ended house modification task in Minecraft. The resulting semantic parser is both flexible and expressive: the percentage of utterances sourced from redefinitions increases steadily over the course of 191 total exchanges, reaching a final value of 28%.