Some actions must be executed in different ways depending on the context. For example, wiping away marker requires vigorous force, while wiping away almonds requires gentler force. In this paper we provide a model in which an agent learns which manner of action execution to use in which context, drawing on evidence from trial and error and from verbal corrections when it makes a mistake (e.g., ``no, gently''). The learner starts out with a domain model that lacks the concepts denoted by the words in the teacher's feedback: both the words describing the context (e.g., marker) and adverbs like ``gently''. We show that, through the semantics of coherence, our agent can perform the symbol grounding that is necessary for exploiting the teacher's feedback so as to solve its domain-level planning problem: to perform its actions in the right way in the current context.