We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black-box question-answering system. The agent learns to reformulate the user's questions to elicit the best answers. It probes the system with many versions of a question, generated by a sequence-to-sequence reformulation model, and then aggregates the returned evidence to select the best answer. This process is an instance of \emph{machine-machine} communication. To increase the quality of the returned answers, the reformulation model must adapt its language to match that of the question-answering system. We find that the agent does not learn reformulations that align with semantic intuitions; instead, it rediscovers classical information retrieval techniques such as tf-idf term re-weighting and stemming.
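To make the reported behavior concrete, the following is a minimal sketch, not the authors' code, of the two IR-style transformations named above: ranking question terms by inverse document frequency so that low-weight function words are dropped (a crude form of tf-idf re-weighting), with toy suffix stripping standing in for a proper stemmer. The corpus, the \texttt{keep} cutoff, and all function names here are illustrative assumptions.

\begin{verbatim}
import math
import re

# Illustrative corpus; stands in for the QA system's document collection.
CORPUS = [
    "who wrote the origin of species",
    "what year did darwin publish the origin of species",
    "the theory of evolution was proposed by charles darwin",
    "what is the capital of france",
]

def idf(term, docs):
    # Smoothed inverse document frequency.
    df = sum(term in doc.split() for doc in docs)
    return math.log((1 + len(docs)) / (1 + df)) + 1

def crude_stem(token):
    # Toy suffix stripping; a real system would use e.g. Porter stemming.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def ir_style_rewrite(question, docs, keep=4):
    # Keep the `keep` highest-idf terms (dropping function words such as
    # "the" and "of") and stem what remains.
    tokens = re.findall(r"[a-z]+", question.lower())
    ranked = sorted(tokens, key=lambda t: idf(t, docs), reverse=True)
    return " ".join(crude_stem(t) for t in ranked[:keep])

print(ir_style_rewrite("Who wrote the origin of species?", CORPUS))
# -> "who wrote origin speci"
\end{verbatim}

The point of the sketch is only that the agent's learned rewrites resemble the output of such a pipeline; the agent arrives at them end-to-end through reinforcement learning, without access to explicit idf statistics or a stemmer.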