In this paper, we investigate the practical challenges of using reinforcement learning agents for question answering over knowledge graphs. We examine the performance metrics used by state-of-the-art systems and find them inadequate: they do not correctly evaluate systems in situations where no answer is available, so agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for it by extending the binary reward structure used in prior work to a ternary reward structure that also rewards an agent for declining to answer a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while abstaining on only a small number of questions that were previously answered correctly.
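As a minimal sketch, the extension from a binary to a ternary reward structure might look like the following. The specific reward values (e.g. a partial reward of 0.5 for abstaining) and all function names are illustrative assumptions, not taken from the paper:

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    WRONG = "wrong"
    NO_ANSWER = "no_answer"  # the agent declines to answer


def binary_reward(outcome: Outcome) -> float:
    # Prior work: only a correct answer is rewarded; abstaining and
    # answering incorrectly are indistinguishable to the agent.
    return 1.0 if outcome is Outcome.CORRECT else 0.0


def ternary_reward(outcome: Outcome, abstain_reward: float = 0.5) -> float:
    # Sketch of the extended scheme: abstaining earns a partial reward,
    # so the agent prefers "no answer" over a likely-wrong guess.
    # The value 0.5 is an assumed placeholder, not the paper's choice.
    if outcome is Outcome.CORRECT:
        return 1.0
    if outcome is Outcome.NO_ANSWER:
        return abstain_reward
    return 0.0
```

Under the binary scheme an agent has nothing to lose by guessing; under the ternary scheme guessing only pays off when the agent's confidence of being correct exceeds the value of abstaining, which is what ties this reward design to confidence modeling.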