In this paper, we investigate the challenges of using reinforcement learning agents for question answering over knowledge graphs in real-world applications. We examine the performance metrics used by state-of-the-art systems and find them inadequate for such settings. In particular, these metrics do not correctly evaluate systems on questions for which no answer exists, so agents optimized for them are poor at modeling their own confidence. We introduce a simple new performance metric for evaluating question-answering agents that better reflects practical usage conditions, and we optimize for it by extending the binary reward structure used in prior work to a ternary reward structure, which additionally rewards an agent for declining to answer a question rather than giving an incorrect answer. We show that this can drastically improve the precision of the answers given, while withholding answers to only a small number of questions that were previously answered correctly. Bootstrapping the reinforcement learning algorithm with a supervised learning strategy based on depth-first-search paths further improves performance.
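To make the ternary reward structure concrete, the following is a minimal sketch in Python; the function name, signature, and the specific abstention reward value are illustrative assumptions, not the paper's exact formulation.

```python
def ternary_reward(answer, gold_answers, abstained):
    """Illustrative ternary reward (values are assumptions).

    A binary scheme gives +1 for a correct answer and 0 otherwise.
    The ternary extension additionally rewards abstaining, so that
    declining to answer is preferable to answering incorrectly.
    """
    if abstained:
        return 0.5   # hypothetical abstention reward, between 0 and 1
    if answer in gold_answers:
        return 1.0   # correct answer
    return 0.0       # incorrect answer
```

The key design choice is that the abstention reward sits strictly between the rewards for incorrect and correct answers, so an agent maximizing expected reward will abstain whenever its confidence in any candidate answer falls below the corresponding threshold.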