This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled; these characteristics allow us to paint the first representative picture of PatentsView's disambiguation performance. The approach is used to inform PatentsView's users about the reliability of the data and to compare competing disambiguation algorithms.