This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled -- key characteristics that allow us to paint the first representative picture of PatentsView's disambiguation performance. The approach is used to inform PatentsView's users about the reliability of the data and to enable the comparison of competing disambiguation algorithms.