The Expected Optimal Labeling Order Problem for Crowdsourced Joins and Entity Resolution

At SIGMOD 2013, we published a paper that extended our earlier work on crowdsourced entity resolution, improving crowdsourced join processing by exploiting transitive relationships [Wang et al., 2013]. A follow-up paper at VLDB 2014 [Vesdapunt et al., 2014] points out and corrects a mistake in our SIGMOD paper. Specifically, in Section 4.2 of that paper we defined the "Expected Optimal Labeling Order" (EOLO) problem and proposed an algorithm for solving it, which we incorrectly claimed to be optimal. Vesdapunt et al. show that the problem is in fact NP-hard and, based on that observation, propose a new algorithm for it. In this note, we put the Vesdapunt et al. results in context, something we believe their paper does not adequately do.
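To make concrete why a labeling order matters at all, here is a minimal sketch. It is not the algorithm from either paper. It assumes the usual reading of "transitive relationships" in this setting: if a matches b and b matches c, then a matches c; if a matches b and b does not match c, then a does not match c. It also assumes a perfect crowd whose answers equal the ground truth. The names `Deducer`, `count_crowdsourced`, and `entity_of` are hypothetical and chosen only for illustration.

```python
class Deducer:
    """Stores crowd labels and deduces further labels by transitivity."""

    def __init__(self, items):
        self.parent = {x: x for x in items}   # union-find over known matches
        self.non_match = set()                # frozensets of cluster roots known to differ

    def find(self, x):
        # Follow parent links to the cluster root, compressing the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def deduce(self, a, b):
        """Return True/False if the label follows from known labels, else None."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if frozenset((ra, rb)) in self.non_match:
            return False
        return None

    def record(self, a, b, is_match):
        ra, rb = self.find(a), self.find(b)
        if is_match:
            # Merge rb's cluster into ra's and re-point non-match edges to the new root.
            self.parent[rb] = ra
            self.non_match = {
                frozenset(ra if r == rb else r for r in pair)
                for pair in self.non_match
            }
        else:
            self.non_match.add(frozenset((ra, rb)))


def count_crowdsourced(order, entity_of):
    """Count the pairs in `order` that must be sent to the (assumed perfect) crowd."""
    items = {x for pair in order for x in pair}
    deducer = Deducer(items)
    asked = 0
    for a, b in order:
        if deducer.deduce(a, b) is None:
            asked += 1
            deducer.record(a, b, entity_of[a] == entity_of[b])
    return asked


# Hypothetical ground truth: a and b are the same entity, c is a different one.
entity_of = {"a": 1, "b": 1, "c": 2}
matching_first = [("a", "b"), ("a", "c"), ("b", "c")]
non_matching_first = [("a", "c"), ("b", "c"), ("a", "b")]
asked_matching_first = count_crowdsourced(matching_first, entity_of)
asked_non_matching_first = count_crowdsourced(non_matching_first, entity_of)
```

In this toy instance, labeling the matching pair first lets the third label be deduced, while labeling the two non-matching pairs first forces all three pairs to be crowdsourced. Because the true labels are unknown in advance, an order can only be chosen to do well in expectation, which is the kind of question the EOLO problem addresses.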


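Entity resolution

Different data providers may describe the same real-world thing, or entity, in different ways (differing in data format, representation, and so on); each such description is called a reference to that entity. Entity resolution is the process of resolving a set of references and mapping them to real-world entities. It is also known as record linkage, object identification, individual identification, and duplicate detection.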
