Merging datafiles containing information on overlapping sets of entities is a challenging task in the absence of unique identifiers, and is further complicated when some entities are duplicated in the datafiles. Most approaches to this problem have focused on linking two files assumed to be free of duplicates, or on detecting which records in a single file are duplicates. However, it is common in practice to encounter scenarios that fit somewhere in between or beyond these two settings. We propose a Bayesian approach for the general setting of multifile record linkage and duplicate detection. We use a novel partition representation to propose a structured prior for partitions that can incorporate prior information about the data collection processes of the datafiles in a flexible manner, and extend previous models for comparison data to accommodate the multifile setting. We also introduce a family of loss functions to derive Bayes estimates of partitions that allow uncertain portions of the partitions to be left unresolved. The performance of our proposed methodology is explored through extensive simulations. Code implementing the methodology is available at https://github.com/aleshing/multilink.
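To make the central object concrete: the approach works with a single partition of all records, across all files, into clusters, one cluster per latent entity. A cluster containing records from different files encodes a cross-file link, while a cluster containing two or more records from the same file encodes within-file duplication; the two-file deduplicated setting and single-file duplicate detection are special cases. The following minimal Python sketch (an illustration of this representation only, not the multilink package's API; all file labels, records, and the example partition are invented) shows how one partition captures both kinds of structure at once.

```python
# Hypothetical illustration of a partition over records from multiple datafiles.
# Records are tagged (file_id, record_id); clusters correspond to latent entities.

from collections import Counter

records = [
    ("A", 1), ("A", 2), ("A", 3),   # datafile A
    ("B", 1), ("B", 2),             # datafile B
    ("C", 1), ("C", 2),             # datafile C
]

# One possible partition of the records into entities (a set partition).
partition = [
    {("A", 1), ("B", 1)},            # entity observed once in A and once in B
    {("A", 2), ("A", 3), ("C", 1)},  # entity duplicated within A, also in C
    {("B", 2)},                      # entity observed only in B
    {("C", 2)},                      # entity observed only in C
]

def summarize(partition):
    """For each cluster, report the files it draws from and any within-file duplicates."""
    for i, cluster in enumerate(partition):
        counts = Counter(file_id for file_id, _ in cluster)
        dups = {f: c for f, c in counts.items() if c > 1}
        print(f"entity {i}: files {sorted(counts)}, duplicates: {dups or 'none'}")

summarize(partition)
```

In the Bayesian approach described above, a prior is placed over such partitions and a posterior is computed from comparison data; the proposed loss functions then yield point estimates that may leave uncertain portions of the partition unresolved rather than forcing a link or non-link decision.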