Entity resolution aims to identify records that represent the same real-world entity across one or more datasets. A major challenge in learning-based entity resolution is reducing the labeling cost of training. Because the number of record pairs to compare grows quadratically with the number of records, labeling is a costly task that often requires significant effort from human experts. Inspired by recent advances in generative adversarial networks (GANs), we propose a novel deep learning method, called ErGAN, to address this challenge. ErGAN consists of two key components, a label generator and a discriminator, which are optimized alternately through adversarial learning. To alleviate the issues of overfitting and highly imbalanced distribution, we design two novel modules, for diversity and propagation, which can greatly improve the model's generalization power. We have conducted extensive experiments to empirically verify the labeling and learning efficiency of ErGAN. The experimental results show that ErGAN outperforms the state-of-the-art baselines, including unsupervised, semi-supervised, and supervised learning methods.
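To make the quadratic labeling cost concrete, the following minimal sketch counts the candidate record pairs within a single dataset. It is illustrative only and not part of ErGAN: the helper name `candidate_pair_count` and the toy record identifiers are assumptions, and the sketch assumes that every unordered pair is a candidate (no blocking or filtering).

```python
from itertools import combinations


def candidate_pair_count(n: int) -> int:
    """Number of unordered record pairs in one dataset of n records (hypothetical helper)."""
    return n * (n - 1) // 2


# Toy check: enumerate the pairs explicitly for a four-record dataset.
records = ["r1", "r2", "r3", "r4"]
pairs = list(combinations(records, 2))
assert len(pairs) == candidate_pair_count(len(records))

# The pair count grows roughly 100x for every 10x increase in records.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} records -> {candidate_pair_count(n):>13,} candidate pairs")
```

Even at 10,000 records there are close to 50 million candidate pairs, which is why labeling more than a small fraction of them by hand is impractical.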