In this paper, we study the impact of combining profile and network data in a de-duplication setting. We also assess the influence of a range of prior distributions on the linkage structure. Furthermore, we explore stochastic gradient Hamiltonian Monte Carlo methods as a faster alternative for obtaining samples from the posterior distribution of the network parameters. Our methodology is evaluated on RLdata500, a popular dataset in the record linkage literature.