Data integration has been studied extensively for decades and from many angles, yet the field remains largely rule-driven and lacks general-purpose automation. Recent developments in machine learning, and in deep learning in particular, have opened the way to more general and efficient solutions to data-integration tasks. In this paper, we demonstrate an approach that models and integrates entities by leveraging their relations and contextual information. We achieve this by combining Siamese and graph neural networks, which propagates information effectively between connected entities while supporting high scalability. We evaluate our approach on the task of integrating data about business entities and show that it outperforms both traditional rule-based systems and other deep learning approaches.
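The idea of pairing a Siamese setup with graph-based propagation can be illustrated with a small sketch. This is not the paper's architecture: the abstract does not specify layers, features, or training, so everything below is an assumption made for illustration. The toy records (`a1`, `a2`, `b1`, `addr`), the feature vectors, the single round of mean-neighbor aggregation, and the random untrained weights are all hypothetical. The sketch only shows the two ingredients named above: one shared encoder applied to every entity (the Siamese part) and information flowing in from connected entities (the graph part).

```python
import math
import random


def mat_vec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(features, neighbors, W_self, W_nbr):
    """One round of mean-neighbor message passing.

    The same weights (W_self, W_nbr) are used for every node, so any two
    entities being compared pass through an identical encoder.
    """
    embeddings = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Average the raw features of the connected entities.
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(x))]
        else:
            mean = [0.0] * len(x)
        h = [a + b for a, b in zip(mat_vec(W_self, x), mat_vec(W_nbr, mean))]
        embeddings[node] = [math.tanh(v) for v in h]
    return embeddings


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy data: two records for the same business (a1, a2) that
# share an address node, plus an unrelated, unconnected record (b1).
features = {
    "a1": [1.0, 0.0, 1.0],
    "a2": [1.0, 0.0, 1.0],
    "addr": [0.0, 1.0, 0.0],
    "b1": [0.0, 1.0, 1.0],
}
neighbors = {"a1": ["addr"], "a2": ["addr"], "addr": ["a1", "a2"]}

# Untrained random weights (assumption); a real model would learn these.
rng = random.Random(0)
W_self = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)]
W_nbr = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)]

emb = encode(features, neighbors, W_self, W_nbr)
print("a1 vs a2:", cosine(emb["a1"], emb["a2"]))
print("a1 vs b1:", cosine(emb["a1"], emb["b1"]))
```

Because `a1` and `a2` have identical features and the same neighbor, the shared encoder maps them to the same embedding. In a trained system the weights would typically be fitted so that matching entities score high and non-matching ones score low; that training step is omitted here.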