Real-world applications frequently need to solve a general form of the Entity Matching (EM) problem to find associated entities. Such scenarios include matching jobs to candidates in job targeting, matching students with courses in online education, and matching products with user reviews on e-commerce websites, among others. These tasks impose new requirements, such as matching data entries with diverse formats or supporting a flexible and semantics-rich matching definition, which go beyond the current EM task formulation and approaches. In this paper, we introduce the problem of Generalized Entity Matching (GEM), which satisfies these practical requirements, and present an end-to-end pipeline, Machop, as the solution. Machop allows end-users to define new matching tasks from scratch and apply them to new domains in a step-by-step manner. Machop casts the GEM problem as sequence pair classification so as to utilize the language understanding capability of Transformer-based language models (LMs) such as BERT. Moreover, it features a novel external knowledge injection approach with structure-aware pooling methods that allow domain experts to guide the LM to focus on the key matching information, further improving overall performance. Our experiments and case studies on real-world datasets from a popular recruiting platform show a significant 17.1% gain in F1 score over state-of-the-art methods, along with meaningful matching results that are human-understandable.
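To make the sequence pair formulation concrete, the following is a minimal sketch of how two entries with different formats might be flattened into a single input for a BERT-style classifier. It is an illustration under stated assumptions, not Machop's actual serialization: the function names `serialize` and `to_sequence_pair` and the `[COL]`/`[VAL]` attribute markers are hypothetical, and the job and resume data are invented examples. Only `[CLS]` and `[SEP]` are standard BERT special tokens, which a real tokenizer would normally insert itself.

```python
from typing import Any


def serialize(entry: Any) -> str:
    """Flatten a structured, nested, or plain-text entry into one string.

    The [COL]/[VAL] markers are hypothetical tags used here only to keep
    attribute names and values distinguishable after flattening.
    """
    if isinstance(entry, dict):
        return " ".join(
            f"[COL] {key} [VAL] {serialize(value)}" for key, value in entry.items()
        )
    if isinstance(entry, (list, tuple)):
        return " ".join(serialize(value) for value in entry)
    return str(entry)


def to_sequence_pair(left: Any, right: Any) -> str:
    """Join two serialized entries in the BERT sentence-pair layout."""
    return f"[CLS] {serialize(left)} [SEP] {serialize(right)} [SEP]"


# Invented example: a structured job posting and a free-text resume snippet.
job = {"title": "Data Engineer", "skills": ["SQL", "Spark"]}
resume = "Five years building Spark pipelines."

pair = to_sequence_pair(job, resume)
print(pair)
```

The resulting string would then be passed to a binary classifier that predicts whether the two entries match; the classifier itself is omitted here.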