Entity matching is a fundamental task in data cleaning and data integration. With the rapid adoption of large language models (LLMs), recent studies have explored zero-shot and few-shot prompting to improve entity matching accuracy. However, most existing approaches rely on single-step prompting and offer limited investigation into structured reasoning strategies. In this work, we investigate how to enhance LLM-based entity matching by decomposing the matching process into multiple explicit reasoning stages. We propose a three-step framework that first identifies matched and unmatched tokens between two records, then determines the attributes most influential to the matching decision, and finally predicts whether the records refer to the same real-world entity. In addition, we explore a debate-based strategy that contrasts supporting and opposing arguments to improve decision robustness. We evaluate our approaches against multiple existing baselines on several real-world entity matching benchmark datasets. Experimental results demonstrate that structured multi-step reasoning can improve matching performance in several cases, while also highlighting remaining challenges and opportunities for further refinement of reasoning-guided LLM approaches.
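The three-step framework described above can be sketched as a chain of prompts, where each stage's output is fed into the next. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the prompt wording, the `LLM` callable interface, and the helper names `serialize` and `three_step_match` are all hypothetical, and the stub model exists only so that the example runs without any external service.

```python
from typing import Callable, Dict

Record = Dict[str, str]
LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def serialize(record: Record) -> str:
    """Flatten a record into 'attribute: value' pairs for prompting."""
    return "; ".join(f"{k}: {v}" for k, v in record.items())


def three_step_match(a: Record, b: Record, llm: LLM) -> bool:
    """Run the three reasoning stages in sequence and return the match decision."""
    pair = f"Record A: {serialize(a)}\nRecord B: {serialize(b)}"

    # Step 1: identify matched and unmatched tokens between the two records.
    tokens = llm(f"{pair}\nList the tokens that match and the tokens that differ.")

    # Step 2: determine the most influential attributes, given the token analysis.
    attrs = llm(
        f"{pair}\nToken analysis: {tokens}\n"
        "Which attributes are most influential for deciding a match?"
    )

    # Step 3: predict the final decision, conditioned on both earlier outputs.
    verdict = llm(
        f"{pair}\nToken analysis: {tokens}\nKey attributes: {attrs}\n"
        "Do the records refer to the same real-world entity? Answer Yes or No."
    )
    return verdict.strip().lower().startswith("yes")


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption): canned replies per stage."""
    if "Answer Yes or No" in prompt:
        return "Yes"
    if "most influential" in prompt:
        return "title, brand"
    return "matched: apple, iphone; unmatched: 64gb, 128gb"


if __name__ == "__main__":
    left = {"title": "Apple iPhone 13 64GB", "brand": "Apple"}
    right = {"title": "iPhone 13 (Apple) 128GB", "brand": "Apple"}
    print(three_step_match(left, right, stub_llm))
```

In practice `stub_llm` would be replaced by a call to an actual model, and the final-answer parsing would likely need to be more tolerant than a simple "yes" prefix check. The debate-based strategy mentioned in the abstract is not covered by this sketch.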