Analysing the generalisation capabilities of relation extraction (RE) models is crucial for assessing whether they learn robust relational patterns or rely on spurious correlations. Our cross-dataset experiments find that RE models struggle with unseen data, even within similar domains. Notably, higher intra-dataset performance does not indicate better transferability; instead, it often signals overfitting to dataset-specific artefacts. Our results also show that data quality, rather than lexical similarity, is key to robust transfer, and that the optimal adaptation strategy depends on the quality of the available data: while fine-tuning yields the best cross-dataset performance with high-quality data, few-shot in-context learning (ICL) is more effective with noisier data. However, even in these cases, zero-shot baselines occasionally outperform all cross-dataset results. Structural issues in RE benchmarks, such as single-relation-per-sample constraints and non-standardised negative-class definitions, further hinder model transferability.