Is neural IR mostly hype? In a recent SIGIR Forum article, Lin expressed skepticism that neural ranking models were actually improving ad hoc retrieval effectiveness in limited data scenarios. He provided anecdotal evidence that authors of neural IR papers demonstrate "wins" by comparing against weak baselines. This paper provides a rigorous evaluation of those claims in two ways: First, we conducted a meta-analysis of papers that have reported experimental results on the TREC Robust04 test collection. We do not find evidence of an upward trend in effectiveness over time. In fact, the best reported results are from a decade ago and no recent neural approach comes close. Second, we applied five recent neural models to rerank the strong baselines that Lin used to make his arguments. A significant improvement was observed for one of the models, demonstrating additivity in gains. While there appears to be merit to neural IR approaches, at least some of the gains reported in the literature appear illusory.