Benchmark datasets play a central role in the organization of machine learning research. They coordinate researchers around shared research problems and serve as a measure of progress towards shared goals. Despite the foundational role of benchmarking practices in this field, relatively little attention has been paid to the dynamics of benchmark dataset use and reuse, within or across machine learning subcommunities. In this paper, we examine these dynamics. We study how dataset usage patterns differ across machine learning subcommunities and over time, from 2015 to 2020. We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions. Our results have implications for scientific evaluation, AI ethics, and equity/access within the field.