The Values Encoded in Machine Learning Research

From arXiv. Data and code available at https://github.com/wagnew3/The-Values-Encoded-in-Machine-Learning-Research. arXiv admin note: text overlap with arXiv:2206.04179.

Machine learning currently exerts an outsized influence on the world, increasingly affecting institutional practices and impacted communities. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we first introduce a method and annotation scheme for studying the values encoded in documents such as research papers. Applying the scheme, we analyze 100 highly cited machine learning papers published at premier machine learning conferences, ICML and NeurIPS. We annotate key features of papers which reveal their values: their justification for their choice of project, which attributes of their project they uplift, their consideration of potential negative consequences, and their institutional affiliations and funding sources. We find that few of the papers justify how their project connects to a societal need (15%) and far fewer discuss negative potential (1%). Through line-by-line content analysis, we identify 59 values that are uplifted in ML research, and, of these, we find that the papers most frequently justify and assess themselves based on Performance, Generalization, Quantitative evidence, Efficiency, Building on past work, and Novelty. We present extensive textual evidence and identify key themes in the definitions and operationalization of these values. Notably, we find systematic textual evidence that these top values are being defined and applied with assumptions and implications generally supporting the centralization of power. Finally, we find increasingly close ties between these highly cited papers and tech companies and elite universities.
