Diversity in Sociotechnical Machine Learning Systems

There has been a surge of recent interest in sociocultural diversity in machine learning (ML) research, with researchers (i) examining the benefits of diversity as an organizational solution for alleviating problems with algorithmic bias, and (ii) proposing measures and methods for implementing diversity as a design desideratum in the construction of predictive algorithms. Currently, however, there is a gap between discussions of measures and benefits of diversity in ML, on the one hand, and the broader research on the underlying concepts of diversity and the precise mechanisms of its functional benefits, on the other. This gap is problematic because diversity is not a monolithic concept. Rather, different concepts of diversity are based on distinct rationales that should inform how we measure diversity in a given context. Similarly, the lack of specificity about the precise mechanisms underpinning diversity's potential benefits can result in uninformative generalities, invalid experimental designs, and illicit interpretations of findings. In this work, we draw on research in philosophy, psychology, and social and organizational sciences to make three contributions: First, we introduce a taxonomy of different diversity concepts from philosophy of science, and explicate the distinct epistemic and political rationales underlying these concepts. Second, we provide an overview of mechanisms by which diversity can benefit group performance. Third, we situate these taxonomies (of concepts and mechanisms) in the lifecycle of sociotechnical ML systems and make a case for their usefulness in fair and accountable ML. We do so by illustrating how they clarify the discourse around diversity in the context of ML systems, promote the formulation of more precise research questions about diversity's impact, and provide conceptual tools to further advance research and practice.
