Machine Learning seeks to identify and encode bodies of knowledge within provided datasets. However, data encodes subjective content, which determines the possible outcomes of the models trained on it. Because such subjectivity can enable the marginalisation of parts of society, it is termed (social) `bias' and efforts are made to remove it. In this paper, we contextualise this discourse of bias in the ML community against the subjective choices made throughout the development process. By considering how choices in data and model development construct the subjectivity, or biases, represented in a model, we argue that fully addressing and mitigating biases is near-impossible. This is because both data and ML models are objects for which meaning is made at each step of the development pipeline, from data selection through annotation to model training and analysis. Accordingly, we find the prevalent discourse of bias limited in its ability to address social marginalisation. We recommend remaining conscientious of this, and accepting that de-biasing methods correct for only a fraction of biases.