Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource processes related to data work (i.e., generating and annotating data and evaluating outputs) through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America by studying three platforms in Venezuela and a BPO in Argentina. We lean on the Foucauldian notion of dispositif to define the data-production dispositif as an ensemble of discourses, actions, and objects strategically disposed to (re)produce power/knowledge relations in data and labor. Our dispositif analysis comprises the examination of 210 data work instruction documents, 55 interviews with data workers, managers, and requesters, and participant observation. Our findings show that discourses encoded in instructions reproduce and normalize the worldviews of requesters. Precarious working conditions and economic dependency alienate workers, making them obedient to instructions. Furthermore, discourses and social contexts materialize in artifacts, such as interfaces and performance metrics, limiting workers' agency and normalizing specific ways of interpreting data. We conclude by stressing the importance of counteracting the data-production dispositif by fighting alienation and precarization, and empowering data workers to become assets in the quest for high-quality data.