Voice-enabled technology is quickly becoming ubiquitous, and is built from machine learning (ML)-enabled components such as speech recognition and voice activity detection. However, these systems don't yet work well for everyone. They exhibit bias, the systematic and unfair discrimination against individuals or cohorts of individuals in favour of others (Friedman & Nissenbaum, 1996), across axes such as age, gender and accent. ML is reliant on large datasets for training. Dataset documentation is designed to give ML Practitioners (MLPs) a better understanding of a dataset's characteristics. However, there is a lack of empirical research on voice dataset documentation specifically. Additionally, while MLPs are frequent participants in fairness research, little work focuses on those who work with voice data. Our work makes an empirical contribution to this gap. Here, we combine two methods to form an exploratory study. First, we undertake 13 semi-structured interviews, exploring multiple perspectives of voice dataset documentation practice. Using open and axial coding methods, we explore MLPs' practices through the lenses of roles and trade-offs. Drawing from this work, we then purposively sample voice dataset documents (VDDs) for 9 voice datasets. Our findings then triangulate these two methods, using the lenses of MLP roles and trade-offs. We find that current VDD practices are inchoate, inadequate and incommensurate. The characteristics of voice datasets are codified in fragmented, disjoint ways that often do not meet the needs of MLPs. Moreover, they cannot be readily compared, presenting a barrier to practitioners' bias reduction efforts. We then discuss the implications of these findings for bias practices in voice data and speech technologies. We conclude by setting out a program of future work to address these findings: that is, how we may "right the docs".