Billions of distributed, heterogeneous and resource-constrained IoT devices deploy on-device machine learning (ML) for private, fast and offline inference on personal data. On-device ML is highly context dependent, and sensitive to user, usage, hardware and environment attributes. This sensitivity, together with the propensity of ML towards bias, makes it important to study bias in on-device settings. Our study is one of the first investigations of bias in this emerging domain, and it lays important foundations for building fairer on-device ML. We apply a software engineering lens, investigating how bias propagates through design choices in on-device ML workflows. We first identify reliability bias as a source of unfairness and propose a measure to quantify it. We then conduct empirical experiments on a keyword spotting task to show how complex and interacting technical design choices amplify and propagate reliability bias. Our results validate that design choices made during model training, such as the sample rate and input feature type, and choices made to optimize models, such as light-weight architectures, the pruning learning rate and pruning sparsity, can result in disparate predictive performance across male and female groups. Based on our findings, we suggest low-effort strategies for engineers to mitigate bias in on-device ML.