Personal informatics (PI) systems, powered by smartphones and wearables, enable people to lead healthier lifestyles by providing meaningful and actionable insights that break down barriers between users and their health information. Today, such systems are used by billions of users to monitor not only physical activity and sleep but also vital signs, women's health, and heart health, among others. Despite their widespread use, the processing of sensitive PI data may suffer from biases, which can entail practical and ethical implications. In this work, we present the first comprehensive empirical and analytical study of bias in PI systems, covering biases both in raw data and across the entire machine learning life cycle. We use the most detailed framework to date for exploring the different sources of bias and find that biases exist in both the data generation and the model learning and implementation streams. According to our results, the most affected minority groups are users with health issues, such as diabetes, joint issues, and hypertension, and female users, whose data biases are propagated or even amplified by learning models, while intersectional biases can also be observed.