Datasets sourced from people with disabilities and older adults play an important role in innovation, benchmarking, and mitigating bias for both assistive and inclusive AI-infused applications. However, they are scarce. We conduct a systematic review of 137 accessibility datasets manually located across different disciplines over the last 35 years. Our analysis highlights how researchers navigate tensions between benefits and risks in data collection and sharing. We uncover patterns in data collection purpose, terminology, sample size, data types, and data sharing practices across communities of focus. We conclude by critically reflecting on challenges and opportunities related to locating and sharing accessibility datasets, and we call for technical, legal, and institutional privacy frameworks that are more attuned to concerns from these communities.