We provide a window into the process of constructing a dataset for machine learning (ML) applications by reflecting on the building of World Wide Dishes (WWD), an image and text dataset of culinary dishes and their associated customs from around the world. WWD takes a participatory approach to dataset creation: community members guide the design of the research process and engage in crowdsourcing efforts to build the dataset. WWD responds to calls in ML to address the limitations of web-scraped Internet datasets with curated, high-quality data incorporating localised expertise and knowledge. Our approach supports decentralised contributions from communities that, owing to a variety of systemic factors, have not historically contributed to datasets. We contribute empirical evidence of the invisible labour of participatory design work by analysing reflections from the research team behind WWD. In doing so, we extend computer-supported cooperative work (CSCW) literature that examines the post-hoc impacts of datasets deployed in ML applications by turning attention to the dataset construction process itself. We surface four dimensions of invisible labour in participatory dataset construction: building trust with community members, making participation accessible, supporting data production, and understanding the relationship between data and culture. This paper builds upon the rich participatory design literature within CSCW to guide future efforts to apply participatory design to dataset construction, so that they attend to the dynamic, collaborative, and fundamentally human processes of dataset creation.