Mobile software apps ("apps") are among the prevailing digital technologies on which modern life heavily depends. A key issue in app development is how to design gender-inclusive apps. Apps whose design does not consider gender inclusion, diversity, and equality can create barriers for their diverse users (e.g., excluding some users because of their gender). While there have been some efforts to develop gender-inclusive apps, a lack of deep understanding of user perspectives on gender may prevent app developers and owners from identifying gender-related issues and proposing solutions for improvement. Users express many different opinions about apps in their reviews, from sharing their experiences and reporting bugs to requesting new features. In this study, we aim to unpack gender discussions about apps from the user perspective by analysing app reviews. First, we develop and evaluate several Machine Learning (ML) and Deep Learning (DL) classifiers that automatically detect gender reviews (i.e., reviews that contain discussions about gender). We apply our ML and DL classifiers to a manually constructed dataset of 1,440 app reviews from the Google App Store, comprising 620 gender reviews and 820 non-gender reviews. Our best classifier achieves an F1-score of 90.77%. Second, our qualitative analysis of 388 randomly selected gender reviews (out of the 620) shows that gender discussions in app reviews revolve around six topics: App Features, Appearance, Content, Company Policy and Censorship, Advertisement, and Community. Finally, we provide practical implications and recommendations for developing gender-inclusive apps.