User reviews of mobile apps provide a communication channel through which developers can gauge user satisfaction. The app features that users have problems with are usually expressed by key phrases such as "upload pictures", which can be buried in the review text. The lack of a fine-grained view of problematic features can obscure developers' understanding of where an app frustrates its users and delay its improvement. Existing pattern-based approaches to extracting target phrases suffer from low accuracy due to insufficient semantic understanding of the reviews, and thus can only summarize the high-level topics or aspects of the reviews. This paper proposes SIRA, a semantic-aware, fine-grained app review analysis approach that extracts, clusters, and visualizes the problematic features of apps. The main component of SIRA is a novel BERT+Attr-CRF model for fine-grained problematic feature extraction, which combines textual descriptions with review attributes to better model the semantics of reviews and improve on the performance of the traditional BERT-CRF model. SIRA also clusters the extracted phrases based on their semantic relations and presents a visualization of the summaries. Our evaluation on 3,426 reviews from six apps confirms the effectiveness of SIRA in problematic feature extraction and clustering. We further conduct an empirical study with SIRA on 318,534 reviews of 18 popular apps to explore its potential applications and examine its usefulness in real-world practice.
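To make the extraction step concrete, the following is a minimal sketch of how phrase spans could be recovered from per-token labels once a sequence labeler such as a BERT-CRF model has tagged a review. The abstract does not state the tagging scheme, so the sketch assumes a simple B/I/O scheme, which is common for CRF-based sequence labeling. The function name `decode_bio` and the example tags are hypothetical and not taken from SIRA.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect phrases from per-token B/I/O tags (assumed scheme, not SIRA's)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    phrases: List[str] = []
    current: List[str] = []  # tokens of the phrase currently being built

    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new phrase starts; close any phrase that was still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open phrase.
            current.append(token)
        else:
            # "O", or a stray "I" with no open phrase: close and skip the token.
            if current:
                phrases.append(" ".join(current))
            current = []

    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical tagger output for one review sentence.
tokens = ["cannot", "upload", "pictures", "since", "the", "update"]
tags = ["O", "B", "I", "O", "O", "O"]
print(decode_bio(tokens, tags))
```

In this sketch the tagger itself is out of scope; only the decoding of its output into candidate phrases such as "upload pictures" is shown, which is the form that a later clustering step would consume.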