[Context and motivation.] Extracting features from mobile app reviews is increasingly important for multiple requirements engineering (RE) tasks. However, existing methods struggle to turn noisy, ambiguous feedback into interpretable insights. [Question/problem.] Syntactic approaches lack semantic depth, while large language models (LLMs) often miss fine-grained features or fail to structure them coherently. In addition, existing methods output flat lists of features without semantic organization, limiting interpretation and comparability. Consequently, current feature extraction approaches do not provide structured, meaningful representations of app features, and practitioners face fragmented information that hinders requirement analysis, prioritization, and cross-app comparison, among other use cases. [Principal ideas/results.] In this context, we propose FeClustRE, a framework integrating hybrid feature extraction, hierarchical clustering with auto-tuning, and LLM-based semantic labelling. FeClustRE combines syntactic parsing with LLM enrichment, organizes features into clusters, and automatically generates meaningful taxonomy labels. We evaluate FeClustRE on public benchmarks for extraction correctness and on a sample study of generative AI assistant app reviews for clustering quality, semantic coherence, and interpretability. [Contribution.] Overall, FeClustRE delivers (1) a hybrid framework for feature extraction and taxonomy generation, (2) an auto-tuning mechanism with a comprehensive evaluation methodology, and (3) an open-source, replicable implementation. These contributions bridge user feedback and feature understanding, enabling deeper insights into current and emerging requirements.
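To make the "hierarchical clustering with auto-tuning" step more concrete, the following is a minimal, illustrative sketch and not FeClustRE's actual implementation. It rests on several assumptions that the abstract does not state: token-level Jaccard distance stands in for whatever feature representation the framework uses, average linkage is chosen arbitrarily, and the number of clusters is tuned by maximizing the mean silhouette score. All function names (`jaccard_distance`, `agglomerate`, `silhouette`, `auto_tune`) are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Token-set Jaccard distance (assumed stand-in for a semantic distance)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return 1.0 - len(ta & tb) / len(union) if union else 0.0


def mean_distance(items, c1, c2):
    """Average pairwise distance between two groups of item indices."""
    total = sum(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(items, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_distance(items, clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


def silhouette(items, clusters):
    """Mean silhouette score; singleton clusters contribute 0."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            rest = [j for j in cluster if j != i]
            a = mean_distance(items, [i], rest)  # cohesion
            b = min(  # separation: nearest other cluster
                mean_distance(items, [i], other)
                for oi, other in enumerate(clusters)
                if oi != ci
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def auto_tune(items):
    """Pick the cluster count with the best silhouette (needs >= 3 items)."""
    best_k = max(
        range(2, len(items)),
        key=lambda k: silhouette(items, agglomerate(items, k)),
    )
    return [[items[i] for i in c] for c in agglomerate(items, best_k)]


if __name__ == "__main__":
    features = [
        "share photo", "share video", "share link",
        "voice chat", "voice message", "voice search",
    ]
    for cluster in auto_tune(features):
        print(cluster)
```

In this toy input, the features sharing the token "share" and those sharing "voice" end up in two separate clusters. A labelling step (LLM-based in FeClustRE) would then name each cluster; it is omitted here because the abstract gives no details about the prompts or model used.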