Modern software typically provides more than one feature. These features are not always organized so that the modules implementing them can be used individually. Many software engineering approaches, such as programming language constructs and product line visualization techniques, have been proposed to organize projects as modules. Unfortunately, much legacy software suffers from years or decades of improper coding practices that leave the modules in the code almost undetectable. In such scenarios, it is desirable to identify the modules that represent different features so that they can be extracted. In this paper, we propose a novel approach that combines information retrieval and program analysis to allow domain experts to identify slices of the program that represent modules using natural language search terms. We evaluate our approach by building a proof-of-concept tool in C and using it to extract modules from open source projects.