Profiling Software Developers with Process Mining and N-Gram Language Models

Context: Profiling developers is challenging because many factors, such as their skills, experience, development environment, and behaviors, can affect a detailed analysis and the delivery of coherent interpretations.

Objective: We aim to profile software developers by mining their software development process. To do so, we performed a controlled experiment in which, in the setting of a Python programming contest, a group of developers worked from the same well-defined set of requirements specifications and followed the same well-defined sprint schedule. Events were collected from the PyCharm IDE and from the Mooshak automatic jury, to which subjects checked in their code.

Method: We used n-gram language models and text mining to characterize developers' profiles, and process mining algorithms to discover their overall workflows and extract the corresponding metrics for further evaluation.

Results: Findings show that we can clearly characterize most developers with a coherent rationale and distinguish the top performers from those with more challenging behaviors. This approach may ultimately lead to the creation of a catalog of software development process smells.

Conclusions: A developer's profile gives a software project manager a clue for selecting the appropriate tasks to assign to that developer. With the increasing use of low-code and no-code platforms, where code is generated automatically from a higher abstraction layer, mining developers' actions in development platforms is a promising approach not only to detect behaviors early but also to assess project complexity and model effort.
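
To make the n-gram step more concrete, here is a minimal sketch, not the authors' pipeline, of an add-k smoothed n-gram language model over IDE event sequences. It assumes each development session has already been reduced to a list of event tokens; the token names (`Edit`, `Run`, `Submit`, `Debug`), the class `NGramModel`, the helper `ngrams`, and the use of perplexity as a profiling score are all hypothetical illustrations. The idea is that a session whose event sequence is poorly predicted by a model trained on other sessions (high perplexity) deviates from the typical workflow.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

BOS, EOS = "<s>", "</s>"  # padding markers for session start and end


def ngrams(seq: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return the n-grams of an event sequence, padded with start/end markers."""
    padded = [BOS] * (n - 1) + list(seq) + [EOS]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class NGramModel:
    """Add-k smoothed n-gram model over IDE event tokens (hypothetical helper)."""

    def __init__(self, n: int = 2, k: float = 1.0) -> None:
        self.n, self.k = n, k
        self.counts: Counter = Counter()          # n-gram counts
        self.context_counts: Counter = Counter()  # (n-1)-gram context counts
        self.vocab: Set[str] = {EOS}              # tokens that can be predicted

    def fit(self, sessions: List[List[str]]) -> "NGramModel":
        for session in sessions:
            self.vocab.update(session)
            for gram in ngrams(session, self.n):
                self.counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        return self

    def prob(self, gram: Tuple[str, ...]) -> float:
        """P(last token | preceding n-1 tokens) with add-k smoothing."""
        v = len(self.vocab)
        return (self.counts[gram] + self.k) / (self.context_counts[gram[:-1]] + self.k * v)

    def perplexity(self, session: List[str]) -> float:
        """Per-token perplexity; higher means the session is less typical."""
        grams = ngrams(session, self.n)
        log_prob = sum(math.log(self.prob(g)) for g in grams)
        return math.exp(-log_prob / len(grams))


# Hypothetical event sequences from a reference group of sessions.
reference = [
    ["Edit", "Run", "Edit", "Run", "Submit"],
    ["Edit", "Run", "Submit"],
    ["Edit", "Edit", "Run", "Submit"],
]
model = NGramModel(n=2).fit(reference)
typical = model.perplexity(["Edit", "Run", "Submit"])
unusual = model.perplexity(["Submit", "Submit", "Debug", "Submit"])
print(f"typical={typical:.2f} unusual={unusual:.2f}")
```

Training on a reference group and scoring each remaining developer is one way such a score could separate typical from atypical workflows; passing `n=3` makes the same class condition on longer event contexts.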

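Process discovery typically starts from an event log of (case, activity, timestamp) records. The following sketch does not reproduce the specific process mining algorithms used in the study; it only builds a directly-follows graph, the relation that many discovery algorithms start from, together with a few per-case metrics. Treating each developer as a case, the activity names, the helper names (`traces`, `directly_follows`, `case_metrics`), and the chosen metrics (event count, immediate repetitions, elapsed minutes) are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[str, str, datetime]  # (case_id, activity, timestamp)


def traces(log: List[Event]) -> Dict[str, List[Event]]:
    """Group events by case and sort each case chronologically."""
    by_case: Dict[str, List[Event]] = defaultdict(list)
    for event in log:
        by_case[event[0]].append(event)
    return {case: sorted(evts, key=lambda e: e[2]) for case, evts in by_case.items()}


def directly_follows(log: List[Event]) -> Counter:
    """Count how often activity a is immediately followed by activity b in a case."""
    dfg: Counter = Counter()
    for evts in traces(log).values():
        acts = [e[1] for e in evts]
        dfg.update(zip(acts, acts[1:]))
    return dfg


def case_metrics(log: List[Event]) -> Dict[str, Dict[str, float]]:
    """Per-case event count, immediate repetitions, and elapsed minutes."""
    metrics: Dict[str, Dict[str, float]] = {}
    for case, evts in traces(log).items():
        acts = [e[1] for e in evts]
        metrics[case] = {
            "events": len(evts),
            "self_loops": sum(a == b for a, b in zip(acts, acts[1:])),
            "minutes": (evts[-1][2] - evts[0][2]).total_seconds() / 60,
        }
    return metrics


# Hypothetical log: one case per developer.
t0 = datetime(2024, 1, 1, 10, 0)
log = [
    ("dev1", "Edit", t0),
    ("dev1", "Run", t0 + timedelta(minutes=5)),
    ("dev1", "Submit", t0 + timedelta(minutes=9)),
    ("dev2", "Edit", t0),
    ("dev2", "Edit", t0 + timedelta(minutes=3)),
    ("dev2", "Run", t0 + timedelta(minutes=20)),
    ("dev2", "Run", t0 + timedelta(minutes=25)),
    ("dev2", "Submit", t0 + timedelta(minutes=40)),
]
dfg = directly_follows(log)
metrics = case_metrics(log)
for (a, b), count in sorted(dfg.items()):
    print(f"{a} -> {b}: {count}")
print(metrics)
```

In this toy log, `dev2` shows two immediate repetitions (`Edit` then `Edit`, `Run` then `Run`) and a longer elapsed time than `dev1`; per-case figures of this kind are one example of workflow metrics that could feed further evaluation.
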