When constructing supervised learning models, we require labelled examples to build a corpus and train a machine learning model. However, most studies have built their labelled datasets manually, which on many occasions is a daunting task. To mitigate this problem, we have built an online tool called CodeLabeller. CodeLabeller is a web-based tool that aims to provide an efficient approach to labelling source code files for supervised learning methods at scale by improving the data collection process throughout. CodeLabeller is tested by constructing a corpus of over a thousand source files obtained from a large collection of open source Java projects and labelling each Java source file with its respective design patterns and a summary. Twenty-five experts in the field of software engineering participated in a usability evaluation of the tool using the standard User Experience Questionnaire online survey. The survey results demonstrate that the tool achieves the Good standard on both hedonic and pragmatic quality scales, is easy to use, and meets the needs of annotating corpora for supervised classifiers. Apart from assisting researchers in crowdsourcing labelled datasets, the tool has practical applicability in software engineering education and assists in building expert ratings for software artefacts.