Code style is an aesthetic choice exhibited in source code that reflects programmers' individual coding habits. This study is the first to investigate whether code style can be used as an indicator to identify good programmers. Data from Google Code Jam were used for the study. A cluster analysis was performed to determine whether a particular coding style could be associated with good programmers. Furthermore, supervised machine learning models were trained on stylistic features to predict good programmers and evaluated using recall, macro-F1, AUC-ROC, and balanced accuracy. The results demonstrate that good programmers may be identified using supervised machine learning models, even though no particular style group could be identified as a good style.