Methods based on machine learning are becoming increasingly popular in many areas because they allow models to be fitted in a highly data-driven fashion and often show comparable or even better performance than classical methods. However, in the educational sciences the application of machine learning is still quite uncommon. This work investigates the benefit of using classification trees for analyzing data from the educational sciences. An application to data on school transition rates in Austria highlights several aspects of interest in this context: (i) the trees select variables for predicting school transition rates in a data-driven fashion, and the selected variables accord well with existing confirmatory theories from the educational sciences; (ii) trees can be employed to perform variable selection for regression models; (iii) the classification performance of trees is comparable to that of binary regression models. These results indicate that trees, and possibly other machine learning methods, may also be helpful for exploring high-dimensional educational data sets, especially where no confirmatory theories have been developed yet.
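To illustrate what "selecting variables in a data-driven fashion" can mean for a classification tree, the following is a minimal sketch of a single tree split. It is not the authors' implementation and does not use the Austrian data: the data are synthetic, the helper names `gini` and `best_split` are hypothetical, and the Gini impurity is assumed as the splitting criterion, which the abstract does not specify.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty or pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Search all variables and thresholds for the split with the largest
    impurity decrease. Returns (variable_index, threshold, decrease)."""
    n = len(rows)
    parent = gini(labels)
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent - child
            if decrease > best[2]:
                best = (j, t, decrease)
    return best


# Synthetic data (assumption): three candidate predictors, of which only
# the one at index 1 determines the binary outcome.
random.seed(0)
rows = [[random.random() for _ in range(3)] for _ in range(200)]
labels = [1 if r[1] > 0.5 else 0 for r in rows]

variable, threshold, decrease = best_split(rows, labels)
print("selected variable:", variable)
print("threshold:", round(threshold, 3))
print("impurity decrease:", round(decrease, 3))
```

In this toy setting the split search picks the informative predictor without being told which one matters. A full tree repeats the search recursively within each child node, and the variables chosen along the way could then serve as candidates for a regression model, in the spirit of point (ii).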