In machine learning tasks, especially prediction tasks, scientists tend to rely solely on available historical data and disregard unproven insights such as expert opinions, polls, and betting odds. In this paper, we propose a general three-step framework for utilizing experts' insights in machine learning tasks and build four concrete models for a sports game prediction case study. For the case study, we chose the task of predicting NCAA Men's Basketball games, which has been the focus of a series of Kaggle competitions in recent years. Our results strongly suggest that the good performance and high scores of past models are the result of chance rather than of a well-performing, stable model. Furthermore, our proposed models achieve more stable results, with a lower average log loss (best at 0.489) than the top solutions of the 2019 competition (>0.503), and reach the top 1%, 10%, and 1% of the 2017, 2018, and 2019 leaderboards, respectively.
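The results above are reported as average log loss, the metric used to score the Kaggle NCAA competitions, where lower is better. The following is a minimal sketch of mean binary log loss. It assumes each prediction is the probability that a given team wins and each label is 1 for a win and 0 for a loss. The function name `log_loss` and the clipping constant `eps` are illustrative choices, not taken from the paper.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss (illustrative sketch).

    y_true: outcomes, 1 if the team won, 0 otherwise.
    p_pred: predicted win probabilities for the same games.
    eps: assumed clipping constant so log(0) is never evaluated.
    """
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Predicting 0.5 for every game scores ln 2 regardless of the outcomes.
print(round(log_loss([1, 0, 1], [0.5, 0.5, 0.5]), 3))  # 0.693
```

A constant coin-flip prediction scores about 0.693, which is a useful reference point for scores such as 0.489 and 0.503. Because the loss grows without bound as a confident prediction turns out wrong, a few upsets can move a single season's score substantially.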