This paper aims to reduce the randomness in football by using machine learning prediction models we developed to analyse the role of lineups in final scores. Football clubs invest millions of dollars in lineups, and knowing how individual statistics translate into better outcomes can help optimise these investments. Moreover, sports betting is growing exponentially, and being able to predict future results is both profitable and desirable. We use machine learning models and historical player data from the English Premier League (2020-2022) to predict scores and to understand how individual performance can improve the outcome of a match. We created heuristic and machine learning models that predict football scores and compared these techniques to maximise the chance of finding useful models. Using different feature sets, we showed that goalkeepers' statistics are more important than attackers' statistics for predicting goals scored. We applied a broad evaluation process to assess the efficacy of the models in real-world applications, and correctly predicted all relegated teams after forecasting 100 consecutive matches. We show that Support Vector Regression outperformed the other techniques in predicting final scores, and that lineups do not improve predictions. Finally, our model was profitable (42% return) when emulating a betting system using real-world odds data.
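The abstract does not say how predicted scores become bets or how the return is measured. The snippet below is a minimal sketch under assumptions of our own, not the paper's method. Each predicted score, which may be non-integer because it comes from a regression model, is rounded and mapped to a home win, draw, or away win. A flat one-unit stake is placed on that outcome at the given decimal odds, and the return is total profit divided by total amount staked. The names `Match`, `outcome` and `flat_stake_return`, the rounding rule, and the sample odds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    pred_home: float  # predicted home goals (hypothetical regression output)
    pred_away: float  # predicted away goals
    result: str       # actual result: "H" (home win), "D" (draw) or "A" (away win)
    odds: dict        # decimal odds keyed by "H", "D", "A" (illustrative values)


def outcome(home_goals: float, away_goals: float) -> str:
    """Map a possibly non-integer predicted score to H/D/A by rounding goals (assumed rule)."""
    h, a = round(home_goals), round(away_goals)
    if h > a:
        return "H"
    if h < a:
        return "A"
    return "D"


def flat_stake_return(matches, stake=1.0):
    """Profit divided by total amount staked, betting `stake` on the predicted outcome of every match."""
    if not matches:
        return 0.0
    staked = stake * len(matches)
    payout = 0.0
    for m in matches:
        pick = outcome(m.pred_home, m.pred_away)
        if pick == m.result:
            payout += stake * m.odds[pick]  # a winning bet returns stake * decimal odds
    return (payout - staked) / staked


# Illustrative (made-up) matches: two winning picks out of four.
matches = [
    Match(1.8, 0.7, "H", {"H": 2.5, "D": 3.2, "A": 2.9}),  # 2-1 -> H, wins at 2.5
    Match(1.1, 1.3, "A", {"H": 2.1, "D": 3.3, "A": 3.6}),  # 1-1 -> D, loses
    Match(0.6, 1.6, "A", {"H": 1.9, "D": 3.5, "A": 4.0}),  # 1-2 -> A, wins at 4.0
    Match(2.2, 0.4, "D", {"H": 1.5, "D": 4.2, "A": 6.0}),  # 2-0 -> H, loses
]
print(flat_stake_return(matches))  # 0.625
```

Here 4 units are staked and 6.5 units are paid out, so the return is 2.5 / 4 = 62.5%. A figure such as the reported 42% is a ratio of this kind, though the paper's exact staking and bet-selection rules may differ.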