Three state-of-the-art statistical ranking methods for forecasting football matches are combined with several other predictors in a hybrid machine learning model. The three ranking methods are: an ability estimate for every team based on historic matches; an ability estimate for every team based on bookmaker consensus; and average plus-minus player ratings based on the players' individual performances in their home clubs and national teams. These are complemented by further team covariates (e.g., market value, team structure) and country-specific socio-economic factors (population, GDP). The proposed combined approach is trained on the number of goals scored in the matches of the four previous UEFA EUROs (2004-2016) and then applied to current information to forecast the upcoming UEFA EURO 2020. Based on the resulting estimates, the tournament is simulated repeatedly and winning probabilities are obtained for all teams. A random forest model favors the current World Champion France, with a winning probability of 14.8%, ahead of England (13.5%) and Spain (12.3%). Additionally, we provide survival probabilities for all teams at all tournament stages.
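The repeated tournament simulation described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: goals are drawn from independent Poisson distributions, drawn matches are settled by a fair coin flip, the bracket is a plain four-team knockout, and the expected-goal values are made up. In the paper's setting the expected goals would come from the fitted model; here `expected_goals`, `strength`, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def play_knockout_match(rng, team_a, team_b, expected_goals):
    """Return the winner of a single knockout match.

    expected_goals[(x, y)] is the expected number of goals x scores against y
    (a hypothetical lookup table standing in for model predictions).
    """
    goals_a = sample_poisson(rng, expected_goals[(team_a, team_b)])
    goals_b = sample_poisson(rng, expected_goals[(team_b, team_a)])
    if goals_a != goals_b:
        return team_a if goals_a > goals_b else team_b
    # Simplifying assumption: a draw is settled by a fair coin flip.
    return team_a if rng.random() < 0.5 else team_b


def simulate_bracket(rng, teams, expected_goals):
    """Play one knockout bracket; adjacent teams in the list meet each round."""
    remaining = list(teams)
    while len(remaining) > 1:
        remaining = [
            play_knockout_match(rng, remaining[i], remaining[i + 1], expected_goals)
            for i in range(0, len(remaining), 2)
        ]
    return remaining[0]


def winning_probabilities(teams, expected_goals, n_runs=10_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(rng, teams, expected_goals) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in teams}


# Toy input: made-up scoring rates for four fictional teams.
strength = {"A": 1.8, "B": 1.2, "C": 1.0, "D": 0.8}
teams = list(strength)
expected_goals = {(x, y): strength[x] for x in teams for y in teams if x != y}

probs = winning_probabilities(teams, expected_goals, n_runs=2000)
for team, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{team}: {p:.3f}")
```

Survival probabilities per stage could be estimated in the same way by also counting, in each run, which teams remain after every round.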