Football is a heavily result-driven industry, and goals are rarer than in most sports, so additional measures for judging the performance of teams and individuals are essential. Expected Goals (xG) offers more insight than the scoreline alone. To meet this need for deeper analysis, this paper develops machine learning models and applies them to football event data. Shot outcome is framed as a binary classification problem (goal or no goal), and a probabilistic valuation is output using Logistic Regression and Gradient Boosting approaches. The models successfully predict xG probability values for football players based on 15,575 shots. The proposed solution uses StatsBomb as the data provider and an industry benchmark to guide model tuning. The xG model is then used to examine the age-old cliché 'the ball has fallen to the wrong guy there', adjusting the model to obtain more realistic expected goals values than general models provide. To achieve this, the paper introduces Positional Adjusted xG, splitting the training data into Forward, Midfield, and Defence subsets with the aim of providing insight into player qualities based on positional sub-group. Positional Adjusted xG shows that more attacking players are better at accumulating xG: the highest values belong to Forwards, followed by Midfielders and then Defenders. Finally, the study extends to Player Adjusted xG, with the aim of showing that Messi is statistically more efficient than the average footballer. This is done by using a subset of Messi's shots to quantify his qualities against the average xG model, finding that the Messi xG model yields 347 xG more than the general model.
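The binary-classification framing above can be illustrated with a minimal sketch. It is not the paper's pipeline: the shots are synthetic rather than StatsBomb events, the two features (`distance`, `angle`) and the coefficients used to generate outcomes are assumptions chosen for illustration, and a hand-written gradient-descent logistic regression stands in for the Logistic Regression and Gradient Boosting models. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=1500):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_xg(w, shot):
    """xG of one shot = modelled probability that the shot is a goal."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], shot)))


def make_synthetic_shots(n, seed=0):
    # Assumption: synthetic data only; these generating coefficients are
    # illustrative and are not taken from the paper or from StatsBomb.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        distance = rng.uniform(0.0, 3.0)  # distance to goal, tens of metres
        angle = rng.uniform(0.1, 1.5)     # visible goal angle, radians
        p_goal = sigmoid(-0.5 - 1.5 * distance + 1.0 * angle)
        X.append([distance, angle])
        y.append(1 if rng.random() < p_goal else 0)  # 1 = goal, 0 = no goal
    return X, y


X, y = make_synthetic_shots(400)
w = fit_logistic(X, y)

close_shot = predict_xg(w, [0.6, 1.2])  # near the goal, wide angle
far_shot = predict_xg(w, [2.8, 0.3])    # long range, narrow angle
total_xg = sum(predict_xg(w, s) for s in X)  # accumulated xG over all shots
```

Under the same assumptions, a positional or player adjustment would amount to calling `fit_logistic` on a subset of shots (for example, only shots by forwards) and comparing the accumulated xG from that model with the general model's total over the same shots.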