We analyse football (soccer) player performance data with mixed-type variables from the 2014-15 season of eight major European leagues. We cluster these data using a tailor-made dissimilarity measure. To choose among the many available clustering methods and to select an appropriate number of clusters, we use the approach of Akhanli and Hennig (2020). This approach is based on several validation criteria that refer to different desirable characteristics of a clustering. These characteristics are chosen according to the aim of clustering, which allows us to define a suitable validation index as a weighted average of calibrated individual indexes measuring the desirable features. We derive two different clusterings. The first is a partition of the data set into major groups of essentially different players, which can be used to analyse a team's composition. The second divides the data set into many small clusters (with 10 players on average), which can be used to find players with a profile very similar to that of a given player. We discuss in depth which characteristics are desirable for these clusterings. The weighting of the criteria for the second clustering is informed by a survey of football experts.
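The idea of a validation index built as a weighted average of calibrated individual indexes can be illustrated with a minimal sketch. It assumes that "calibration" means standardising each index by the mean and standard deviation of its values over a set of reference clusterings; the function names, index names, and numbers below are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def calibrate(values, reference):
    """Standardise index values against values from reference clusterings.

    Assumption: calibration is a z-score using the reference mean and
    (population) standard deviation.
    """
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        # A constant reference carries no scale information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_index(raw, reference, weights):
    """Weighted average of calibrated indexes, one score per candidate.

    raw:       {index name: [value for each candidate clustering]}
    reference: {index name: [values on reference clusterings]}
    weights:   {index name: non-negative weight}
    """
    total = sum(weights.values())
    n_candidates = len(next(iter(raw.values())))
    scores = [0.0] * n_candidates
    for name, weight in weights.items():
        calibrated = calibrate(raw[name], reference[name])
        for i, z in enumerate(calibrated):
            scores[i] += weight * z / total
    return scores


# Hypothetical values of two indexes for two candidate clusterings.
raw = {"separation": [0.2, 0.8], "homogeneity": [0.9, 0.5]}
reference = {"separation": [0.0, 1.0], "homogeneity": [0.0, 1.0]}

equal = composite_index(raw, reference, {"separation": 1, "homogeneity": 1})
tilted = composite_index(raw, reference, {"separation": 1, "homogeneity": 3})
print(equal, tilted)
```

Changing the weights changes which candidate clustering scores highest, which is why the choice of weights has to follow from the aim of the clustering.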