Given a set of sequences composed of time-ordered events, sequential pattern mining is useful for identifying frequent subsequences, either across different sequences or within the same sequence. However, in sport, these techniques cannot determine how important particular patterns of play are to good or bad outcomes, which is often of greater interest to coaches and performance analysts. In this study, we apply a recently proposed supervised sequential pattern mining algorithm called safe pattern pruning (SPP) to 490 labelled event sequences representing passages of play from one rugby team's matches in the 2018 Japan Top League. We compare the patterns that SPP finds most discriminative between scoring and non-scoring outcomes, from both the team's and the opposition teams' perspectives, with the most frequent patterns obtained by well-known unsupervised sequential pattern mining algorithms applied to subsets of the original dataset split on the label. The results show that linebreaks, successful lineouts, regained kicks in play, repeated phase-breakdown play, and failed exit plays by the opposition team were the patterns that discriminated most between the team scoring and not scoring. Opposition team linebreaks, errors made by the team, opposition team lineouts, and repeated phase-breakdown play by the opposition team were the patterns that discriminated most between the opposition team scoring and not scoring. We also found that, by virtue of its supervised nature and its pruning and safe-screening properties, SPP obtained a greater variety of generally more sophisticated patterns than the unsupervised models, and these patterns are likely to be of more use to coaches and performance analysts.
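To make the contrast between the two approaches concrete, the sketch below mines frequent subsequences separately within each label subset (the unsupervised, split-on-label baseline) and then ranks patterns by a simple support difference between scoring and non-scoring sequences. This is a toy illustration under stated assumptions, not SPP: SPP itself, which prunes the pattern search space safely while fitting a supervised model, is not reproduced here, and the support difference is a deliberately simple stand-in for a discriminative score. The event names, the brute-force candidate enumeration and the helper functions are all hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `event in it` advances the iterator, so order is enforced.
    return all(event in it for event in pattern)


def candidate_patterns(sequences, max_len):
    """Brute-force enumeration of all ordered subsequences up to max_len (toy data only)."""
    patterns = set()
    for seq in sequences:
        for r in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), r):
                patterns.add(tuple(seq[i] for i in idx))
    return patterns


def support(pattern, sequences):
    """Fraction of sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def label_split_frequent(sequences, labels, min_support, max_len=2):
    """Unsupervised baseline: mine frequent patterns separately within each label subset."""
    result = {}
    for label in set(labels):
        subset = [s for s, y in zip(sequences, labels) if y == label]
        supports = {p: support(p, subset) for p in candidate_patterns(subset, max_len)}
        result[label] = {p: v for p, v in supports.items() if v >= min_support}
    return result


def support_difference(sequences, labels, max_len=2):
    """Hypothetical discriminative score (NOT SPP): support among label 1 minus label 0."""
    pos = [s for s, y in zip(sequences, labels) if y == 1]
    neg = [s for s, y in zip(sequences, labels) if y == 0]
    scores = {p: support(p, pos) - support(p, neg)
              for p in candidate_patterns(sequences, max_len)}
    # Rank by absolute score; ties broken by the pattern itself for determinism.
    return sorted(scores.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


# Hypothetical passages of play; label 1 = the team scored.
sequences = [
    ["lineout_won", "breakdown", "linebreak"],
    ["kick_regained", "breakdown", "linebreak"],
    ["lineout_won", "breakdown", "error"],
    ["breakdown", "breakdown", "error"],
]
labels = [1, 1, 0, 0]

frequent = label_split_frequent(sequences, labels, min_support=1.0)
ranked = support_difference(sequences, labels)
print(frequent[1])
print(ranked[:4])
```

In this toy data, `("breakdown",)` is frequent in both the scoring and the non-scoring subsets yet has a support difference of zero. This illustrates why frequency within label-split subsets alone cannot say which patterns matter for the outcome.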