Matrix factorization exploits the idea that, in complex high-dimensional data, the actual signal typically lies in lower-dimensional structures. These lower-dimensional objects provide useful insight, and sparse structures make them easier to interpret. Sparsity also acts as a regularizer and thus helps avoid over-fitting. By exploiting Bayesian shrinkage priors, we devise a computationally convenient approach for high-dimensional matrix factorization. The dependence between row and column entities is modeled by inducing flexible sparse patterns within factors. External information, when available, is accounted for in such a way that structures are allowed but not imposed. Inspired by boosting algorithms, we pair the proposed approach with a numerical strategy that sequentially includes and estimates low-rank contributions, with a data-driven stopping rule. The practical advantages of the proposed approach are demonstrated through a simulation study and the analysis of soccer heatmaps obtained from new-generation tracking data.
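To make the idea of sequentially including sparse low-rank contributions with a data-driven stopping rule more concrete, the following is a minimal illustrative sketch and not the method proposed here. It assumes a non-Bayesian stand-in: each rank-one term is fitted to the current residual by alternating soft-thresholding (in place of shrinkage priors), and the loop stops when a new term explains less than a chosen fraction of the total variation. The function names and the parameters `lam`, `tol`, and `max_rank` are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink entries toward zero; entries with |x| <= lam become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank_one(R, lam, n_iter=50):
    """Fit R ~ d * outer(u, v) with sparse unit vectors u and v.

    Returns None when thresholding removes every entry, i.e. when no
    sparse rank-one structure above the threshold is left in R.
    """
    # Initialize from the leading singular vectors of the residual.
    U, _, Vt = np.linalg.svd(R, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        v = soft_threshold(R.T @ u, lam)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return None
        v = v / norm_v
        u = soft_threshold(R @ v, lam)
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:
            return None
        u = u / norm_u
    d = u @ R @ v  # least-squares scale for fixed unit vectors u, v
    return d, u, v


def sequential_factorization(Y, lam=0.1, tol=0.05, max_rank=20):
    """Add sparse rank-one terms one at a time until the gain is below tol."""
    R = np.array(Y, dtype=float)  # working copy of the residual
    total = np.sum(R ** 2)
    terms = []
    for _ in range(max_rank):
        fit = sparse_rank_one(R, lam)
        if fit is None:
            break
        d, u, v = fit
        R_new = R - d * np.outer(u, v)
        # Fraction of the total sum of squares explained by this term.
        gain = (np.sum(R ** 2) - np.sum(R_new ** 2)) / total
        if gain < tol:
            break  # data-driven stop: the new term adds too little
        terms.append((d, u, v))
        R = R_new
    return terms, R
```

Calling `sequential_factorization(Y)` on a numeric matrix `Y` returns the list of retained `(d, u, v)` terms and the final residual, so the number of terms plays the role of a rank selected from the data. A fully Bayesian treatment would replace the hard threshold `lam` and the cutoff `tol` with prior-driven shrinkage and a posterior-based stopping criterion.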