Integrated principal components analysis, or iPCA, is an unsupervised learning technique for grouped vector data recently defined by Tang and Allen. Like PCA, iPCA computes new axes that best explain the variance of the data, but iPCA is designed to handle corrupting influences of the elements within each group on one another, e.g., data about students at a school grouped into classrooms. Tang and Allen showed empirically that regularized iPCA finds useful features for such grouped data in practice. However, it is not yet known when unregularized iPCA generically exists. By contrast, PCA (which is a special case of iPCA) typically exists whenever the number of data points exceeds the dimension. We study this question and find that the answer is significantly more complicated than it is for PCA. Despite this complexity, we find simple sufficient conditions for a very useful case: when the groups are no more than half as large as the dimension and the total number of data points exceeds the dimension, iPCA generically exists. We also fully characterize the existence of iPCA when all the groups are the same size. When the groups are not all the same size, however, we find that the group sizes for which iPCA generically exists are the integral points in a non-convex union of polyhedral cones. Nonetheless, we exhibit an algorithm to decide whether iPCA generically exists that is polynomial time in the node dimensions (based on the affirmative answer to the saturation conjecture by Knutson and Tao), as well as a very simple randomized algorithm. At its core, our approach identifies connections between iPCA and stability notions for star quivers, thus bringing tools from invariant theory and quiver representations to the table.
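
The sufficient condition stated above can be read as a simple arithmetic check. The following is a minimal sketch under a literal reading of the abstract: the function name, the argument names, and the interpretation of "group size" and "dimension" as plain integers are assumptions for illustration, not notation from the paper. The check is one-directional: a `False` result means only that this particular condition does not apply, not that iPCA fails to exist.

```python
from typing import Sequence


def sufficient_condition_holds(group_sizes: Sequence[int], dimension: int) -> bool:
    """Check the sufficient condition quoted in the abstract (hypothetical helper).

    Returns True when every group is at most half as large as the dimension
    and the total number of data points exceeds the dimension. A False result
    is inconclusive: the condition is sufficient, not necessary.
    """
    # Each group no more than half as large as the dimension (integer-safe form).
    groups_small_enough = all(2 * size <= dimension for size in group_sizes)
    # Total number of data points strictly exceeds the dimension.
    enough_points = sum(group_sizes) > dimension
    return groups_small_enough and enough_points


if __name__ == "__main__":
    print(sufficient_condition_holds([2, 2, 2], 5))  # small groups, 6 points in dimension 5
    print(sufficient_condition_holds([3, 3], 5))     # a group larger than half the dimension
```

Group sizes that fail this check fall under the more involved characterization described in the abstract, which this sketch does not attempt to reproduce.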