Motivated by the problem of identifying potential hierarchical population structure in modern survey data containing a wide range of complex data types, we introduce population-based hierarchical non-negative matrix factorization (PHNMF). PHNMF is a variant of hierarchical non-negative matrix factorization based on feature similarity. As such, it enables an automatic and interpretable approach for identifying and understanding hierarchical structure in a data matrix constructed from a wide range of data types. Our numerical experiments on synthetic and real survey data demonstrate that PHNMF can recover latent hierarchical population structure in complex data with high accuracy. Moreover, the recovered subpopulation structure is meaningful and can be useful for improving downstream inference.
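The abstract does not specify how PHNMF measures feature similarity or decides where to split, so the following is only a generic stand-in for the broader idea of hierarchical NMF, not the PHNMF algorithm. It assumes the survey responses have already been encoded as a non-negative numeric matrix (rows are individuals, columns are features). It then recursively splits the rows into two subpopulations with a rank-2 NMF fitted by Lee-Seung multiplicative updates, assigning each row to its dominant component. The names `nmf`, `hierarchical_nmf`, `leaves`, `depth`, and `min_size` are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """Rank-k NMF X ~ W @ H using Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def hierarchical_nmf(X, rows=None, depth=2, min_size=4):
    """Hypothetical generic hierarchical NMF (not PHNMF): recursively split rows
    (individuals) into two subpopulations with a rank-2 NMF.

    Assumes X is already a non-negative numeric encoding of the survey data.
    Returns a nested dict {"rows": row indices, "children": [subtrees]}.
    """
    if rows is None:
        rows = np.arange(X.shape[0])
    node = {"rows": rows, "children": []}
    if depth == 0 or len(rows) < min_size:
        return node  # stop: maximum depth reached or group too small
    W, _ = nmf(X[rows], k=2)
    labels = W.argmax(axis=1)  # assign each row to its dominant component
    groups = [rows[labels == c] for c in (0, 1)]
    if min(len(g) for g in groups) == 0:
        return node  # degenerate split: treat this node as a leaf
    node["children"] = [hierarchical_nmf(X, g, depth - 1, min_size) for g in groups]
    return node


def leaves(node):
    """Collect the row indices of every leaf subpopulation in the tree."""
    if not node["children"]:
        return [node["rows"]]
    return [r for child in node["children"] for r in leaves(child)]


if __name__ == "__main__":
    # Synthetic data with two coarse groups that load on disjoint feature blocks.
    rng = np.random.default_rng(1)
    X = rng.random((40, 20)) * 0.1
    X[:20, :10] += 1.0
    X[20:, 10:] += 1.0
    tree = hierarchical_nmf(X, depth=2)
    print([len(p) for p in leaves(tree)])
```

Splitting with a fixed rank of 2 keeps every node of the tree a binary decision. A method such as PHNMF may choose its splits, ranks, or stopping rule differently.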