Network-linked data, where multivariate observations are interconnected by a network, are becoming increasingly prevalent in fields such as sociology and biology. These data often exhibit inherent noise and complex relational structures, complicating conventional modeling and statistical inference. Motivated by empirical challenges in analyzing such data sets, this paper introduces a family of network subspace generalized linear models designed for analyzing noisy, network-linked data. We propose a model inference method based on subspace-constrained maximum likelihood, which emphasizes flexibility in capturing network effects and provides a robust inference framework against network perturbations. We establish the asymptotic distributions of the estimators under network perturbations, demonstrating the method's accuracy through extensive simulations involving random network models and deep-learning-based embedding algorithms. The proposed methodology is applied to a comprehensive analysis of a large-scale study on school conflicts, where it identifies significant social effects, offering meaningful and interpretable insights into student behaviors.
翻译:暂无翻译