Linear regression on a set of observations linked by a network has been an essential tool for modeling the relationship between a response and covariates when additional network data are available. Despite its wide range of applications in areas such as the social sciences and health-related research, the problem has not been well studied in statistics so far. Previous methods either lack inference tools or rely on restrictive assumptions about social effects, and they usually assume that networks are observed without errors, which is unrealistic in many problems. This paper proposes a linear regression model with nonparametric network effects. The model does not assume that the relational data or network structure is observed exactly; thus, the method can be provably robust to a certain level of network perturbation. A set of asymptotic inference results is established under general conditions on the network observational errors, and the robustness of the method is studied in the specific setting where the errors come from random network models. We discover a phase-transition phenomenon in the validity of inference with respect to network density when no prior knowledge of the network model is available, and we also show that knowing the network model yields a significant improvement. As a by-product of this analysis, we derive a rate-optimal concentration bound for random subspace projection that may be of independent interest. Extensive simulation studies verify these theoretical results and demonstrate the advantage of the proposed method over existing work in accuracy and computational efficiency under different data-generating models. The method is then applied to adolescent network data to study gender and racial differences in social activities.
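The abstract does not spell out the estimator. The sketch below shows one common way to encode a nonparametric network effect, and it is offered as an assumption rather than as the paper's method. It assumes the model y = Xβ + α + ε, where the individual effects α lie approximately in the span of the K leading eigenvectors of the observed (possibly noisy) adjacency matrix A. It then fits ordinary least squares on X augmented with those eigenvectors. By the Frisch–Waugh–Lovell theorem, this is equivalent to projecting both y and X onto the orthogonal complement of that span. The function name `network_regression`, the choice of K, the classical OLS standard errors, and the two-block simulated network are all hypothetical illustrations.

```python
import numpy as np


def network_regression(X, y, A, K):
    """Hypothetical sketch: OLS of y on [X, U_K], where U_K holds the K leading
    eigenvectors (by absolute eigenvalue) of the symmetric adjacency matrix A.
    Assumes the network effect alpha lies (approximately) in span(U_K)."""
    eigvals, eigvecs = np.linalg.eigh(A)            # A is assumed symmetric
    idx = np.argsort(np.abs(eigvals))[::-1][:K]     # top-K eigenvalues by magnitude
    U = eigvecs[:, idx]                             # n x K, orthonormal columns

    # Project X and y onto the orthogonal complement of span(U) (Frisch-Waugh-Lovell)
    P_perp = np.eye(A.shape[0]) - U @ U.T
    Xt, yt = P_perp @ X, P_perp @ y
    beta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)

    # Classical OLS-style standard errors; K extra degrees of freedom are used by U
    n, p = X.shape
    resid = yt - Xt @ beta
    sigma2 = resid @ resid / (n - p - K)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xt.T @ Xt)))

    # Estimated network effects: projection of the partial residual onto span(U)
    alpha_hat = U @ (U.T @ (y - X @ beta))
    return beta, se, alpha_hat


# Usage on simulated data (hypothetical two-block random network)
rng = np.random.default_rng(0)
n = 400
z = np.repeat([0, 1], n // 2)                       # two equal-sized blocks
P = np.where(z[:, None] == z[None, :], 0.3, 0.05)   # edge probabilities
upper = np.triu(rng.random((n, n)) < P, k=1)        # sample upper triangle, no self-loops
A = (upper | upper.T).astype(float)                 # symmetric observed adjacency

X = rng.standard_normal((n, 2))
alpha = np.where(z == 0, 1.0, -1.0)                 # network effect constant within blocks
y = X @ np.array([1.0, -2.0]) + alpha + rng.standard_normal(n)

beta_hat, se, alpha_hat = network_regression(X, y, A, K=2)
print(beta_hat, se)
```

Using the observed A in place of its expectation is where network noise enters. The quality of the estimated eigenvectors, and hence how well α is absorbed, depends on how dense the network is relative to its eigengap. This is the kind of dependence the abstract's phase-transition result concerns.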