Linear regression on a set of observations linked by a network is an essential tool for modeling the relationship between a response and covariates when additional network data are available. Despite its wide range of applications in areas such as the social sciences and health-related research, the problem has not been well studied in statistics so far. Previous methods either lack inference tools or rely on restrictive assumptions about social effects, and they usually assume that networks are observed without errors, which is unrealistic in many problems. This paper proposes a linear regression model with nonparametric network effects. The model does not assume that the relational data or network structure is exactly observed; thus, the method is provably robust to a certain level of network perturbation. A set of asymptotic inference results is established under a general requirement on the network observational errors, and the robustness of the method is studied in the specific setting where the errors come from random network models. We discover a phase-transition phenomenon in inference validity with respect to network density when no prior knowledge of the network model is available, and we show that knowing the network model yields a significant improvement. As a by-product of this analysis, we derive a rate-optimal concentration bound for random subspace projection that may be of independent interest. Extensive simulation studies are conducted to verify these theoretical results and to demonstrate the advantage of the proposed method over existing work in terms of accuracy and computational efficiency under different data-generating models. The method is then applied to middle school students' network data to study the effectiveness of educational workshops in reducing school conflicts.