High Dimensional Logistic Regression Under Network Dependence

Logistic regression is one of the most fundamental methods for modeling the probability of a binary outcome based on a collection of covariates. However, the classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure. This necessitates the development of models that can simultaneously handle both the network peer effect (arising from neighborhood interactions) and the effect of high-dimensional covariates. In this paper, we develop a framework for incorporating such dependencies in a high-dimensional logistic regression model by introducing a quadratic interaction term, as in the Ising model, designed to capture pairwise interactions from the underlying network. The resulting model can also be viewed as an Ising model, where the node-dependent external fields linearly encode the high-dimensional covariates. We propose a penalized maximum pseudo-likelihood method for estimating the network peer effect and the effect of the covariates, which, in addition to handling the high dimensionality of the parameters, conveniently avoids the computational intractability of the maximum likelihood approach (whose normalizing constant is a sum over all 2^N configurations of the N binary outcomes). Consequently, our method is computationally efficient and, under various standard regularity conditions, our estimate attains the classical high-dimensional rate of consistency. In particular, our results imply that even under network dependence, it is possible to consistently estimate the model parameters at the same rate as in classical logistic regression, when the true parameter is sparse and the underlying network is not too dense. As a consequence of these general results, we derive the rates of consistency for various natural network models. We also develop an efficient algorithm for computing the estimates and validate our theoretical results in numerical experiments.
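The estimation idea above can be illustrated with a minimal Python sketch. It assumes the model form P(x) proportional to exp((beta/2) x^T A x + sum_i x_i theta^T Z_i) for outcomes x in {-1, +1}^N, with a symmetric, zero-diagonal, suitably scaled network matrix A, a scalar peer effect beta, and covariate rows Z_i. These symbols, the l1 penalty on theta alone (beta left unpenalized), the tuning value of lam, and all function names are illustrative assumptions, not the paper's notation or algorithm. Under this form, each outcome's conditional law given the others is exp(x_i eta_i) / (2 cosh eta_i) with eta_i = beta * sum_j A_ij x_j + theta^T Z_i. The pseudo-likelihood is therefore a product of logistic terms and never touches the 2^N-term normalizing constant. The fit uses plain proximal gradient descent (ISTA) as a stand-in for the authors' algorithm, and the demo data come from a single-site Gibbs sampler.

```python
import numpy as np


def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1 (elementwise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def neg_log_pseudolikelihood(beta, theta, x, A, Z):
    """Average negative log pseudo-likelihood for spins x in {-1, +1}^N.

    Assumed conditional law: P(x_i | x_{-i}) = exp(x_i * eta_i) / (2 cosh eta_i),
    with eta_i = beta * sum_j A_ij x_j + Z_i @ theta.
    """
    eta = beta * (A @ x) + Z @ theta
    # log(2 cosh eta) = logaddexp(eta, -eta), computed without overflow
    return np.mean(np.logaddexp(eta, -eta) - x * eta)


def gradient(beta, theta, x, A, Z):
    # Gradient of neg_log_pseudolikelihood with respect to (beta, theta).
    m = A @ x                              # local fields from the neighbours
    r = np.tanh(Z @ theta + beta * m) - x  # d/d eta of log(2 cosh eta) - x * eta
    n = len(x)
    return r @ m / n, Z.T @ r / n


def fit_penalized_mple(x, A, Z, lam, n_iter=2000):
    """ISTA for neg_log_pseudolikelihood + lam * ||theta||_1 (beta unpenalized)."""
    n, d = Z.shape
    W = np.column_stack([A @ x, Z])
    # The Hessian is bounded by W.T @ W / n, so 1/L with L = ||W||_2^2 / n is a safe step.
    step = n / np.linalg.norm(W, 2) ** 2
    beta, theta = 0.0, np.zeros(d)
    for _ in range(n_iter):
        g_beta, g_theta = gradient(beta, theta, x, A, Z)
        beta = beta - step * g_beta
        theta = soft_threshold(theta - step * g_theta, step * lam)
    return beta, theta


def gibbs_sample(A, Z, beta, theta, n_sweeps, rng):
    # Single-site Gibbs sampler for the assumed covariate-dependent Ising model.
    n = A.shape[0]
    x = rng.choice([-1.0, 1.0], size=n)
    h = Z @ theta
    for _ in range(n_sweeps):
        for i in range(n):
            eta = beta * (A[i] @ x) + h[i]
            # P(x_i = +1 | rest) = exp(eta) / (2 cosh eta) = (1 + tanh eta) / 2
            x[i] = 1.0 if rng.random() < 0.5 * (1.0 + np.tanh(eta)) else -1.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 400, 20
    G = np.triu(rng.random((n, n)) < 10 / n, 1).astype(float)
    G = G + G.T                                  # Erdos-Renyi graph, mean degree about 10
    A = G / G.sum(axis=1).mean()                 # scale so local fields stay O(1)
    Z = rng.standard_normal((n, d))
    theta_true = np.zeros(d)
    theta_true[:2] = [1.0, -1.0]                 # sparse covariate effect
    x = gibbs_sample(A, Z, beta=0.5, theta=theta_true, n_sweeps=100, rng=rng)
    lam = np.sqrt(np.log(d) / n)                 # heuristic tuning, an assumption
    beta_hat, theta_hat = fit_penalized_mple(x, A, Z, lam)
    print("beta_hat:", round(float(beta_hat), 3))
    print("support of theta_hat:", np.flatnonzero(theta_hat))
```

Because eta_i is linear in (beta, theta), the pseudo-likelihood in this sketch is an ordinary logistic-type objective with one extra covariate, the local field m_i = sum_j A_ij x_j. An off-the-shelf l1-penalized logistic solver could therefore be substituted. The fixed step 1/L, with L = ||W||_2^2 / N, guarantees that no ISTA iteration increases the penalized objective.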
