We consider a high-dimensional monotone single index model (hdSIM), a semiparametric extension of the high-dimensional generalized linear model (hdGLM) in which the link function is unknown but constrained to be monotone non-decreasing. We develop a scalable, projection-based iterative approach, the "Sparse Orthogonal Descent Single-Index Model" (SOD-SIM), which alternates between sparse-thresholded, orthogonalized "gradient-like" steps and isotonic regression steps to recover the coefficient vector. Our main contribution is a set of finite-sample estimation bounds for both the coefficient vector and the link function in high-dimensional settings, under very mild assumptions on the design matrix $\mathbf{X}$, the error term $\epsilon$, and their dependence. The convergence rate for the link function matches the low-dimensional isotonic regression minimax rate, $n^{-1/3}$, up to poly-logarithmic factors; the convergence rate for the coefficients is likewise $n^{-1/3}$ up to poly-logarithmic factors. The method applies to many real data problems, including GLMs with a misspecified link, classification with mislabeled data, and classification with positive-unlabeled (PU) data. We study its performance through numerical studies and an application to rocker protein sequence data.
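To make the alternating structure concrete, here is a minimal NumPy sketch. It is not the authors' SOD-SIM; every detail below is an assumption made for illustration. The "gradient-like" direction is taken to be the residual correlation $\mathbf{X}^\top(y - \hat f(\mathbf{X}\beta))/n$, which drops the unknown derivative of the link (an Isotron-style choice). This direction is orthogonalized against the current $\beta$, hard-thresholded to the $s$ largest entries, and renormalized to unit norm. The isotonic step uses the pool-adjacent-violators algorithm. The names `sod_sim_sketch`, `hard_threshold`, `isotonic_fit`, and `pava`, as well as the initialization, step size, and iteration count, are hypothetical.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y (equal weights)."""
    sums, counts = [], []
    for v in np.asarray(y, dtype=float):
        sums.append(v)
        counts.append(1)
        # Merge adjacent blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.repeat([s / c for s, c in zip(sums, counts)], counts)

def isotonic_fit(z, y):
    """Monotone non-decreasing fit of y against the index values z."""
    order = np.argsort(z)
    fitted = np.empty(len(y), dtype=float)
    fitted[order] = pava(y[order])
    return fitted

def hard_threshold(v, s):
    """Keep the s entries of v with largest magnitude; zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def sod_sim_sketch(X, y, s, step=1.0, n_iter=50):
    """Hypothetical alternating scheme; assumptions are noted in the comments."""
    n, _ = X.shape
    # Assumed initialization: thresholded marginal correlations, unit norm.
    beta = hard_threshold(X.T @ y / n, s)
    beta /= np.linalg.norm(beta)
    for _ in range(n_iter):
        f_hat = isotonic_fit(X @ beta, y)   # isotonic regression step
        g = X.T @ (y - f_hat) / n           # residual-correlation direction
        g -= (g @ beta) * beta              # orthogonalize against current beta
        beta = hard_threshold(beta + step * g, s)
        beta /= np.linalg.norm(beta)        # keep the index direction identifiable
    return beta, isotonic_fit(X @ beta, y)
```

The unit-norm projection reflects the usual identifiability convention for single index models: scaling $\beta$ can be absorbed into the unknown link, so only its direction is estimated.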