This paper studies high-dimensional regression with two-way structured data. To estimate the high-dimensional coefficient vector, we propose generalized matrix decomposition regression (GMDR), which efficiently leverages any auxiliary information on row and column structures. GMDR extends principal component regression (PCR) to two-way structured data, but unlike PCR, GMDR selects the components that are most predictive of the outcome, leading to more accurate prediction. For inference on the regression coefficients of individual variables, we propose generalized matrix decomposition inference (GMDI), a general high-dimensional inferential framework for a large family of estimators that includes the proposed GMDR estimator. GMDI provides more flexibility for modeling relevant auxiliary row and column structures. As a result, GMDI does not require the true regression coefficients to be sparse; it also allows dependent and heteroscedastic observations. We study the theoretical properties of GMDI in terms of both the type I error rate and power, and we demonstrate the effectiveness of GMDR and GMDI in simulation studies and in an application to human microbiome data.
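The contrast drawn above between PCR and a method that keeps the components most predictive of the outcome can be illustrated with a minimal sketch. It is not the GMDR estimator: it uses a plain SVD and ignores the auxiliary row and column structures, and the selection rule (ranking components by the squared projection of the outcome onto each one) is an assumption made for illustration. The function name `component_regression` and its `select` argument are hypothetical.

```python
import numpy as np


def component_regression(X, y, k, select="variance"):
    """Regress y on k SVD components of X and map back to coefficients.

    select="variance":   ordinary PCR, keeps the k leading components.
    select="predictive": keeps the k components whose scores explain the
                         most variation in y (an illustrative rule, assumed
                         here; not taken from the paper).
    """
    Xc = X - X.mean(axis=0)   # center the columns
    yc = y - y.mean()         # center the outcome
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Drop numerically null directions so that dividing by s is safe.
    keep = s > 1e-10 * s.max()
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    if k > s.size:
        raise ValueError("k exceeds the number of usable components")

    # The columns of U are orthonormal, so the least-squares coefficient of
    # yc on each column is a simple inner product.
    proj = U.T @ yc

    if select == "variance":
        idx = np.arange(k)                  # singular values are sorted descending
    elif select == "predictive":
        idx = np.argsort(-(proj ** 2))[:k]  # largest reduction in residual sum of squares
    else:
        raise ValueError("select must be 'variance' or 'predictive'")

    # Map the component-level fit back to a coefficient for every variable.
    beta = Vt[idx].T @ (proj[idx] / s[idx])
    return beta, idx


def residual_sum_of_squares(X, y, beta):
    """In-sample residual sum of squares of the centered fit."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return float(np.sum((yc - Xc @ beta) ** 2))
```

For a fixed `k`, the "predictive" rule can never give a larger in-sample residual sum of squares than the "variance" rule, because it picks the `k` largest squared projections. That guarantee is in-sample only and says nothing on its own about out-of-sample accuracy.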