Capturing how the conditional covariances or correlations among the elements of a multivariate response vector vary with covariates is important in many fields, including neuroscience, epidemiology, and biomedicine. We propose a new method, Covariance Regression with Random Forests (CovRegRF), which estimates the covariance matrix of a multivariate response given a set of covariates within a random forest framework. The trees are built with a splitting rule specifically designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the proposed method and significance test in a simulation study, which shows that the method provides accurate covariance matrix estimates and that the type I error is well controlled. We also demonstrate an application of the proposed method to a thyroid disease data set.
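To make the splitting idea concrete, the following is a minimal sketch of how a candidate split on a single covariate could be scored by the difference between the child-node sample covariance matrices. The abstract does not state the exact criterion, so the distance used here (Euclidean distance over the upper-triangular entries, weighted by the square root of the product of the child sizes) is an assumption, and the names `sample_cov`, `split_score`, `best_split`, and `min_node_size` are hypothetical and not taken from the authors' software.

```python
import math


def sample_cov(rows):
    """Unbiased sample covariance matrix of a list of equal-length response vectors."""
    n, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(q)]
    return [
        [
            sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
            for k in range(q)
        ]
        for j in range(q)
    ]


def split_score(y_left, y_right):
    """Size-weighted distance between the two child covariance matrices.

    ASSUMPTION: Euclidean distance over upper-triangular entries, scaled by
    sqrt(n_left * n_right). The abstract only says the rule maximizes the
    difference between the child-node covariance estimates.
    """
    cov_l, cov_r = sample_cov(y_left), sample_cov(y_right)
    q = len(cov_l)
    squared = sum(
        (cov_l[j][k] - cov_r[j][k]) ** 2 for j in range(q) for k in range(j, q)
    )
    return math.sqrt(len(y_left) * len(y_right)) * math.sqrt(squared)


def best_split(x, y, min_node_size=5):
    """Scan thresholds on one covariate x and return (threshold, score).

    min_node_size (hypothetical parameter, must be >= 2) keeps enough
    observations in each child to estimate a covariance matrix.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    best_threshold, best_score = None, -math.inf
    for i in range(min_node_size, len(xs) - min_node_size + 1):
        if xs[i] == xs[i - 1]:
            continue  # no valid threshold between tied covariate values
        score = split_score(ys[:i], ys[i:])
        if score > best_score:
            best_threshold, best_score = (xs[i - 1] + xs[i]) / 2, score
    return best_threshold, best_score
```

In a full forest, a search of this kind would be repeated over a random subset of covariates at every node of every tree; the sketch covers only the scoring of one covariate at one node.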