A compositional tree is a tree structure on a set of random variables in which each random variable is a node and a composition occurs at each non-leaf node of the tree. As a generalization of compositional data, compositional trees capture more complex relationships among random variables and arise in many disciplines, such as brain imaging, genomics and finance. We consider the problem of sparse regression on data associated with a compositional tree and propose a transformation-free, tree-based regularized regression method for component selection. The regularization penalty is designed based on the tree structure and encourages a sparse tree representation. We prove that the proposed estimator of the regression coefficients is both consistent and model selection consistent. In a simulation study, our method achieves higher accuracy than competing methods across a range of scenarios. In an analysis of a brain imaging data set from studies of Alzheimer's disease, our method identifies meaningful associations between memory decline and the volumes of brain regions that are consistent with current understanding.
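The notion of a compositional tree can be made concrete with a small example. The sketch below is purely illustrative and is not the authors' regression method. It assumes a hypothetical three-level tree and assumes that each non-leaf node's value is the sum of its children's values (as with volumes of brain regions nested inside larger regions). Under that assumption, the children's shares at every non-leaf node form a composition that sums to one. The tree, node names and numbers are all hypothetical.

```python
from typing import Dict, List

# Hypothetical tree: a root split into two internal nodes, each split
# further into leaves (e.g., small brain regions nested in larger ones).
TREE: Dict[str, List[str]] = {
    "total": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2", "b3"],
}


def node_value(node: str, leaves: Dict[str, float]) -> float:
    """Return a leaf's own value, or, for a non-leaf node, the sum of its children's values."""
    if node not in TREE:
        return leaves[node]
    return sum(node_value(child, leaves) for child in TREE[node])


def compositions(leaves: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """For each non-leaf node, return the proportion contributed by each child."""
    out: Dict[str, Dict[str, float]] = {}
    for parent, children in TREE.items():
        total = node_value(parent, leaves)
        out[parent] = {c: node_value(c, leaves) / total for c in children}
    return out


if __name__ == "__main__":
    leaves = {"a1": 2.0, "a2": 6.0, "b1": 1.0, "b2": 3.0, "b3": 4.0}
    for parent, parts in compositions(leaves).items():
        # Each set of child proportions sums to one: the composition at that node.
        print(parent, parts, "sum =", sum(parts.values()))
```

Running the script prints, for each non-leaf node, its children's proportions; each set sums to 1, which is the compositional constraint at that node. Ordinary compositional data corresponds to the special case of a single non-leaf node (the root) whose children are all leaves.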