We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
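To make the motivating claim concrete, the following is a minimal simulation sketch, not the method proposed here. It assumes a single feature, a depth-one tree (a stump) in place of a full CART fit, Gaussian noise with known variance 1, and a response that is independent of the feature, so every null hypothesis is true. All function names (`naive_p_value`, `cart_stump_leaves`, `simulate`) are hypothetical and chosen for illustration only. The sketch compares a naive z-test between the two leaves of a data-driven split against the same test applied to a split fixed in advance.

```python
import math
import random


def naive_p_value(left, right, sigma=1.0):
    """Two-sided z-test p-value for equal leaf means, treating the split as fixed."""
    diff = sum(left) / len(left) - sum(right) / len(right)
    se = sigma * math.sqrt(1.0 / len(left) + 1.0 / len(right))
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(diff / se) / math.sqrt(2.0))


def cart_stump_leaves(x, y, min_leaf=5):
    """Split y into two leaves at the threshold on x that minimizes within-leaf SSE."""
    ys = [yi for _, yi in sorted(zip(x, y))]  # responses ordered by the feature
    n, total = len(ys), sum(ys)
    best_k, best_gain, left_sum = min_leaf, -math.inf, 0.0
    for k in range(1, n):
        left_sum += ys[k - 1]
        if k < min_leaf or n - k < min_leaf:
            continue
        # Maximizing this quantity is equivalent to minimizing the SSE.
        gain = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return ys[:best_k], ys[best_k:]


def simulate(n_reps=1000, n=50, alpha=0.05, seed=0):
    """Return rejection rates under the global null: (data-driven split, fixed split)."""
    rng = random.Random(seed)
    naive_hits = fixed_hits = 0
    for _ in range(n_reps):
        x = [rng.random() for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # y is independent of x
        left, right = cart_stump_leaves(x, y)
        naive_hits += naive_p_value(left, right) < alpha
        # Control: split at the median of x, chosen without looking at y.
        ys = [yi for _, yi in sorted(zip(x, y))]
        fixed_hits += naive_p_value(ys[: n // 2], ys[n // 2:]) < alpha
    return naive_hits / n_reps, fixed_hits / n_reps


if __name__ == "__main__":
    naive_rate, fixed_rate = simulate()
    print(f"data-driven split: {naive_rate:.3f}, fixed split: {fixed_rate:.3f}")
```

Under these assumptions, the fixed split should reject at a rate close to the nominal 0.05, while the data-driven split rejects far more often, because the split point was chosen to maximize the very difference that is then tested. Conditioning on the selection event, as the selective inference framework does, is meant to remove this inflation.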