In clinical trials and other applications, we often see regions of the feature space that appear to exhibit interesting behaviour, but it is unclear whether these observed phenomena are reflected at the population level. Focusing on a regression setting, we consider the subgroup selection challenge of identifying a region of the feature space on which the regression function exceeds a pre-determined threshold. We formulate the problem as one of constrained optimisation, where we seek a low-complexity, data-dependent selection set on which, with a guaranteed probability, the regression function is uniformly at least as large as the threshold; subject to this constraint, we would like the region to contain as much mass under the marginal feature distribution as possible. This leads to a natural notion of regret, and our main contribution is to determine the minimax optimal rate for this regret in both the sample size and the Type I error probability. The rate involves a delicate interplay between parameters that control the smoothness of the regression function, as well as exponents that quantify the extent to which the optimal selection set at the population level can be approximated by families of well-behaved subsets. Finally, we expand the scope of our previous results by illustrating how they may be generalised to a treatment and control setting, where interest lies in the heterogeneous treatment effect.
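To make the constrained-optimisation formulation more concrete, the Python sketch below evaluates a candidate selection set on a finite grid where the regression function η is treated as known. It is not the selection procedure studied here, because in practice η is unknown and the set must be built from data. Several choices are illustrative assumptions: the function names `oracle_set`, `satisfies_constraint` and `regret`, the discrete grid with marginal-mass weights, and measuring regret as the marginal mass of the population-level set {η ≥ τ} that the selection misses.

```python
import numpy as np


def oracle_set(eta_values, tau):
    """Boolean mask of grid points where the (assumed known) regression function is >= tau."""
    return eta_values >= tau


def satisfies_constraint(selected, eta_values, tau):
    """True if eta >= tau uniformly on the selected set, i.e. no Type I error occurs."""
    return bool(np.all(eta_values[selected] >= tau))


def regret(selected, eta_values, weights, tau):
    """Illustrative regret: marginal mass of the oracle set that the selection misses."""
    missed = oracle_set(eta_values, tau) & ~selected
    return float(weights[missed].sum())


if __name__ == "__main__":
    # Hypothetical toy example: five grid points with equal marginal mass.
    eta = np.array([0.1, 0.3, 0.6, 0.8, 0.9])
    weights = np.full(5, 0.2)
    tau = 0.5

    # A conservative selection that omits one point where eta >= tau.
    selected = np.array([False, False, False, True, True])
    print(satisfies_constraint(selected, eta, tau))  # True
    print(regret(selected, eta, weights, tau))       # 0.2

    # A selection that includes a point with eta < tau violates the constraint.
    too_large = np.array([False, True, False, True, True])
    print(satisfies_constraint(too_large, eta, tau))  # False
```

The toy example shows the trade-off in the formulation. Enlarging the selection reduces the missed mass, but once it includes a point where η falls below τ, the uniform guarantee fails. For the treatment and control setting, one could in the same spirit pass the pointwise difference between treatment and control regression functions in place of `eta`; this substitution is again only an illustrative assumption.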