Automated high-stakes decision-making, such as medical diagnosis, requires models that are both interpretable and reliable. In this study we consider the Sparse High-order Interaction Model (SHIM), an interpretable and reliable model with good predictive ability. However, finding statistically significant high-order interactions is challenging because of the intrinsic high dimensionality of combinatorial effects. A further problem in data-driven modeling is the effect of "cherry-picking", also known as selection bias. Our main contribution is to extend the recently developed parametric programming approach for selective inference to high-order interaction models. Exhaustive search over the cherry tree (all possible interactions) can be daunting and impractical even for a small problem. We introduce an efficient pruning strategy and demonstrate the computational efficiency and statistical power of the proposed method on both synthetic and real data.
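To make the idea of pruning a search over all possible interactions concrete, the following is a minimal sketch, not the method of the paper. It assumes binary (0/1) features, so that an interaction is the element-wise product of its columns, and it uses a standard anti-monotone bound: extending an interaction can only shrink its support, so the score of any descendant is bounded by the positive or negative part of the residual on the current support. All names (`search`, `upper_bound`, `lam`, `max_order`) and the toy data are hypothetical, and the selective inference step itself is not shown.

```python
from itertools import combinations


def score(z, r):
    """Absolute inner product between an interaction column z and residual r."""
    return abs(sum(zi * ri for zi, ri in zip(z, r)))


def upper_bound(z, r):
    """Bound on score() for every interaction that extends the current one.

    Assumes binary features: extending an interaction can only switch entries
    of z from 1 to 0, so a descendant's score is at most the larger of the
    positive and negative parts of r on the support of z.
    """
    pos = sum(ri for zi, ri in zip(z, r) if zi and ri > 0)
    neg = -sum(ri for zi, ri in zip(z, r) if zi and ri < 0)
    return max(pos, neg)


def search(X, r, lam, max_order):
    """Depth-first search over the interaction tree with pruning.

    Returns (hits, visited): all interactions of order <= max_order whose
    score is >= lam, and the number of tree nodes actually evaluated.
    """
    n_features = len(X[0])
    hits, visited = [], 0

    def expand(cols, z):
        nonlocal visited
        start = cols[-1] + 1 if cols else 0
        for j in range(start, n_features):
            child = cols + (j,)
            zc = [zi * row[j] for zi, row in zip(z, X)]
            visited += 1
            if score(zc, r) >= lam:
                hits.append(child)
            # Descend only if some descendant could still reach the threshold.
            if len(child) < max_order and upper_bound(zc, r) >= lam:
                expand(child, zc)

    expand((), [1] * len(X))
    return hits, visited


def brute_force(X, r, lam, max_order):
    """Reference implementation: evaluate every interaction, no pruning."""
    hits = []
    for k in range(1, max_order + 1):
        for cols in combinations(range(len(X[0])), k):
            z = [int(all(row[j] for j in cols)) for row in X]
            if score(z, r) >= lam:
                hits.append(cols)
    return hits


# Toy data (made up for illustration): 6 samples, 4 binary features.
X = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
r = [2.0, -1.0, 0.5, 1.5, -2.0, 1.0]

hits, visited = search(X, r, lam=3.0, max_order=3)
print("selected interactions:", hits)
print("nodes evaluated:", visited)
```

The pruned search returns the same interactions as the brute-force enumeration, because the bound never underestimates the score of a descendant. How many nodes it skips depends on the data and on the threshold `lam`.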