Best subset of groups selection (BSGS) is the problem of selecting a small subset of non-overlapping groups that best explains the response variable. It has attracted increasing attention and has wide-ranging practical applications. However, because BSGS is computationally intractable in high-dimensional settings, developing efficient algorithms for it remains an active research topic. In this paper, we propose a group-splicing algorithm that iteratively detects the relevant groups and excludes the irrelevant ones. Moreover, coupled with a novel group information criterion, we develop an adaptive algorithm to determine the optimal model size. Under mild conditions, it can be proved that our algorithm identifies the optimal subset of groups in polynomial time with high probability. Finally, we demonstrate the efficiency and accuracy of our methods by comparing them with several state-of-the-art algorithms on both synthetic and real-world datasets.
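To make the idea of iteratively swapping relevant groups in and irrelevant groups out more concrete, the following is a minimal sketch of a single-swap exchange heuristic for a fixed number of groups. It is not the group-splicing algorithm proposed in the paper, whose exact swap rule and group information criterion are not given in this abstract. The function names (`rss`, `group_exchange_sketch`), the least-squares loss, and the marginal-correlation initialisation are all assumptions made for illustration.

```python
import numpy as np

def rss(X, y, groups, active):
    """Residual sum of squares of the least-squares fit on the active groups."""
    if not active:
        return float(y @ y)
    cols = np.concatenate([groups[g] for g in sorted(active)])
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ beta
    return float(resid @ resid)

def group_exchange_sketch(X, y, groups, size, max_iter=20):
    """Hypothetical single-swap exchange heuristic (assumed, not the paper's method).

    groups : list of integer index arrays, one per non-overlapping group
    size   : number of groups to keep in the model
    """
    # Assumed initialisation: keep the groups with the largest marginal correlation.
    scores = [np.linalg.norm(X[:, g].T @ y) for g in groups]
    active = set(np.argsort(scores)[-size:].tolist())
    best = rss(X, y, groups, active)

    for _ in range(max_iter):
        improved = False
        for out in sorted(active):
            for inn in range(len(groups)):
                if inn in active:
                    continue
                # Try exchanging one active group for one inactive group.
                candidate = (active - {out}) | {inn}
                loss = rss(X, y, groups, candidate)
                if loss < best - 1e-12:
                    active, best, improved = candidate, loss, True
                    break
            if improved:
                break
        if not improved:
            break  # no single swap lowers the loss: a local optimum
    return sorted(active), best
```

Calling `group_exchange_sketch(X, y, groups, size=2)` returns the selected group indices together with the final residual sum of squares. Each pass accepts the first swap that lowers the loss, so the procedure stops at a local optimum of the single-swap neighbourhood. Choosing `size` adaptively, as the abstract describes, would need an information criterion that this sketch does not attempt to reproduce.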