In variable selection, inherent structural constraints among the candidate variables often make it desirable to impose a selection rule that prescribes the permissible sets of selected variables (called a "selection dictionary"). Methods that incorporate such restrictions can improve model interpretability and prediction accuracy. Penalized regression can integrate selection rules by assigning the coefficients to different groups and then applying penalties to the groups. However, no general framework has been proposed to formalize selection rules and their applications. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, in which the selection dictionary is formally defined. We show that every selection rule can be represented as a combination of operations on constructs, and that this representation can be used to identify the corresponding selection dictionary. One may then apply a chosen criterion to select the best model from that dictionary. We show that the theoretical framework can help to identify the grouping structure in existing penalized regression methods. In addition, we formulate structured variable selection as mixed-integer optimization problems that can be solved by existing software. Finally, we discuss the significance of the framework in the context of statistics.
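To make the notion of a selection dictionary concrete, the following is a minimal sketch and not the construction used in this work. It assumes three hypothetical candidate variables (`A`, `B`, and their interaction `AB`) and an illustrative rule, "the interaction may be selected only if both main effects are selected". The function names are invented for the example, and the brute-force enumeration over all subsets is feasible only for a handful of variables; it stands in for, and is unrelated to, the mixed-integer formulation mentioned above.

```python
from itertools import combinations


def power_set(variables):
    """Return every subset of the candidate variables as a frozenset."""
    items = list(variables)
    return [
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


def selection_dictionary(variables, rule):
    """Keep only the subsets that the selection rule permits."""
    return [subset for subset in power_set(variables) if rule(subset)]


def interaction_needs_main_effects(selected):
    """Illustrative rule (an assumption): AB requires both A and B."""
    return "AB" not in selected or {"A", "B"} <= selected


# Hypothetical candidate variables: two main effects and their interaction.
candidates = ["A", "B", "AB"]
dictionary = selection_dictionary(candidates, interaction_needs_main_effects)

for permitted in dictionary:
    print(sorted(permitted))
```

Of the eight possible subsets, the rule leaves five permissible ones: the empty set, `{A}`, `{B}`, `{A, B}`, and `{A, B, AB}`. A model selection criterion would then be evaluated only over these sets.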