Model selection is a ubiquitous problem that arises in the application of many statistical and machine learning methods. In the likelihood and related settings, it is typical to use the method of information criteria (IC) to choose the most parsimonious among competing models by penalizing the likelihood-based objective function. Theorems guaranteeing the consistency of IC can be difficult to verify and are often specific and bespoke. We present a set of results that guarantee consistency for a class of IC, which we call PanIC (from the Greek root 'pan', meaning 'of everything'), with easily verifiable regularity conditions. The PanIC are applicable to any loss-based learning problem and are not exclusive to likelihood problems. We illustrate the verification of the regularity conditions for model selection problems involving finite mixture models, least absolute deviation and support vector regression, and principal component analysis, and we demonstrate the effectiveness of the PanIC for such problems via numerical simulations. Furthermore, we present new sufficient conditions for the consistency of BIC-like estimators and provide comparisons between the BIC and the PanIC.