We discuss information criteria for determining the number of explanatory variables in subset regression modeling. Information criteria such as AIC are effective and frequently used for model selection in ordinary regression and other statistical models. With the recent growth of data science, the analysis of large-scale data has become important. When models are constructed heuristically from a very large number of candidate explanatory variables, the search may pick up spurious (apparent) correlations and adopt inappropriate variables. In this paper, we point out problems specific to subset regression from the viewpoint of bias correction for the log-likelihood and present a correction method that takes them into account.
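The selection effect described above can be illustrated with a small simulation. The sketch below is not the correction method of this paper; it only assumes a Gaussian linear model, the standard AIC up to an additive constant, and purely random candidate variables. The function names, sample size, and number of candidates are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    Uses n * log(RSS / n) + 2 * k, where k counts the regression
    coefficients plus the error variance.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1
    return n * np.log(rss / n) + 2 * k


def best_single_variable_aic(y, Z):
    """Smallest AIC over all models with an intercept and one candidate variable."""
    ones = np.ones((len(y), 1))
    return min(
        gaussian_aic(y, np.hstack([ones, Z[:, [j]]])) for j in range(Z.shape[1])
    )


# Assumed setup: the response and all candidates are independent noise,
# so no candidate is truly related to the response.
rng = np.random.default_rng(0)
n, p = 50, 200
y = rng.standard_normal(n)
Z = rng.standard_normal((n, p))

aic_null = gaussian_aic(y, np.ones((n, 1)))  # intercept-only model
aic_best = best_single_variable_aic(y, Z)    # best of p one-variable models

print(f"AIC, intercept only:         {aic_null:.2f}")
print(f"AIC, best of {p} candidates: {aic_best:.2f}")
```

Because the minimum is taken over many candidate models, the selected one-variable model tends to show a smaller AIC than the intercept-only model even though every candidate is noise. This is the kind of optimistic bias in the maximized log-likelihood that a subset-specific correction would need to account for.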