Model degrees of freedom ($\df$) is a fundamental concept in statistics because it quantifies the flexibility of a fitting procedure and is indispensable in model selection. The $\df$ is often intuitively equated with the number of independent variables in the fitting procedure. But for adaptive regressions that perform variable selection (e.g., best subset regression), the model $\df$ is larger than the number of selected variables. The excess has been defined as the \emph{search degrees of freedom} ($\sdf$) to account for model selection. However, this definition is limited in two ways: it does not cover fitting procedures in an augmented space, such as splines and regression trees, and it does not use the same fitting procedure for $\sdf$ and $\df$. For example, the lasso's $\sdf$ is defined through the \emph{relaxed} lasso's $\df$ instead of the lasso's $\df$. Here we propose a \emph{modified search degrees of freedom} ($\msdf$) that directly accounts for the cost of searching in the original or augmented space. Since many fitting procedures can be characterized by a linear operator, we define the search cost as the effort needed to determine such a linear operator. When we construct a linear operator for the lasso via iterative ridge regression, $\msdf$ offers a new perspective on its search cost. For some complex procedures, such as multivariate adaptive regression splines (MARS), the search cost must be pre-determined because it serves as a tuning parameter for the procedure itself, and the pre-determined value may be inaccurate. To study the effect of an inaccurate pre-determined search cost, we develop two concepts, \emph{nominal} $\df$ and \emph{actual} $\df$, and formulate a property named \emph{self-consistency}, which holds when there is no gap between the \emph{nominal} $\df$ and the \emph{actual} $\df$.
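
As a minimal illustration of the linear-operator view mentioned above, the sketch below computes the degrees of freedom of a fixed linear fit $\hat{y} = Hy$ as $\mathrm{tr}(H)$, the standard result for linear smoothers. It is not the $\msdf$ construction itself, which the abstract does not spell out. The helper names (`linear_operator_df`, `ols_operator`, `ridge_operator`), the simulated design matrix, and the penalty value are all assumptions made for this example.

```python
import numpy as np

def linear_operator_df(H):
    """Degrees of freedom of a fixed linear fit y_hat = H @ y, taken as trace(H)."""
    return float(np.trace(H))

def ols_operator(X):
    # Hat matrix of ordinary least squares: X (X^T X)^{-1} X^T
    return X @ np.linalg.solve(X.T @ X, X.T)

def ridge_operator(X, lam):
    # Ridge smoother matrix: X (X^T X + lam * I)^{-1} X^T
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

# Hypothetical data: 50 observations, 5 predictors (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))

df_ols = linear_operator_df(ols_operator(X))               # equals the number of predictors
df_ridge = linear_operator_df(ridge_operator(X, lam=10.0))  # shrunk below the number of predictors

print(df_ols, df_ridge)
```

With no variable selection, the operator is fixed in advance and its trace matches the intuitive count of predictors for least squares. The search cost discussed in the abstract concerns the extra flexibility incurred when the operator itself is chosen using the data.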