We study case influence in the Lasso regression using Cook's distance which measures overall change in the fitted values when one observation is deleted. Unlike in ordinary least squares regression, the estimated coefficients in the Lasso do not have a closed form due to the nondifferentiability of the $\ell_1$ penalty, and neither does Cook's distance. To find the case-deleted Lasso solution without refitting the model, we approach it from the full data solution by introducing a weight parameter ranging from 1 to 0 and generating a solution path indexed by this parameter. We show that the solution path is piecewise linear with respect to a simple function of the weight parameter under a fixed penalty. The resulting case influence is a function of the penalty and weight, and it becomes Cook's distance when the weight is 0. As the penalty parameter changes, selected variables change, and the magnitude of Cook's distance for the same data point may vary with the subset of variables selected. In addition, we introduce a case influence graph to visualize how the contribution of each data point changes with the penalty parameter. From the graph, we can identify influential points at different penalty levels and make modeling decisions accordingly. Moreover, we find that case influence graphs exhibit different patterns between underfitting and overfitting phases, which can provide additional information for model selection.
翻译:暂无翻译