We propose the AdaPtive Noise Augmentation (PANDA) procedure to regularize the estimation and inference of generalized linear models (GLMs). PANDA iteratively optimizes the objective function given noise-augmented data until convergence, yielding the regularized model estimates. The augmented noises are designed to achieve various regularization effects, including $l_0$, bridge (lasso and ridge included), elastic net, adaptive lasso, and SCAD, as well as group lasso and fused ridge. We examine the tail bound of the noise-augmented loss function and establish the almost sure convergence of the noise-augmented loss function and its minimizer to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions of the regularized parameter estimates, based on which inference can be obtained simultaneously with variable selection. PANDA exhibits ensemble-learning behavior that helps further decrease the generalization error. Computationally, PANDA is easy to code, leveraging existing software for fitting GLMs without resorting to complicated optimization techniques. We demonstrate the superior or similar performance of PANDA relative to existing approaches employing the same types of regularizers on simulated and real-life data. We show that inferences through PANDA achieve nominal or near-nominal coverage and are far more efficient than a popular existing post-selection procedure.
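To make the iterative noise-augmentation loop above concrete, here is a minimal illustrative sketch for a Gaussian linear model with bridge-type noise whose per-covariate variance is $\lambda/|\theta_j|^\gamma$, so that in expectation the pseudo-observations contribute a penalty proportional to $\lambda\sum_j|\theta_j|^{2-\gamma}$ ($\gamma=1$ gives a lasso-like effect, $\gamma=0$ ridge). The function name `panda_fit`, the default settings, and the post-burn-in averaging details are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def panda_fit(X, y, lam=1.0, gamma=1.0, n_e=100, n_iter=200, burn_in=100, eps=1e-4):
    """Illustrative PANDA-style loop for a Gaussian linear model (sketch).

    Each iteration appends n_e noisy pseudo-observations whose j-th
    covariate is drawn from N(0, lam / (|theta_j|^gamma + eps)) with a
    zero response, then refits least squares on the augmented data.
    In expectation the pseudo-rows add a penalty proportional to
    n_e * lam * sum_j |theta_j|^(2 - gamma); lam absorbs the n_e scaling
    in practice.
    """
    n, p = X.shape
    theta = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalized start
    trace = []
    for t in range(n_iter):
        # Noise variance per coefficient: large when theta_j is small,
        # so small coefficients are shrunk harder (adaptive effect).
        var = lam / (np.abs(theta) ** gamma + eps)
        X_e = np.random.normal(0.0, np.sqrt(var), size=(n_e, p))
        y_e = np.zeros(n_e)
        X_aug = np.vstack([X, X_e])
        y_aug = np.concatenate([y, y_e])
        theta = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]
        if t >= burn_in:
            trace.append(theta)
    # Average the post-burn-in iterates to smooth over the noise draws.
    return np.mean(trace, axis=0)
```

Averaging the post-burn-in iterates over repeated noise draws is one way to read the ensemble-learning behavior mentioned above; note also that the entire loop only calls a standard (here, least-squares) fitting routine, consistent with the claim that PANDA leverages existing GLM software.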