Robust Distributional Regression with Automatic Variable Selection

Datasets with extreme observations and/or heavy-tailed error distributions are commonly encountered, and their analysis should take these features into account. Small deviations from an assumed model, such as the presence of outliers, can cause classical regression procedures to break down, potentially leading to unreliable inferences. Other distributional deviations, such as heteroscedasticity, can be handled by going beyond the mean and modelling the scale parameter in terms of covariates. We propose a method that accounts for heavy tails and heteroscedasticity through the use of a generalized normal distribution (GND). The GND contains a kurtosis-characterizing shape parameter that moves the model smoothly between the normal distribution and the heavier-tailed Laplace distribution, thus covering both classical and robust regression. A key component of statistical inference is determining the set of covariates that influence the response variable. While correctly accounting for kurtosis and heteroscedasticity is crucial to this endeavour, a procedure for variable selection is still required. For this purpose, we use a novel penalized estimation procedure that avoids the typical computationally demanding grid search for tuning parameters. This is particularly valuable in the distributional regression setting where the location and scale parameters depend on covariates, since the standard approach would have multiple tuning parameters (one for each distributional parameter). We achieve this by using a "smooth information criterion" that can be optimized directly, where the tuning parameters are fixed at $\log(n)$ in the BIC case.
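To make the two ingredients above concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes one common GND parameterization, in which shape 2 gives a normal-type density and shape 1 gives the Laplace density, and it assumes a smooth surrogate of the form $\theta^2/(\theta^2+\epsilon^2)$ for counting nonzero coefficients. The function names (`gnd_logpdf`, `smooth_l0`, `smooth_bic`), the smoothing constant `eps`, and the log-link for the scale are all hypothetical choices made for illustration.

```python
import math


def gnd_logpdf(y, mu, sigma, shape):
    """Log-density of a generalized normal distribution.

    Assumed parameterization: f(y) = shape / (2 * sigma * Gamma(1/shape))
    * exp(-(|y - mu| / sigma) ** shape).
    shape = 2 is normal-type; shape = 1 is Laplace.
    """
    z = abs(y - mu) / sigma
    return (math.log(shape) - math.log(2.0 * sigma)
            - math.lgamma(1.0 / shape) - z ** shape)


def smooth_l0(theta, eps=1e-2):
    """Smooth surrogate for the indicator (theta != 0); an assumed form."""
    return theta * theta / (theta * theta + eps * eps)


def smooth_bic(beta, alpha, shape, X, y, eps=1e-2):
    """-2 * log-likelihood + log(n) * (smooth count of nonzero slopes).

    beta  : location coefficients, mu_i = beta . x_i
    alpha : log-scale coefficients, log(sigma_i) = alpha . x_i
    Index 0 of each vector is treated as an intercept and is not penalized.
    """
    n = len(y)
    loglik = 0.0
    for x_i, y_i in zip(X, y):
        mu = sum(b * x for b, x in zip(beta, x_i))
        sigma = math.exp(sum(a * x for a, x in zip(alpha, x_i)))
        loglik += gnd_logpdf(y_i, mu, sigma, shape)
    # A single penalty weight, log(n), is shared by both sets of slopes.
    slopes = list(beta[1:]) + list(alpha[1:])
    penalty = sum(smooth_l0(t, eps) for t in slopes)
    return -2.0 * loglik + math.log(n) * penalty


if __name__ == "__main__":
    # Toy data: an intercept column plus two covariates (made-up numbers).
    X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.1, 0.1], [1.0, -0.9, -0.4]]
    y = [0.9, -0.2, 1.7, -1.1]
    full = smooth_bic([0.1, 1.5, 0.3], [0.0, 0.2, 0.0], 1.5, X, y)
    sparse = smooth_bic([0.1, 1.5, 0.0], [0.0, 0.0, 0.0], 1.5, X, y)
    print("criterion, denser model: ", full)
    print("criterion, sparser model:", sparse)
```

Because the penalty is differentiable in the coefficients, a criterion of this kind could be passed to a general-purpose optimizer directly, with the weight held at $\log(n)$ instead of being chosen by a grid search over separate tuning parameters for the location and scale models. In practice the shape parameter would be estimated along with the coefficients; it is fixed here only to keep the sketch short.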
