The nonnegative garrote (NNG) is among the first approaches to combine variable selection with shrinkage of regression estimates. When interest extends beyond deriving a predictor, NNG has some conceptual advantages over the popular lasso. Nevertheless, NNG has received little attention. The original NNG relies on ordinary least-squares (OLS) initial estimates, which are highly variable in data with a high degree of multicollinearity (HDM) and do not exist in high-dimensional data (HDD). This may be why NNG is not used in such data. Alternative initial estimates have been proposed but are hardly used in practice. Analyzing three structurally different data sets, we demonstrated that NNG can also be applied in HDM and HDD and compared its performance with the lasso, adaptive lasso, relaxed lasso, and best subset selection in terms of variables selected, regression estimates, and prediction. Replacing OLS with ridge initial estimates in HDM and lasso initial estimates in HDD helped NNG select simpler models than competing approaches without much increase in prediction error. Simpler models are easier to interpret, an important issue for descriptive modelling. Based on the limited experience from three data sets, we expect that NNG can be a suitable alternative to the lasso and its extensions. Neutral comparison simulation studies are needed to better understand the properties of variable selection methods, compare them, and derive guidance for practice.
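The two-step idea summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes ridge initial estimates (the replacement for OLS suggested for HDM data), scales each predictor column by its initial estimate, and then finds nonnegative shrinkage factors under an L1 penalty. The penalty values `lam_ridge` and `lam_garrote` are arbitrary assumptions; in practice they would be chosen by cross-validation.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data: 5 predictors, only two with nonzero true effects
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Step 1: initial estimates — ridge instead of OLS (illustrative penalty)
lam_ridge = 1.0
beta_init = np.linalg.solve(X.T @ X + lam_ridge * np.eye(p), X.T @ y)

# Step 2: garrote — column j of Z is x_j * beta_init_j; estimate
# nonnegative shrinkage factors c_j under an L1 penalty (illustrative)
Z = X * beta_init
lam_garrote = 20.0

def objective(c):
    r = y - Z @ c
    return 0.5 * r @ r + lam_garrote * c.sum()

res = minimize(objective, x0=np.ones(p), bounds=[(0, None)] * p)
c = res.x                      # shrinkage factors, some driven to ~0
beta_nng = c * beta_init       # final garrote coefficients
```

Variables whose shrinkage factor hits zero are removed from the model, which is how NNG performs selection and shrinkage in one step.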