Relating a set of variables X to a response y is crucial in chemometrics. A quantitative prediction objective can be enriched by qualitative data interpretation, for instance by locating the most influential features. When high-dimensional problems arise, dimension reduction techniques can be used. The most notable are projections (e.g. Partial Least Squares, or PLS) and variable selection (e.g. lasso). Sparse partial least squares combines both strategies by blending variable selection into PLS. The variant presented in this paper, Dual-sPLS, generalizes the classical PLS1 algorithm. It balances accurate prediction against efficient interpretation. It is based on penalizations inspired by classical regression methods (lasso, group lasso, least squares, ridge) and uses the notion of the dual norm. The resulting sparsity is controlled by an intuitive shrinking ratio parameter. Dual-sPLS compares favorably to similar regression methods on simulated and real chemical data. Code is provided as an open-source R package: https://CRAN.R-project.org/package=dual.spls
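
To make the idea of blending variable selection into PLS1 concrete, here is a minimal sketch in Python. It is not the dual.spls implementation and does not reproduce the paper's dual-norm derivation. It assumes a simplified lasso-style rule: the classical PLS1 direction X^T y is soft-thresholded, with the threshold chosen so that a requested fraction of its entries becomes zero. The function name `sparse_first_weight` and the parameter name `shrink_ratio` are hypothetical, chosen for illustration only.

```python
import numpy as np

def sparse_first_weight(X, y, shrink_ratio):
    """First PLS1 weight vector with about `shrink_ratio` of its entries zeroed.

    Illustrative assumption: plain soft-thresholding of X^T y, not the
    penalized problem solved by the dual.spls package.
    """
    if not 0.0 <= shrink_ratio < 1.0:
        raise ValueError("shrink_ratio must lie in [0, 1)")
    Xc = X - X.mean(axis=0)          # center the predictors
    yc = y - y.mean()                # center the response
    z = Xc.T @ yc                    # classical PLS1 direction, up to scaling
    p = z.size
    # Number of coefficients to zero out; always keep at least one.
    k = min(int(np.floor(shrink_ratio * p)), p - 1)
    # Threshold = k-th smallest absolute value (0 means no shrinkage).
    nu = np.sort(np.abs(z))[k - 1] if k > 0 else 0.0
    w = np.sign(z) * np.maximum(np.abs(z) - nu, 0.0)  # soft-thresholding
    return w / np.linalg.norm(w)     # unit-norm weight vector

# Synthetic data: only the first three of 20 variables drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w = sparse_first_weight(X, y, shrink_ratio=0.5)
print("non-zero weights:", np.count_nonzero(w))
```

With `shrink_ratio=0` the function returns the ordinary normalized PLS1 weight vector, so the ratio acts as a single knob that moves from the dense classical solution towards a sparser, more interpretable one.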