Even though Weighted Lasso regression has appealing statistical guarantees, it is typically avoided because of its complex search space, which is described by thousands of hyperparameters. On the other hand, recent progress in high-dimensional hyperparameter optimization (HPO) methods for black-box functions demonstrates that high-dimensional applications can indeed be optimized efficiently. Despite this initial success, high-dimensional HPO approaches are typically applied to synthetic problems with a moderate number of dimensions, which limits their impact in scientific and engineering applications. To address this limitation, we propose LassoBench, a new benchmark suite tailored to an important open research topic in the Lasso community, namely Weighted Lasso regression. LassoBench consists of benchmarks on both well-controlled synthetic setups (number of samples, signal-to-noise ratio (SNR), ambient and effective dimensionalities, and multiple fidelities) and real-world datasets, which enable many flavors of HPO algorithms to be improved and extended to the high-dimensional setting. We evaluate 5 state-of-the-art HPO methods and 3 baselines, and demonstrate that Bayesian optimization, in particular, can improve over the methods commonly used for sparse regression, while highlighting the limitations of these frameworks in very high dimensions. Remarkably, Bayesian optimization improves on the Lasso baselines for the 60-, 100-, 300-, and 1000-dimensional problems by 45.7%, 19.2%, 19.7%, and 15.5%, respectively.
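The "thousands of hyperparameters" come from the Weighted Lasso assigning every one of the p features its own penalty λ_j. Under one common normalization, the objective is (1/(2n))‖y − Xβ‖² + Σ_j λ_j|β_j|, so the hyperparameter vector has as many entries as there are features. The pure-Python sketch below is illustrative only and is not LassoBench's API. The names `weighted_lasso`, `validation_loss` and `make_data` are hypothetical, and so are several design choices: the log-parametrization λ_j = exp(x_j), the held-out mean squared error as the black-box objective an HPO method would minimize, and the simple proximal-gradient (ISTA) solver. All of these are assumptions chosen for readability.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrinks z toward 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for the assumed objective
    min_b (1/(2n)) * ||y - X b||^2 + sum_j lam[j] * |b_j|.
    X is a list of n rows with p entries; lam holds one penalty per feature."""
    n, p = len(X), len(X[0])
    # ||X||_F^2 / n upper-bounds the Lipschitz constant of the gradient,
    # so 1/L is a safe (if conservative) step size.
    L = sum(v * v for row in X for v in row) / n
    step = 1.0 / L
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step, then per-feature soft-thresholding with threshold step * lam[j]
        b = [soft_threshold(b[j] - step * grad[j], step * lam[j]) for j in range(p)]
    return b


def validation_loss(log_lam, X_tr, y_tr, X_val, y_val):
    """Hypothetical black-box objective for HPO: fit on the training split with
    lam_j = exp(log_lam[j]) and return the mean squared error on the validation split."""
    lam = [math.exp(v) for v in log_lam]
    b = weighted_lasso(X_tr, y_tr, lam)
    errs = [(sum(xij * bj for xij, bj in zip(row, b)) - yi) ** 2
            for row, yi in zip(X_val, y_val)]
    return sum(errs) / len(errs)


def make_data(n, p, seed):
    """Hypothetical noiseless synthetic data: only the first two of p features matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 3.0 * row[1] for row in X]
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_data(50, 5, seed=0)
    X_val, y_val = make_data(50, 5, seed=1)
    # One log-penalty per feature: the search space is 5-dimensional here,
    # and would have thousands of dimensions with thousands of features.
    uniform = [math.log(10.0)] * 5
    tailored = [math.log(1e-4)] * 2 + [math.log(10.0)] * 3
    print(validation_loss(uniform, X_tr, y_tr, X_val, y_val))
    print(validation_loss(tailored, X_tr, y_tr, X_val, y_val))
```

On this toy data, a uniform penalty of 10 shrinks every coefficient to zero. The tailored vector instead keeps the two informative features and zeroes out the three irrelevant ones. An HPO method would search over such per-feature vectors automatically, and that search becomes hard as p grows.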