Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample Sizes

September 12, 2022



Sigrun May, Sven Hartmann, Frank Klawonn

Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. For this hyperparameter optimization, nested cross-validation must be applied to avoid a biased performance estimation. The resulting repeated training with high-dimensional data leads to very long computation times. Moreover, the individual performance evaluation metrics are likely to show a high variance caused by outliers in tiny validation sets. Therefore, early stopping with standard pruning algorithms, applied to save time, risks discarding promising hyperparameter sets. Result: To speed up feature selection for high-dimensional data with tiny sample sizes, we adapt the use of a state-of-the-art asynchronous successive halving pruner. In addition, we combine it with two complementary pruning strategies based on domain or prior knowledge. One pruning strategy immediately stops computing trials whose selected hyperparameter combinations yield semantically meaningless results. The other is a new extrapolating threshold pruning strategy suitable for nested cross-validation with a high variance of performance evaluation metrics. In repeated experiments, our combined pruning strategy keeps all promising trials. At the same time, the calculation time is substantially reduced compared to using a state-of-the-art asynchronous successive halving pruner alone. Up to 81.3% fewer models were trained while achieving the same optimization result. Conclusion: The proposed combined pruning strategy accelerates data analysis or enables deeper searches for hyperparameters within the same computation time. This leads to significant savings in time, money and energy consumption, opening the door to advanced, time-consuming analyses.
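The abstract names three pruning checks but does not give their formulas, so the following is only a minimal sketch of how such a combination might be wired together, not the authors' implementation. Every name here (`should_prune`, `n_selected_features`, `best_possible_score`, `base_pruner`) is hypothetical. Two assumptions are made for illustration: a trial counts as "semantically meaningless" when the model selected no features, and the extrapolating threshold check assumes optimistically that all remaining folds reach the best possible score.

```python
from typing import Callable, Optional, Sequence


def should_prune(
    fold_scores: Sequence[float],
    n_selected_features: int,
    total_folds: int,
    threshold: float,
    best_possible_score: float = 1.0,
    base_pruner: Optional[Callable[[Sequence[float]], bool]] = None,
) -> bool:
    """Decide whether to stop a trial after some cross-validation folds.

    fold_scores holds the validation scores of the folds finished so far
    (higher is better). All parameter names are illustrative assumptions.
    """
    # Check 1 (assumed meaning of "semantically meaningless"):
    # a model that selected no features cannot yield a usable result.
    if n_selected_features == 0:
        return True

    # Check 2 (assumed form of extrapolating threshold pruning):
    # extrapolate the final mean as if every remaining fold scored the
    # best possible value. Prune only if even this optimistic mean stays
    # below the threshold, so noisy single folds do not kill a trial early.
    remaining = total_folds - len(fold_scores)
    optimistic_mean = (
        sum(fold_scores) + remaining * best_possible_score
    ) / total_folds
    if optimistic_mean < threshold:
        return True

    # Check 3: otherwise defer to a standard pruner, for example a
    # successive halving pruner wrapped in a callable (hypothetical hook).
    if base_pruner is not None:
        return base_pruner(fold_scores)
    return False
```

For example, with 10 folds and scores `[0.2, 0.3]` after two folds, the optimistic mean is (0.5 + 8 × 1.0) / 10 = 0.85, so the trial is pruned for a threshold of 0.9 but kept for a threshold of 0.8. The optimistic extrapolation is a deliberately conservative choice: it trades some saved computation for a lower risk of discarding a promising trial because of one outlier fold.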
