FAStEN: an efficient adaptive method for feature selection and estimation in high-dimensional functional regressions

Functional regression analysis is an established tool for many contemporary scientific applications. Regression problems involving large and complex data sets are ubiquitous, and feature selection is crucial for avoiding overfitting and achieving accurate predictions. We propose a new, flexible, and ultra-efficient approach to perform feature selection in a sparse high-dimensional function-on-function regression problem, and we show how to extend it to the scalar-on-function framework. Our method combines functional data, optimization, and machine learning techniques to perform feature selection and parameter estimation simultaneously. We exploit the properties of Functional Principal Components and the sparsity inherent to the Dual Augmented Lagrangian problem to significantly reduce computational cost, and we introduce an adaptive scheme to improve selection accuracy. Through an extensive simulation study, we benchmark our approach against the best existing competitors and demonstrate a massive gain in terms of CPU time and selection performance without sacrificing the quality of the coefficients' estimation. Finally, we present an application to brain fMRI data from the AOMIC PIOP1 study.
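To make the general idea concrete, the following is a minimal sketch of sparse function-on-function feature selection on synthetic data. It is not the FAStEN implementation. It assumes curves observed on a common grid, projects each feature and the response onto their own first `k` functional principal components, and fits a group-lasso penalty with plain proximal gradient descent rather than the Dual Augmented Lagrangian solver the abstract describes. The adaptive step is omitted. All names (`fpc_scores`, `group_lasso`), the data-generating model, and the choice of penalty level are illustrative assumptions.

```python
import numpy as np


def fpc_scores(curves, k):
    """Project discretized curves (n x grid) onto their first k principal components."""
    centered = curves - curves.mean(axis=0)
    # Rows of vt are the discretized principal component functions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def group_lasso(S, T, groups, lam, n_iter=500):
    """Minimize 1/(2n) * ||T - S B||_F^2 + lam * sum_j ||B_j||_F by proximal gradient."""
    n = S.shape[0]
    B = np.zeros((S.shape[1], T.shape[1]))
    step = n / np.linalg.norm(S, 2) ** 2  # inverse Lipschitz constant of the gradient
    for _ in range(n_iter):
        Z = B - step * (S.T @ (S @ B - T) / n)
        for g in groups:
            # Block soft-thresholding: shrink each feature's coefficient block,
            # setting it exactly to zero when its norm is below step * lam.
            norm = np.linalg.norm(Z[g])
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            B[g] = shrink * Z[g]
    return B


# Synthetic data (assumed model): 20 functional features, only the first two are active.
rng = np.random.default_rng(0)
n, p, k = 200, 20, 3
t = np.linspace(0.0, 1.0, 50)
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), np.sin(4 * np.pi * t)])
scales = np.array([3.0, 2.0, 1.0])
features = [(rng.normal(size=(n, 3)) * scales) @ basis for _ in range(p)]
response = features[0] - features[1] + 0.5 * rng.normal(size=(n, t.size))

# Standardized FPC scores of each feature, stacked side by side.
blocks = []
for curves in features:
    s = fpc_scores(curves, k)
    blocks.append(s / s.std(axis=0))
S = np.hstack(blocks)
T = fpc_scores(response, k)

groups = [slice(j * k, (j + 1) * k) for j in range(p)]
# Smallest penalty at which every block is zero; half of it is an arbitrary choice here.
lam_max = max(np.linalg.norm(S[:, g].T @ T) / n for g in groups)
B = group_lasso(S, T, groups, lam=0.5 * lam_max)

selected = [j for j, g in enumerate(groups) if np.linalg.norm(B[g]) > 0]
print("selected features:", selected)
```

Each feature contributes one `k x k` block of coefficients, so a block that shrinks to zero removes the whole functional feature. In practice the penalty level would be tuned, for example by cross-validation, instead of being fixed at a fraction of `lam_max`.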

