We propose a novel active learning strategy for regression that is model-agnostic, robust against model mismatch, and interpretable. Assuming that a small number of initial samples are available, we derive the optimal training density that minimizes the generalization error of local polynomial smoothing (LPS) with its kernel bandwidth tuned locally: we adopt the mean integrated squared error (MISE) as a generalization criterion, and use the asymptotic behavior of the MISE as well as the locally optimal bandwidths (LOB) -- the bandwidth function that minimizes the MISE in the asymptotic limit. The asymptotic expression of our objective then reveals the dependence of the MISE on the training density, enabling analytic minimization. As a result, we obtain the optimal training density in closed form. The almost model-free nature of our approach should encode raw properties of the target problem, and thus provide a robust and model-agnostic active learning strategy. Furthermore, the obtained training density factorizes the influence of local function complexity, noise level, and test density in a transparent and interpretable way. We validate our theory in numerical simulations, and show that the proposed active learning method outperforms existing state-of-the-art model-agnostic approaches.
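The factorized structure of such a training density can be sketched as follows. This is an illustrative toy example only, not the paper's derived formula: the factor functions and the exponents `alpha`, `beta`, and `gamma` are hypothetical placeholders standing in for the closed-form dependence on noise level, test density, and local function complexity.

```python
import numpy as np

def optimal_training_density(x, noise_var, test_density, complexity,
                             alpha=0.5, beta=0.5, gamma=0.5):
    """Normalized grid probabilities p(x_i) proportional to a product of
    a local noise factor, a test-density factor, and a local-complexity
    factor. Exponents are illustrative placeholders, not derived values."""
    unnorm = (noise_var(x) ** alpha
              * test_density(x) ** beta
              * complexity(x) ** gamma)
    return unnorm / unnorm.sum()

# Toy setup on [0, 1]: heteroscedastic noise, uniform test density,
# and an oscillating curvature proxy for local function complexity.
x = np.linspace(0.0, 1.0, 1001)
p = optimal_training_density(
    x,
    noise_var=lambda t: 0.1 + t,                            # noise grows with t
    test_density=lambda t: np.ones_like(t),                 # uniform test density
    complexity=lambda t: 1.0 + np.cos(2 * np.pi * t) ** 2,  # curvature proxy
)

# Actively draw training inputs from the resulting density.
rng = np.random.default_rng(0)
train_x = rng.choice(x, size=200, p=p)
```

Because each factor enters multiplicatively, the influence of noise, test density, and complexity on where samples are placed can be inspected (or switched off) independently, which is the sense in which the density is interpretable.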