Classification is a central task in building intelligent systems, as it enables decision-making under uncertainty. Classifier design aims to build models from training data that represent feature-label distributions, either explicitly or implicitly. In many scientific and clinical settings, training data are typically limited, which makes designing accurate classifiers and evaluating their classification error extremely challenging. While transfer learning (TL) can alleviate this issue by incorporating data from relevant source domains to improve learning in a different target domain, it has received little attention for performance assessment, notably in error estimation. In this paper, we fill this gap by investigating knowledge transferability in the context of classification error estimation within a Bayesian paradigm. We introduce a novel class of Bayesian minimum mean-square error (MMSE) estimators for optimal Bayesian transfer learning (OBTL), which enables rigorous evaluation of classification error under uncertainty in small-sample settings. Using Monte Carlo importance sampling, we employ the proposed estimator to evaluate the classification accuracy of a broad family of classifiers that span diverse learning capabilities. Experimental results on both synthetic data and real-world RNA sequencing (RNA-seq) data show that our proposed OBTL error estimation scheme clearly outperforms standard error estimators, especially in small-sample settings, by tapping into data from other relevant domains.
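The Bayesian MMSE error estimate described above is the posterior expectation of the true classification error given the observed sample, and Monte Carlo importance sampling approximates that expectation by drawing model parameters from a proposal distribution and weighting each draw by its (unnormalized) posterior weight. The following is a minimal sketch of this idea for a toy univariate Gaussian model with a fixed threshold classifier, using the prior as the proposal; all model choices here (the priors, the unit variances, the threshold classifier) are illustrative assumptions, not the paper's OBTL construction.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def true_error(mu0, mu1, t):
    """Exact error of the rule 'predict class 1 if x > t' when the
    classes are N(mu0, 1) and N(mu1, 1) with equal prior probability."""
    return 0.5 * (1.0 - phi(t - mu0)) + 0.5 * phi(t - mu1)

def log_likelihood(data, mu):
    """Log-likelihood of i.i.d. N(mu, 1) observations."""
    return sum(-0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)
               for x in data)

def mmse_error_estimate(d0, d1, t, n_samples=20000, seed=0):
    """Importance-sampling approximation of E[error | data]:
    draw (mu0, mu1) from the prior (used as proposal), weight each
    draw by its data likelihood, and average the true error."""
    rng = random.Random(seed)
    log_ws, errs = [], []
    for _ in range(n_samples):
        # Illustrative priors: mu0 ~ N(-1, 1), mu1 ~ N(+1, 1).
        mu0 = rng.gauss(-1.0, 1.0)
        mu1 = rng.gauss(1.0, 1.0)
        log_ws.append(log_likelihood(d0, mu0) + log_likelihood(d1, mu1))
        errs.append(true_error(mu0, mu1, t))
    # Stabilize the exponentiation before normalizing the weights.
    m = max(log_ws)
    ws = [math.exp(lw - m) for lw in log_ws]
    return sum(w * e for w, e in zip(ws, errs)) / sum(ws)
```

With only a handful of training points per class, the estimate averages the classifier's error over all parameter values consistent with the data, which is what makes the Bayesian MMSE estimator well suited to small-sample settings; the OBTL estimator in the paper extends this averaging to posteriors informed by source-domain data.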