Modern statistical analysis often involves high-dimensional models fitted with limited sample sizes, which makes statistical estimation based on the target data alone very difficult. How to borrow information from a large source dataset to obtain a more accurate estimate of the target model then becomes an interesting problem, and it leads to the useful idea of transfer learning. Various estimation methods in this regard have been developed recently. In this work, we study transfer learning from a different perspective. Specifically, we consider the problem of testing for transfer learning sufficiency. By transfer learning sufficiency (formulated as the null hypothesis), we mean that, with the help of the source data, the useful information contained in the feature vectors of the target data can be sufficiently extracted for predicting the target response of interest. Rejection of the null hypothesis therefore implies that predictively useful information remains in the target feature vectors and calls for further exploration. To this end, we develop a novel testing procedure with a centered and standardized test statistic, whose asymptotic null distribution is derived analytically. Simulation studies are presented to demonstrate the finite-sample performance of the proposed method, and a deep learning related real data example is presented for illustration.
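The following toy simulation illustrates the idea of a sufficiency test, not the paper's actual procedure: after a source-aided predictor has been fit on the target data, we ask whether the target feature vectors still explain the residuals, using a classical F-test as a stand-in for the proposed statistic. The linear data-generating model, the "transferred" predictor that deliberately misses one effect, and all variable names are illustrative assumptions.

```python
# Illustrative sketch only: the linear model and F-test below are
# assumptions standing in for the paper's test statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 5

# Target data: the response depends on x1 and (more mildly) on x2.
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.standard_normal(n)

# Suppose the source-aided predictor captured only the x1 effect,
# so information useful for prediction remains in the features.
y_hat = 1.5 * X[:, 0]
resid = y - y_hat

# F-test: do the target features explain the residuals beyond an
# intercept? Rejection mimics rejecting transfer learning sufficiency.
Z = np.column_stack([np.ones(n), X])
beta, _, _, _ = np.linalg.lstsq(Z, resid, rcond=None)
rss = np.sum((resid - Z @ beta) ** 2)   # residual sum of squares
tss = np.sum((resid - resid.mean()) ** 2)  # total sum of squares
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)

print(f"F = {f_stat:.2f}, p-value = {p_value:.4g}")
# A small p-value indicates that predictive information remains
# in the target features, so "sufficiency" is rejected.
```

With the x2 effect left uncaptured, the test rejects at conventional levels; if `y_hat` instead captured both effects, the residuals would be pure noise and the p-value would be roughly uniform.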