The problem of covariate-shift generalization has attracted intensive research attention. Previous stable learning algorithms employ sample reweighting schemes to decorrelate the covariates when no explicit domain information about the training data is available. However, with finite samples, it is difficult to achieve the desirable weights that ensure the perfect independence needed to eliminate the unstable variables. Moreover, decorrelating within the stable variables may inflate the variance of the learned models because the effective sample size is over-reduced; consequently, these algorithms require a tremendously large sample size to work well. In this paper, with theoretical justification, we propose SVI (Sparse Variable Independence) for the covariate-shift generalization problem. We introduce a sparsity constraint to compensate for the imperfection of sample reweighting under the finite-sample setting in previous methods. Furthermore, we organically combine independence-based sample reweighting and sparsity-based variable selection in an iterative way, avoiding decorrelation within the stable variables and thereby increasing the effective sample size to alleviate variance inflation. Experiments on both synthetic and real-world datasets demonstrate that SVI improves covariate-shift generalization performance.
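The iterative scheme described above, alternating independence-based sample reweighting with sparsity-based variable selection, can be sketched as follows. This is a minimal illustrative sketch, not the authors' SVI implementation: the decorrelation objective (sum of squared off-diagonal entries of the weighted covariance matrix), the soft-thresholding lasso solver, and all function names and hyperparameters (`learn_weights`, `lam`, `tau`, the number of rounds) are assumptions made for exposition.

```python
import numpy as np

def decorrelation_loss(Xc, w):
    """Sum of squared off-diagonal entries of the weighted covariance matrix."""
    n = Xc.shape[0]
    C = (Xc * w[:, None]).T @ Xc / n
    off = C - np.diag(np.diag(C))
    return np.sum(off ** 2)

def learn_weights(X, n_iter=300, lr=0.1):
    """Sample reweighting: gradient descent pushing the weighted covariates
    toward pairwise uncorrelatedness (weights kept positive, mean 1)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    w = np.ones(n)
    for _ in range(n_iter):
        C = (Xc * w[:, None]).T @ Xc / n
        off = C - np.diag(np.diag(C))
        # dL/dw_i = (2/n) * x_i^T off x_i  for the loss above
        g = (2.0 / n) * np.einsum('ij,jk,ik->i', Xc, off, Xc)
        w = np.clip(w - lr * g, 1e-3, None)
        w *= n / w.sum()  # renormalize so the weights average to 1
    return w

def weighted_lasso(X, y, w, lam=0.05, n_iter=500):
    """Sparse regression on the reweighted samples via proximal gradient (ISTA)."""
    n, p = X.shape
    lr = 1.0 / np.linalg.norm((X * w[:, None]).T @ X / n, 2)  # 1 / Lipschitz const.
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = (X * w[:, None]).T @ (X @ beta - y) / n
        b = beta - lr * grad
        beta = np.sign(b) * np.maximum(np.abs(b) - lr * lam, 0.0)  # soft threshold
    return beta

def svi(X, y, rounds=3, tau=0.05):
    """Alternate reweighting and sparse selection; decorrelation is enforced
    only among the currently selected (active) covariates, which keeps the
    effective sample size from being over-reduced."""
    n, p = X.shape
    active = np.arange(p)
    beta, w = np.zeros(p), np.ones(n)
    for _ in range(rounds):
        w = learn_weights(X[:, active])
        beta = np.zeros(p)
        beta[active] = weighted_lasso(X[:, active], y, w)
        active = np.flatnonzero(np.abs(beta) > tau)  # shrink the active set
        if active.size == 0:
            break
    return beta, w
```

On a toy design with one stable covariate and one spuriously correlated covariate, the learned weights reduce the off-diagonal weighted covariance, after which the sparse step concentrates the regression coefficient on the stable variable.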