The standard supervised learning paradigm works effectively when the training data share the same distribution as the test samples that follow. However, this assumption is often violated in real-world applications, especially when test data arrive in an online fashion. In this paper, we formulate and investigate the problem of online label shift (OLaS): the learner trains an initial model from labeled offline data and then deploys it to an unlabeled online environment where the underlying label distribution changes over time but the label-conditional density does not. The non-stationary nature of the environment and the lack of supervision make the problem challenging. To address this difficulty, we construct a new unbiased risk estimator from the unlabeled data, which exhibits many benign properties despite being potentially non-convex. Building on this estimator, we propose novel online ensemble algorithms to deal with the non-stationarity of the environments. Our approach enjoys optimal dynamic regret, indicating that its performance is competitive with a clairvoyant who knows the online environments in hindsight and chooses the best decision for each round. The obtained dynamic regret bound scales with the intensity and pattern of the label distribution shift, reflecting the adaptivity of the approach in the OLaS problem. Extensive experiments validate the effectiveness of the approach and support our theoretical findings.
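To make the label shift assumption concrete, the following is a minimal sketch of one standard way to exploit it, and not necessarily the estimator constructed in the paper. Because the label-conditional density is fixed, the rate at which a fixed classifier predicts class i given true class j does not change online. The online label distribution can therefore be recovered from unlabeled predictions by inverting an offline confusion matrix, and the online risk can be written as a reweighting of per-class offline risks. The function names `estimate_label_prior` and `reweighted_risk` and all numbers are hypothetical, chosen for illustration only.

```python
import numpy as np


def estimate_label_prior(confusion, pred_dist):
    """Estimate the current label distribution from unlabeled predictions.

    confusion[i, j] approximates P(predict i | true label j), measured on
    labeled offline data. pred_dist[i] is the fraction of unlabeled online
    samples that the same classifier assigns to class i. Under label shift,
    pred_dist = confusion @ mu, so mu is obtained by solving a linear system.
    The solution is not projected onto the simplex, so with finite samples
    individual entries may be negative.
    """
    return np.linalg.solve(confusion, pred_dist)


def reweighted_risk(mu, class_risks):
    """Online risk as a mixture of per-class offline risks.

    class_risks[k] is the risk of a model on offline samples of class k.
    Since the label-conditional density is unchanged, the online risk is
    sum_k mu[k] * class_risks[k].
    """
    return float(np.dot(mu, class_risks))


# Illustrative two-class example with made-up numbers.
confusion = np.array([[0.9, 0.2],
                      [0.1, 0.8]])
true_mu = np.array([0.3, 0.7])
pred_dist = confusion @ true_mu  # what the unlabeled stream would show
mu_hat = estimate_label_prior(confusion, pred_dist)
risk_hat = reweighted_risk(mu_hat, np.array([0.1, 0.5]))
```

In this sketch the risk estimate is linear in the estimated label distribution, so an unbiased estimate of that distribution gives an unbiased estimate of the risk. Negative entries in the estimate are one way such a reweighted objective can lose convexity.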