The standard supervised learning paradigm works effectively when training data share the same distribution as the upcoming testing samples. However, this stationarity assumption is often violated in real-world applications, especially when testing data arrive in an online fashion. In this paper, we formulate and investigate the problem of \emph{online label shift} (OLaS): the learner trains an initial model from labeled offline data and then deploys it in an unlabeled online environment, where the underlying label distribution changes over time but the label-conditional density does not. The non-stationary nature of the environment and the lack of supervision make the problem challenging to tackle. To address this difficulty, we construct a new unbiased risk estimator that utilizes the unlabeled data and exhibits many benign properties, albeit with potential non-convexity. Building upon it, we propose novel online ensemble algorithms to deal with the non-stationarity of the environments. Our approach enjoys optimal \emph{dynamic regret}, indicating that its performance is competitive with that of a clairvoyant who knows the online environments in hindsight and then chooses the best decision for each round. The obtained dynamic regret bound scales with the intensity and pattern of the label distribution shift, hence exhibiting adaptivity to the OLaS problem. Extensive experiments validate the effectiveness of our approach and support our theoretical findings.