Deep neural networks perform well on prediction and classification tasks in the canonical setting where data streams are i.i.d., labeled data is abundant, and class labels are balanced. Challenges emerge under distribution shift, including non-stationary or imbalanced data streams. One powerful approach to this challenge is self-supervised pretraining of large encoders on large volumes of unlabeled data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is difficult: a large number of weights must be fine-tuned, and as a result the encoders forget information about previous tasks. In the present work, we propose a model architecture that addresses this issue, building upon a discrete bottleneck containing pairs of separate, learnable (key, value) codes. The model follows a three-step paradigm: encode the input, process the representation via the discrete bottleneck, and decode. Specifically, the input is fed to the pretrained encoder, the encoder output is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a limited number of these (key, value) pairs during inference, which enables localized and context-dependent model updates. We theoretically investigate the ability of the proposed model to minimize the effect of distribution shifts and show that such a discrete bottleneck with (key, value) pairs reduces the complexity of the hypothesis class. We empirically verify the benefits of the proposed method under challenging distribution shift scenarios across various benchmark datasets and show that the proposed model is less vulnerable to non-i.i.d. and non-stationary training distributions than various baselines.
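The lookup step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the squared Euclidean distance, the choice of a single nearest key per input, and the plain gradient-step update rule are all assumptions made for illustration; the abstract only states that encoder outputs select the nearest keys and that the paired values are passed to the decoder.

```python
import numpy as np

def nearest_key_lookup(z, keys, values):
    """Return the value codes paired with the key nearest to each row of z.

    Assumption: one nearest key per input, squared Euclidean distance.
    """
    # Pairwise squared distances between encodings and keys: (batch, num_pairs)
    dists = ((z[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return values[idx], idx

def sparse_value_update(values, idx, grad_wrt_selected, lr=0.1):
    """Hypothetical update rule: only the fetched value codes are changed."""
    new_values = values.copy()
    # subtract.at accumulates correctly if the same index is fetched twice
    np.subtract.at(new_values, idx, lr * grad_wrt_selected)
    return new_values

# Toy codebook: key i is the vector [i, i, i, i]; all values start at zero.
keys = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))
values = np.zeros((8, 3))

# Stand-in for pretrained-encoder outputs that fall near keys 2 and 5.
z = keys[[2, 5]] + 0.1

fetched, idx = nearest_key_lookup(z, keys, values)
updated = sparse_value_update(values, idx, np.ones_like(fetched))
changed_rows = np.flatnonzero((updated != values).any(axis=1))
```

In this toy setting only the two fetched value codes change, while the remaining six are left untouched. This is the sense in which updates through such a bottleneck are localized and context-dependent.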