We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations, which are passed to external cross-attention adapters, trained anew and combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieves performance comparable to full fine-tuning, at a cost comparable to fine-tuning just the last layer. For example, with a cross-attention probe 1.3% the size of a pre-trained ViT-L/16 model, we achieve performance within 0.2% of the full fine-tuning paragon at 51% training cost of the baseline, on average across 11 downstream classification tasks. Unlike other forms of efficient adaptation, InCA does not require backpropagating through the pre-trained model, thus leaving its execution unaltered at both training and inference. The versatility of InCA is best illustrated in fine-grained tasks, which may require accessing information absent in the last layer but accessible in intermediate layer activations. Since the backbone is fixed, InCA allows parallel ensembling as well as parallel execution of multiple tasks. InCA achieves state-of-the-art performance in the ImageNet-to-Sketch multi-task benchmark.
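Below is a minimal sketch of how an InCA-style cross-attention probe over a frozen backbone could look, assuming a transformer backbone whose intermediate activations have shape (batch, tokens, dim). This is not the authors' implementation; the names `CrossAttentionProbe` and `extract_activations`, and all hyperparameters, are illustrative.

```python
# Sketch of an InCA-style external cross-attention adapter (illustrative only).
import torch
import torch.nn as nn

class CrossAttentionProbe(nn.Module):
    """Lightweight adapter: learnable queries cross-attend to one activation map."""
    def __init__(self, dim: int, num_classes: int, num_queries: int = 1, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, activations: torch.Tensor) -> torch.Tensor:
        q = self.queries.expand(activations.size(0), -1, -1)
        out, _ = self.attn(q, activations, activations)   # cross-attention into frozen features
        return self.head(self.norm(out).mean(dim=1))      # pool query tokens -> class logits

def extract_activations(backbone: nn.Module, layers: list[nn.Module], x: torch.Tensor):
    """Single frozen forward pass; hooks cache the chosen intermediate activations."""
    cache = []
    hooks = [m.register_forward_hook(lambda _m, _i, o: cache.append(o.detach()))
             for m in layers]
    with torch.no_grad():          # no backpropagation through the pre-trained model
        backbone(x)
    for h in hooks:
        h.remove()
    return cache                   # one tensor per probed layer
```

In this reading of the method, one such probe would be trained per candidate layer from the same cached forward pass, and the top-scoring probe kept at deployment; only the probe parameters receive gradients, so the backbone's execution is unaltered at both training and inference.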