In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
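To make the weight-generation idea concrete, below is a minimal, hypothetical sketch in PyTorch of a transformer that consumes a labeled support set and emits the filters of a single convolutional layer, which are then applied to query images. All names, sizes, the use of learnable "weight tokens", and the choice of generating just one conv layer are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a HyperTransformer-style weight generator (assumption,
# not the paper's code): a transformer reads support-set tokens and emits the
# filters of one CNN layer, which are applied to query images.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightGenerator(nn.Module):
    """Maps a labeled support set to the weights of a conv layer."""

    def __init__(self, num_classes=5, in_ch=3, out_ch=8, ksize=3, d_model=128):
        super().__init__()
        self.out_ch, self.in_ch, self.ksize = out_ch, in_ch, ksize
        # Support-sample tokens: flattened image embedding + label embedding.
        self.img_embed = nn.Sequential(nn.Flatten(), nn.LazyLinear(d_model))
        self.lbl_embed = nn.Embedding(num_classes, d_model)
        # Learnable placeholder tokens whose outputs are decoded into filters
        # (one token per output channel -- an illustrative choice).
        self.weight_tokens = nn.Parameter(torch.randn(out_ch, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_filter = nn.Linear(d_model, in_ch * ksize * ksize)

    def forward(self, support_x, support_y):
        # support_x: (S, C, H, W), support_y: (S,)
        tokens = self.img_embed(support_x) + self.lbl_embed(support_y)  # (S, d)
        tokens = torch.cat([tokens, self.weight_tokens], dim=0)         # (S+K, d)
        out = self.encoder(tokens.unsqueeze(0)).squeeze(0)              # (S+K, d)
        w = self.to_filter(out[-self.out_ch:])                          # (K, C*k*k)
        return w.view(self.out_ch, self.in_ch, self.ksize, self.ksize)


# Usage: generate task-specific filters from the support set, then apply them
# to query images with the functional conv2d call; the whole pipeline stays
# end-to-end differentiable, so the generator can be trained across tasks.
gen = WeightGenerator()
support_x, support_y = torch.randn(25, 3, 32, 32), torch.randint(0, 5, (25,))
query_x = torch.randn(10, 3, 32, 32)
conv_w = gen(support_x, support_y)                 # (8, 3, 3, 3) generated filters
query_feat = F.conv2d(query_x, conv_w, padding=1)  # task-conditioned features
```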