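This paper presents ResT, an efficient multi-scale vision Transformer that can serve as a general-purpose backbone for image recognition. Unlike existing Transformer methods, which apply standard Transformer blocks to raw images of a fixed resolution, ResT has several advantages: (1) it builds a memory-efficient multi-head self-attention that compresses the memory with a simple depth-wise convolution and projects the interaction across the attention-head dimension while preserving the diversity of the heads; (2) it constructs positional encoding as spatial attention, which is more flexible and can handle input images of arbitrary size without interpolation or fine-tuning; (3) instead of tokenizing directly at the beginning of each stage, it designs the patch embedding as a stack of overlapping strided convolutions on the token map. We comprehensively validate ResT on image classification and downstream tasks. Experimental results show that the proposed ResT outperforms recent state-of-the-art backbones by a large margin, demonstrating its potential as a strong backbone. The code and models will be made publicly available at https://github.com/wofmanaf/ResT.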
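To make point (1) concrete, the following is a minimal arithmetic sketch of why compressing the keys and values with a strided convolution shrinks the attention matrix. It is not the authors' implementation: the function names are hypothetical, and the 56×56 token map and the kernel, stride, and padding values are assumptions chosen for illustration. Only the standard convolution output-size formula is used.

```python
def compressed_token_count(height, width, kernel, stride, padding):
    """Token count after a strided 2D convolution over a height x width token map.

    Uses the standard output-size formula: floor((n + 2p - k) / s) + 1 per axis.
    """
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return out_h * out_w


def attention_entries(height, width, kernel=None, stride=1, padding=0):
    """Number of entries in one head's attention matrix (queries x keys).

    With kernel=None the keys are the uncompressed tokens (standard self-attention).
    Otherwise the keys are counted after the assumed strided compression.
    """
    queries = height * width
    if kernel is None:
        keys = queries
    else:
        keys = compressed_token_count(height, width, kernel, stride, padding)
    return queries * keys


# Assumed example: a 56 x 56 token map, compressed with kernel 9, stride 8, padding 4.
full = attention_entries(56, 56)
compressed = attention_entries(56, 56, kernel=9, stride=8, padding=4)
reduction = full // compressed
```

Under these assumed settings the key count drops from 3136 to 49, so the attention matrix has 64 times fewer entries, while the number of queries (and therefore output tokens) is unchanged.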


Latest papers

Human intelligence has the remarkable ability to adapt quickly to new tasks and environments. From a very young age, humans acquire new skills and learn to solve new tasks either by imitating the behavior of others or by following natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collaborative Environment. The primary goal of the competition is to address the problem of building interactive agents that learn to solve a task when given grounded natural language instructions in a collaborative environment. Recognizing the complexity of the challenge, we split it into sub-tasks to make it feasible for participants. This research challenge is naturally related, but not limited, to two fields of study that are highly relevant to the NeurIPS community: Natural Language Understanding and Generation (NLU/G) and Reinforcement Learning (RL). The challenge can therefore bring these two communities together to tackle one of the important problems in AI. Another important aspect of the challenge is its commitment to human-in-the-loop evaluation as the final evaluation of the agents developed by contestants.
