We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-the-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules such as object detection, segmentation, mapping, or planning. Such general-purpose methods offer the advantages of simple design, positive scaling with available compute, and applicability to multiple tasks. Our work builds upon the recent success of self-supervised learning (SSL) for pre-training vision transformers (ViTs). However, while the training recipes for convolutional networks are mature and robust, the recipes for ViTs are contingent and brittle, and in the case of ViTs for visual navigation, yet to be fully discovered. Specifically, we find that vanilla ViTs do not outperform ResNets on visual navigation. We propose a compression layer that operates over ViT patch representations to preserve spatial information, along with improvements to policy training. These improvements allow us to demonstrate positive scaling laws for the first time in visual navigation tasks. Consequently, our model advances state-of-the-art performance on ImageNav from 54.2% to 82.0% success and performs competitively against the concurrent state of the art on ObjectNav, with a success rate of 64.0% vs. 65.0%. Overall, this work does not present a fundamentally new approach, but rather recommendations for training a general-purpose architecture that achieves state-of-the-art performance today and could serve as a strong baseline for future methods.
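To make the idea of a compression layer over patch representations concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes the patch tokens are reshaped back onto their 2D grid, passed through a single 3x3 convolution that reduces the channel count, and then flattened. The function name `compress_patches`, the kernel size, the ReLU, and all dimensions are hypothetical choices made for illustration only.

```python
import numpy as np


def compress_patches(patch_tokens, weight, bias, grid_hw):
    """Hypothetical compression layer (illustrative assumption, not the paper's code).

    patch_tokens: (N, D) array of ViT patch embeddings, N = H * W
    weight:       (C_out, D, 3, 3) convolution kernel, with C_out < D
    bias:         (C_out,) bias vector
    grid_hw:      (H, W) layout of the patches in the image
    Returns a flat vector of length H * W * C_out.
    """
    h, w = grid_hw
    n, d = patch_tokens.shape
    assert n == h * w, "number of tokens must match the patch grid"

    # Put the tokens back on their 2D grid and zero-pad the border.
    grid = patch_tokens.reshape(h, w, d)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))

    # 3x3 convolution, stride 1, written as a sum over kernel offsets.
    c_out = weight.shape[0]
    out = np.zeros((h, w, c_out))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w, :]   # (H, W, D)
            out += window @ weight[:, :, dy, dx].T     # (H, W, C_out)
    out += bias
    out = np.maximum(out, 0.0)                         # ReLU

    # Flatten while keeping one slot per spatial location.
    return out.reshape(-1)


# Toy usage with made-up sizes: a 4x4 patch grid, 16-dim tokens, 2 output channels.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 16))
kernel = rng.standard_normal((2, 16, 3, 3))
bias = np.zeros(2)

compressed = compress_patches(tokens, kernel, bias, (4, 4))
shuffled = compress_patches(tokens[::-1], kernel, bias, (4, 4))
```

In this sketch, reordering the patches changes the compressed vector, whereas a global average over tokens would be unaffected by the same reordering. That difference is one way to read "preserving spatial information", under the assumptions stated above.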