To perform tasks successfully in changing environments, artificial agents must be able to both detect and adapt to novelty. However, visual novelty detection research often evaluates only on repurposed datasets, such as CIFAR-10, that were originally intended for object classification and whose images focus on one distinct, well-centered object. New benchmarks are needed to represent the challenges of navigating the complex scenes of an open world. Our new NovelCraft dataset contains multimodal episodic data of the images and symbolic world-states seen by an agent completing a pogo stick assembly task within a modified Minecraft environment. In some episodes, we insert novel objects into the complex 3D scene; these objects may impact gameplay and appear in a variety of sizes and positions. Our visual novelty detection benchmark finds that methods that rank best on popular area-under-the-curve metrics may be outperformed by simpler alternatives when controlling false positives matters most. Further multimodal novelty detection experiments suggest that methods that fuse visual and symbolic information can improve time until detection as well as overall discrimination. Finally, our evaluation of recent generalized category discovery methods suggests that adapting to new imbalanced categories in complex scenes remains an exciting open problem.
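The gap between area-under-the-curve rankings and performance under a strict false positive budget can be illustrated with a minimal sketch. The scores below are toy values invented for illustration, not NovelCraft results, and the helper names `auroc` and `tpr_at_fpr` are hypothetical. The sketch assumes that higher scores mean "more novel" and that an example is flagged when its score is strictly above a threshold.

```python
from typing import Sequence


def auroc(normal_scores: Sequence[float], novel_scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    novel example scores above a random normal one (ties count as half)."""
    wins = 0.0
    for s_novel in novel_scores:
        for s_normal in normal_scores:
            if s_novel > s_normal:
                wins += 1.0
            elif s_novel == s_normal:
                wins += 0.5
    return wins / (len(novel_scores) * len(normal_scores))


def tpr_at_fpr(normal_scores: Sequence[float], novel_scores: Sequence[float],
               max_fpr: float) -> float:
    """True positive rate at the loosest threshold whose false positive
    rate on the normal examples does not exceed max_fpr."""
    # Number of normal examples we may flag by mistake (rounded down).
    allowed_fp = int(max_fpr * len(normal_scores))
    ranked = sorted(normal_scores, reverse=True)
    # Flag scores strictly above the (allowed_fp + 1)-th highest normal score,
    # so at most allowed_fp normal examples are flagged.
    threshold = ranked[allowed_fp] if allowed_fp < len(ranked) else float("-inf")
    detected = sum(1 for s in novel_scores if s > threshold)
    return detected / len(novel_scores)


# Toy scores (assumption, for illustration only).
# Detector A separates most examples well, but two normal examples
# receive the highest scores of all.
A_NORMAL = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.95, 0.99]
A_NOVEL = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.92]

# Detector B misses half of the novel examples entirely, but its
# highest scores all belong to novel examples.
B_NORMAL = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7]
B_NOVEL = [0.2, 0.2, 0.3, 0.3, 0.4, 0.9, 0.9, 0.9, 0.9, 0.9]

for name, normal, novel in [("A", A_NORMAL, A_NOVEL), ("B", B_NORMAL, B_NOVEL)]:
    print(f"detector {name}: "
          f"AUROC={auroc(normal, novel):.3f}, "
          f"TPR@10%FPR={tpr_at_fpr(normal, novel, 0.10):.3f}")
```

On these toy scores, detector A has the higher AUROC, yet detector B detects more novel examples once the false positive rate is capped at 10%. The two metrics can therefore rank the same pair of methods in opposite orders.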