For artificial agents to perform useful tasks in changing environments, they must be able to both detect and adapt to novelty. However, visual novelty detection research often evaluates only on repurposed datasets, such as CIFAR-10, that were originally intended for object classification. This practice restricts novelties to well-framed images of distinct object types. We suggest that new benchmarks are needed to represent the challenges of navigating an open world. Our new NovelCraft dataset contains multi-modal episodic data of the images and symbolic world-states seen by an agent completing a pogo-stick assembly task within a video game world. In some episodes, we insert novel objects that can impact gameplay. These novel objects can vary in size, position, and occlusion within complex scenes. We benchmark state-of-the-art novelty detection and generalized category discovery models with a focus on comprehensive evaluation. Results suggest an opportunity for future research: models aware of the task-specific costs of different types of mistakes could more effectively detect and adapt to novelty in open worlds.