With the emergence of AI systems that interact with the physical environment, there is growing interest in incorporating physical reasoning capabilities into those systems. But are physical reasoning capabilities alone enough to operate in a real physical environment? In the real world, we constantly face novel situations that we have not encountered before, and as humans we are competent at adapting to them. Similarly, an agent needs the ability to function under the impact of novelties in order to operate properly in an open-world physical environment. To facilitate the development of such AI systems, we propose a new testbed, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties and take actions accordingly. The testbed consists of tasks that require agents to detect and adapt to novelties in physical scenarios. To create the tasks, we develop eight novelties representing a diverse novelty space and apply them to five scenarios commonly encountered in a physical environment. Following this design, we evaluate two capabilities of an agent: its performance on a novelty when that novelty is applied to different physical scenarios, and its performance on a physical scenario when different novelties are applied to it. We conduct a thorough evaluation with human players, learning agents, and heuristic agents. Our evaluation shows that human performance far exceeds that of the agents. Some agents, even with good performance on normal tasks, perform significantly worse when a novelty is present, and the agents that can adapt to novelties typically adapt more slowly than humans. We promote the development of intelligent agents capable of performing at the human level or above when operating in open-world physical environments. Testbed website: https://github.com/phy-q/novphy