How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator is still learning, we develop a method that adaptively decides which simulator to use for each simulation, based on a statistic that measures the accuracy of the approximate simulator. This lets the approximate simulator replace the original one for faster simulations whenever it is accurate enough in the current context, thus trading off simulation speed against accuracy. Experimental results in two large domains show that, when integrated with POMCP, our approach plans with increasing efficiency over time.
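The per-simulation choice between the two simulators can be illustrated with a minimal sketch. It is not the method's actual selection rule: the class name `AdaptiveSimulatorSelector`, the mean-absolute-error statistic, the fixed `error_threshold`, and the `min_samples` warm-up are all assumptions made for illustration, and states are plain floats to keep the example small.

```python
class AdaptiveSimulatorSelector:
    """Hypothetical selector between a slow exact simulator and a fast learned one."""

    def __init__(self, exact_step, approx_step, error_threshold=0.1, min_samples=20):
        self.exact_step = exact_step        # callable (state, action) -> next state
        self.approx_step = approx_step      # callable (state, action) -> next state
        self.error_threshold = error_threshold  # assumed accuracy cut-off
        self.min_samples = min_samples      # comparisons required before trusting the estimate
        self.error_sum = 0.0
        self.num_samples = 0

    def mean_error(self):
        """Running mean absolute error of the approximate simulator (inf if no data)."""
        if self.num_samples == 0:
            return float("inf")
        return self.error_sum / self.num_samples

    def use_approximate(self):
        """True once enough comparisons exist and the mean error is small enough."""
        return (self.num_samples >= self.min_samples
                and self.mean_error() <= self.error_threshold)

    def step(self, state, action):
        """Run one simulation step and report which simulator produced it."""
        if self.use_approximate():
            return self.approx_step(state, action), "approx"
        # Otherwise use the exact simulator, and compare it with the
        # approximate one to refine the accuracy statistic.
        exact_next = self.exact_step(state, action)
        approx_next = self.approx_step(state, action)
        self.error_sum += abs(exact_next - approx_next)
        self.num_samples += 1
        return exact_next, "exact"
```

In this simplified version the selector never re-checks accuracy once it has switched to the approximate simulator, and it ignores context entirely; a context-dependent statistic, as described above, would need per-context bookkeeping.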