We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call \emph{resource-constrained goal optimization} (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm that computes a \emph{shield} for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.