We consider a learning agent in a partially observable environment with which the agent has never interacted before, and about which it learns both what it can observe and how its actions affect the environment. The agent can learn about this domain from experience gathered by taking actions in it and observing their results. We present learning algorithms capable of learning as much as possible (in a well-defined sense), given the learner's observational constraints, both about what is directly observable and about what actions do in the domain. We differentiate the level of domain knowledge attained by each algorithm and characterize the type of observations required to reach it. The algorithms use dynamic epistemic logic (DEL) to represent the learned domain information symbolically. Our work continues that of Bolander and Gierasimczuk (2015), which developed DEL-based learning algorithms for learning domain information in fully observable domains.