Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes. Unlike prior approaches, our algorithm learns from sampled trajectories of the system and does not require prior knowledge of the system's topology.
翻译:暂无翻译