Many sequential decision-making problems that are currently automated, such as those in manufacturing or recommender systems, operate in environments with either little uncertainty or no risk of catastrophe. As companies and researchers attempt to deploy autonomous systems in less constrained environments, it is increasingly important that we endow sequential decision-making algorithms with the ability to reason about uncertainty and risk. In this thesis, we address both planning and reinforcement learning (RL) approaches to sequential decision-making. In the planning setting, a model of the environment is assumed to be provided, and a policy is optimised within that model. Reinforcement learning relies upon extensive random exploration, and therefore usually requires a simulator in which to perform training. In many real-world domains, it is impossible to construct a perfectly accurate model or simulator. The performance of any policy is therefore inevitably uncertain due to incomplete knowledge of the environment. Furthermore, in stochastic domains, the outcome of any given run is also uncertain due to the inherent randomness of the environment. These two sources of uncertainty are usually classified as epistemic and aleatoric uncertainty, respectively. The overarching goal of this thesis is to contribute to the development of algorithms that mitigate both sources of uncertainty in sequential decision-making problems. We make a number of contributions towards this goal, with a focus on model-based algorithms...
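To make the epistemic/aleatoric distinction concrete, the following is a minimal sketch (not taken from the thesis) using a single Bernoulli outcome: the choice of a Beta posterior, the sample sizes, and all variable names are illustrative assumptions. Aleatoric uncertainty is the spread of outcomes even when the environment's parameters are known exactly; epistemic uncertainty is the uncertainty over those parameters given limited data, and it shrinks as more data is collected.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (but, to the agent, unknown) success probability of an action.
p_true = 0.7

# Aleatoric uncertainty: even if p_true were known exactly, individual
# run outcomes remain random due to the environment's stochasticity.
outcomes = rng.random(1000) < p_true
print("outcome std (aleatoric):", outcomes.std())

# Epistemic uncertainty: with only a few observed transitions, the agent
# is uncertain about p itself. A Beta posterior over p (an illustrative
# Bayesian model, not the thesis's method) captures this.
n_obs = 10
successes = int((rng.random(n_obs) < p_true).sum())
posterior = rng.beta(1 + successes, 1 + n_obs - successes, size=10_000)
print("posterior std over p (epistemic):", posterior.std())

# Increasing n_obs concentrates the posterior (epistemic uncertainty
# shrinks), while the aleatoric spread of outcomes does not.
```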