In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by developing the notions of information states and approximate information states. In our definition of an information state, we introduce conditions that identify when an uncertain variable is sufficient to construct a dynamic program (DP) that computes an optimal strategy. We show that many information states from the literature on worst-case control, e.g., the conditional range, are examples of our more general definition. Next, we relax these conditions to define approximate information states using only output variables, which can be learned from output data without knowledge of the system dynamics. We use this notion to formulate an approximate DP that yields a strategy with a bounded performance loss. Finally, we illustrate the application of our results in control and reinforcement learning using numerical examples.
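As a minimal sketch of the kind of worst-case DP described above, the following Python snippet uses the conditional range, i.e., the set of states consistent with the available data, as the information state. The finite system, cost function, and the omission of a measurement update are illustrative assumptions, not the paper's construction, and the recursion shown is a conservative simplification of the exact worst-case value.

```python
# A hedged sketch (not the paper's implementation) of a worst-case DP whose
# information state is the conditional range: the set of states consistent
# with past actions. The system, costs, and horizon below are hypothetical.
from itertools import product

STATES = range(3)        # hypothetical finite state space
ACTIONS = range(2)       # hypothetical finite action space
DISTURBANCES = range(2)  # bounded disturbance set, distribution unknown


def dynamics(x, u, w):
    # Hypothetical dynamics standing in for the true (possibly unknown) system.
    return (x + u + w) % 3


def cost(x, u):
    # Hypothetical per-stage cost.
    return abs(x - 1) + u


def worst_case_dp(cond_range, t, horizon):
    """Value of a conditional range under a minimax recursion.

    Conservative simplification: the worst-case stage cost over the range and
    the value of the successor range are evaluated separately, which
    upper-bounds the exact worst-case cost-to-go.
    """
    if t == horizon:
        return 0.0
    best = float("inf")
    for u in ACTIONS:
        # Worst-case stage cost over all states consistent with the data.
        stage = max(cost(x, u) for x in cond_range)
        # Propagate the range through the dynamics over all disturbances
        # (measurement updates are omitted here for brevity).
        next_range = frozenset(
            dynamics(x, u, w) for x, w in product(cond_range, DISTURBANCES)
        )
        best = min(best, stage + worst_case_dp(next_range, t + 1, horizon))
    return best


print(worst_case_dp(frozenset(STATES), 0, horizon=3))
```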