High-level declarative constraints provide a powerful (and popular) way to define and construct control policies; however, most synthesis algorithms do not support specifying the degree of randomness (unpredictability) of the resulting controller. In many contexts, e.g., patrolling, testing, behavior prediction, and planning on idealized models, predictable or biased controllers are undesirable. To address these concerns, we introduce the \emph{Entropic Reactive Control Improvisation} (ERCI) framework and algorithm, which support synthesizing control policies for stochastic games that are declaratively specified by (i) a \emph{hard constraint} specifying what must occur, (ii) a \emph{soft constraint} specifying what typically occurs, and (iii) a \emph{randomization constraint} specifying the unpredictability and variety of the controller, as quantified using causal entropy. This framework extends the state of the art by supporting arbitrary combinations of adversarial and probabilistic uncertainty in the environment. It enables a flexible modeling formalism that, we argue both theoretically and empirically, remains tractable.
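For readers unfamiliar with the term, causal entropy is commonly defined as below. This is a sketch of the standard definition, not a formula taken from this abstract: the symbols $A_t$ (the controller's action at step $t$), $S_t$ (the state at step $t$), and the horizon $T$ are notational assumptions introduced here for illustration.

\[
H(A_{1:T} \,\|\, S_{1:T}) \;=\; \sum_{t=1}^{T} H\bigl(A_t \mid S_{1:t},\, A_{1:t-1}\bigr)
\]

Each term conditions only on the states and actions observed up to step $t$, so the quantity measures how unpredictable the controller's choices are given the history, without crediting the controller for randomness that originates in future environment behavior.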