Describing systems in terms of choices and their resulting costs and rewards offers the promise of freeing algorithm designers and programmers from specifying how those choices should be made; in implementations, the choices can be realized by optimization techniques and, increasingly, by machine-learning methods. We study this approach from a programming-language perspective. We define two small languages that support decision-making abstractions: one with choices and rewards, and the other additionally with probabilities. We give both operational and denotational semantics. In the case of the second language we consider three denotational semantics, with varying degrees of correlation between possible program values and expected rewards. The operational semantics combine the usual semantics of standard constructs with optimization over spaces of possible execution strategies. The denotational semantics, which are compositional, rely on the selection monad, to handle choice, augmented with an auxiliary monad to handle other effects, such as rewards or probability. We establish adequacy theorems that the two semantics coincide in all cases. We also prove full abstraction at base types, with varying notions of observation in the probabilistic case corresponding to the various degrees of correlation. We present axioms for choice combined with rewards and probability, establishing completeness at base types for the case of rewards without probability.
翻译:描述系统的选择、成本和回报,为算法设计人员和程序员提供了一种从规定如何进行选择的束缚中解脱出来的方法;在实现过程中,选择可以通过优化技术和日益普及的机器学习方法来实现。本文从编程语言的角度研究了这种方法。 我们定义了两种支持决策抽象的小语言:一个包含选择和奖励,另一个包括概率。我们给出了操作性和表示性语意。在第二种语言的情况下,我们考虑了三种不同程度相关的可能程序值和预期回报之间的表示性语意。操作语义将标准构造的语义与可能实现策略空间的优化相结合。表示性语义是组合的,依赖于选择 monad(用于处理选择)和一个辅助 monad(用于处理其他效果,例如奖励或概率)。我们建立了充分性定理,即两个语义在所有情况下都相符。我们还证明了在基本类型中完全抽象的情况下与观察的不同程度的相关性相应的概率。我们提出了选择与奖励和概率相结合的公理,在奖励没有概率的情况下,确认了基本类型的完整性。