Blackwell's approachability is a general sequential decision framework in which a Decision Maker obtains vector-valued outcomes and aims for the average outcome to converge to a given "target" set. Blackwell gave a sufficient condition for the Decision Maker to have a strategy that guarantees such convergence against an adversarial environment, together with what we now call Blackwell's algorithm, which achieves this convergence. Blackwell's approachability has since been applied to numerous problems, in particular in online learning and game theory. We extend this framework by allowing the outcome function and the dot product to be time-dependent. We establish a general guarantee for the natural extension of Blackwell's algorithm to this framework. In the case where the target set is an orthant, we present a family of time-dependent dot products that yields different convergence speeds for each coordinate of the average outcome. We apply this framework to the Big Match (one of the most important toy examples of stochastic games), where an $\epsilon$-uniformly optimal strategy for Player I is given by Blackwell's algorithm in a well-chosen auxiliary approachability problem.
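As an illustration of the classical, time-independent case only, the sketch below runs Blackwell's algorithm when the target set is the nonpositive orthant and the outcome vector is the regret of each pure action, which is the procedure usually called regret matching. It is a minimal sketch under assumptions that are not taken from the paper: the function name `regret_matching`, the matrix `payoff`, the use of the expected outcome under the mixed action, and the uniform choice when no coordinate is positive are all hypothetical. It does not implement the time-dependent outcome functions or dot products introduced here.

```python
def regret_matching(payoff, opponent_moves):
    """Classical Blackwell algorithm for the nonpositive orthant (illustrative).

    payoff[a][b] is the (assumed) payoff of pure action a against opponent move b.
    Returns the average outcome vector, i.e. the average regret per pure action.
    """
    n = len(payoff)
    cum = [0.0] * n  # cumulative vector-valued outcome (regret per action)
    for b in opponent_moves:
        # Direction from the orthant to the cumulative outcome: positive parts.
        pos = [max(r, 0.0) for r in cum]
        s = sum(pos)
        # Mixed action proportional to the positive parts; uniform if already
        # inside the target set (an arbitrary choice made for this sketch).
        p = [x / s for x in pos] if s > 0 else [1.0 / n] * n
        expected = sum(p[a] * payoff[a][b] for a in range(n))
        # Outcome vector: gain of each pure action over the mixed action.
        for i in range(n):
            cum[i] += payoff[i][b] - expected
    horizon = len(opponent_moves)
    return [r / horizon for r in cum]


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    moves = [(t // 3) % 2 for t in range(10000)]
    print(regret_matching(matching_pennies, moves))
```

With this choice of mixed action, the outcome vector at each stage is orthogonal to the positive part of the cumulative outcome, so the squared Euclidean distance to the orthant grows at most linearly in the number of stages, and every coordinate of the average outcome is bounded above by a term of order $1/\sqrt{T}$.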