Recent work in reinforcement learning has focused on several properties of learned policies that go beyond maximizing reward, including fairness, explainability, generalization, and robustness. In this paper, we define interventional robustness (IR), a measure of how much variability is introduced into learned policies by incidental aspects of the training procedure, such as the order of training data or the particular exploratory actions taken by agents. A training procedure has high IR when the agents it produces take very similar actions under intervention despite variation in these incidental aspects. We develop an intuitive, quantitative measure of IR and calculate it for eight algorithms in three Atari environments across dozens of interventions and states. From these experiments, we find that IR varies with the amount of training and the type of algorithm, and that, contrary to what one might expect, high performance does not imply high IR.
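As an illustration of the idea, one plausible proxy for IR is the mean pairwise action agreement among agents trained under identical settings except for incidental factors (e.g., the random seed), evaluated on intervened states. The sketch below is a minimal version of this proxy, not the paper's actual estimator; the names `agents`, `intervened_states`, and `greedy_action` are hypothetical placeholders.

```python
# Minimal sketch of an IR-style proxy: how often do independently trained
# agents agree on the greedy action in the same intervened states?
# All inputs below are hypothetical stand-ins, not the paper's interface.

import itertools
import numpy as np

def estimate_ir(agents, intervened_states, greedy_action):
    """Mean pairwise action agreement across agents and intervened states.

    agents            -- policies trained identically except for incidental
                         factors such as the random seed
    intervened_states -- states produced by applying interventions
    greedy_action     -- greedy_action(agent, state) -> discrete action
    """
    agreements = []
    for s in intervened_states:
        actions = [greedy_action(agent, s) for agent in agents]
        # Fraction of agent pairs choosing the same action in state s.
        pairs = itertools.combinations(actions, 2)
        agreements.append(np.mean([a == b for a, b in pairs]))
    # 1.0 means all agents always agree (high IR); lower values mean the
    # training procedure's incidental aspects induce more policy variability.
    return float(np.mean(agreements))
```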