Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeller without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly, that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.
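The informal definition in the abstract (an agent is a system that would adapt its policy if its actions influenced the world in a different way) can be illustrated with a toy check. The sketch below is not the paper's discovery algorithm; it is a minimal illustration under strong assumptions: the world is a lookup table from actions to outcomes, the "intervention" simply swaps that table, and every name (`optimising_policy`, `fixed_policy`, `adapts_to_mechanism`, the thermostat-style actions and outcomes) is hypothetical.

```python
from typing import Callable, Dict, List

Action = str
Outcome = str
# A mechanism describes how each action influences the world (assumed deterministic).
Mechanism = Dict[Action, Outcome]
Policy = Callable[[Mechanism, List[Action]], Action]

# Hypothetical preferences of the adaptive system.
UTILITY: Dict[Outcome, float] = {"warm": 1.0, "cold": 0.0}


def optimising_policy(mechanism: Mechanism, actions: List[Action]) -> Action:
    """Toy adaptive system: picks the action whose outcome it prefers."""
    return max(actions, key=lambda a: UTILITY[mechanism[a]])


def fixed_policy(mechanism: Mechanism, actions: List[Action]) -> Action:
    """Toy non-adaptive system: always takes the first action, ignoring the mechanism."""
    return actions[0]


def adapts_to_mechanism(
    policy: Policy, mechanisms: List[Mechanism], actions: List[Action]
) -> bool:
    """Return True if the chosen action differs across the given mechanisms."""
    chosen = {policy(m, actions) for m in mechanisms}
    return len(chosen) > 1


ACTIONS: List[Action] = ["heat_on", "heat_off"]
MECHANISMS: List[Mechanism] = [
    {"heat_on": "warm", "heat_off": "cold"},  # original world
    {"heat_on": "cold", "heat_off": "warm"},  # intervened world: effects swapped
]

if __name__ == "__main__":
    print("optimiser adapts:", adapts_to_mechanism(optimising_policy, MECHANISMS, ACTIONS))
    print("fixed rule adapts:", adapts_to_mechanism(fixed_policy, MECHANISMS, ACTIONS))
```

In this toy setting the optimising system changes its action when the effects of its actions are swapped, while the fixed rule does not, which is the intuition behind the adaptation-based definition. A real analysis would work with interventions on a causal model rather than a hand-written table.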