We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.
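As a rough illustration of what a graphical criterion of this kind can look like in practice, the sketch below checks for an instrumental control incentive on a variable X. The abstract does not spell the criterion out, so the sketch assumes it amounts to the existence of a directed path from the decision D to a utility node that passes through X, in a diagram with a single decision. The graph encoding, the node names, and both helper functions are hypothetical and serve only as an example.

```python
from collections import deque


def reachable(graph, start):
    """Return all nodes reachable from `start` along directed edges (including `start`)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def admits_instrumental_control_incentive(graph, decision, x, utilities):
    """Assumed criterion: some directed path decision -> ... -> x -> ... -> utility exists.

    Zero-length segments are allowed here (x may equal the decision or a
    utility node); that is a simplifying assumption of this sketch.
    """
    if x not in reachable(graph, decision):
        return False
    downstream_of_x = reachable(graph, x)
    return any(u in downstream_of_x for u in utilities)


# Hypothetical diagram: Z is observed by D and also affects U directly;
# D influences U only through X.
cid = {"Z": ["D", "U"], "D": ["X"], "X": ["U"], "U": []}

print(admits_instrumental_control_incentive(cid, "D", "X", {"U"}))  # True
print(admits_instrumental_control_incentive(cid, "D", "Z", {"U"}))  # False
```

In this toy diagram the check succeeds for X, which lies on a path from D to U, and fails for Z, which the decision cannot influence.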