This paper uses game theory to formally model the repeated strategic interactions between a system, comprising a machine learning (ML) model and an associated explanation method, and an end-user who seeks a prediction/label and its explanation for a query/input. In this game, a malicious end-user must strategically decide when to stop querying and attempt to compromise the system, while the system must strategically decide how much information (in the form of noisy explanations) to share with the end-user and when to stop sharing, all without knowing the end-user's type (honest or malicious). The paper models this trade-off as a continuous-time stochastic signaling game and characterizes the Markov perfect equilibrium of that game.