The black-box nature of deep neural networks (DNNs) has brought attention to the issues of transparency and fairness. Deep Reinforcement Learning (Deep RL or DRL), which uses DNNs to learn its policy, value functions, etc., is therefore subject to similar concerns. This paper proposes a way to circumvent these issues through the bottom-up design of neural networks with detailed interpretability, where each neuron or layer has its own meaning and utility corresponding to a humanly understandable concept. The framework introduced in this paper is called Self Reward Design (SRD), inspired by Inverse Reward Design, and this interpretable design can (1) solve the problem purely by design (although imperfectly) and (2) be optimized like a standard DNN. With deliberate human design, we show that some RL problems such as lavaland and MuJoCo can be solved using a model constructed from standard NN components with only a few parameters. Furthermore, with our fish sale auction example, we demonstrate how SRD addresses situations in which black-box models would not be appropriate because humanly understandable, semantics-based decisions are required.
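To make the bottom-up, interpretable-by-design idea concrete, the following is a minimal sketch (not the paper's implementation) of a policy in which each unit is named after a human concept and hand-initialized, yet remains an ordinary trainable parameter. The class and attribute names (LavalandPolicy, lava_detector, goal_detector) and the 3-feature cell encoding are hypothetical illustrations, assuming a PyTorch-style setup.

```python
import torch
import torch.nn as nn

class LavalandPolicy(nn.Module):
    """Tiny interpretable policy: one named unit per concept, hand-set weights."""
    def __init__(self):
        super().__init__()
        # Each 1-output linear layer is a "neuron" with an explicit meaning.
        self.lava_detector = nn.Linear(3, 1)   # fires on lava cells
        self.goal_detector = nn.Linear(3, 1)   # fires on the target cell
        # Hand-designed initial weights encode prior knowledge ("solve by design"),
        # but remain ordinary parameters that gradient descent can still refine.
        with torch.no_grad():
            self.lava_detector.weight.copy_(torch.tensor([[1.0, 0.0, 0.0]]))
            self.goal_detector.weight.copy_(torch.tensor([[0.0, 0.0, 1.0]]))

    def forward(self, cell_features: torch.Tensor) -> torch.Tensor:
        # Seek the goal, avoid lava: the score is readable term by term.
        return self.goal_detector(cell_features) - self.lava_detector(cell_features)
```

The point of the sketch is only that interpretability comes from the construction itself: every weight and activation has a stated meaning, while the module can still be optimized like any standard DNN.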