Transparency and fairness issues stem from the black-box nature of deep neural networks (DNNs). These issues carry over to deep reinforcement learning, which also uses DNNs to learn its policy, value functions, and so on. This paper proposes a way to circumvent these issues through the bottom-up design of neural networks (NNs) with detailed interpretability, where each neuron or layer has its own meaning and utility corresponding to a humanly understandable concept. With deliberate design, we show that lavaland problems can be solved using NN models with few parameters. Furthermore, we introduce Self Reward Design (SRD), inspired by Inverse Reward Design, so that our interpretable design can (1) solve the problem by design alone (although imperfectly), (2) be optimized via SRD, and (3) avoid unknown states by recognizing the inactivations of neurons, aggregated as the activation in \(w_{unknown}\).
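The unknown-state mechanism can be illustrated with a minimal sketch, assuming similarity-based prototype neurons (the `feature_activations` function and its prototypes here are hypothetical stand-ins, not the paper's actual construction): each designed neuron fires strongly only for the states it was built to recognize, and \(w_{unknown}\) aggregates their joint inactivation.

```python
import numpy as np

def feature_activations(x, prototypes):
    # Hypothetical interpretable neurons: each prototype neuron fires
    # strongly (near 1) only for state features close to its prototype.
    return np.exp(-np.sum((prototypes - x) ** 2, axis=1))

def w_unknown(x, prototypes):
    # Aggregate the *inactivation* of all designed neurons: if no neuron
    # recognizes the state x, the unknown signal is high, so the agent
    # can be designed to avoid such states.
    acts = feature_activations(x, prototypes)
    return 1.0 - acts.max()

# Two known tile types (hypothetical 2-D features).
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])

known = w_unknown(np.array([0.0, 0.0]), prototypes)    # matches a prototype -> low
unknown = w_unknown(np.array([5.0, -3.0]), prototypes) # matches nothing -> high
```

On the known state the best-matching neuron is fully active, so the unknown signal is near 0; on the unfamiliar state every neuron stays inactive, so the signal is near 1.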