This paper proposes a novel fuzzy action selection method to leverage human knowledge in reinforcement learning problems. Based on the most recent estimates of the state-action values, the proposed fuzzy nonlinear mapping assigns each member of the action set its probability of being chosen in the next step. A user-tunable parameter is introduced to control the action selection policy; it determines how greedily the agent behaves throughout the learning process. This parameter plays a role similar to that of the temperature parameter in the softmax action selection policy, but its tuning can be more knowledge-oriented, since it transfers human knowledge to the learning agent through modifications of the fuzzy rule base. Simulation results indicate that incorporating fuzzy logic into reinforcement learning in the proposed manner improves the learning algorithm's convergence rate and provides superior performance.
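For reference, the softmax (Boltzmann) action selection policy to which the tunable parameter is compared can be sketched as follows. This is a minimal illustration of the standard baseline only, not of the proposed fuzzy mapping, whose rule base is not specified here; the function names and the example action values are hypothetical.

```python
import math
import random


def softmax_probabilities(q_values, temperature):
    """Map action-value estimates to selection probabilities (Boltzmann/softmax)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical action-value estimates for a three-action problem.
q = [1.0, 2.0, 0.5]
greedy_like = softmax_probabilities(q, temperature=0.1)   # mass concentrates on the best action
exploratory = softmax_probabilities(q, temperature=10.0)  # close to uniform
```

A low temperature makes the policy nearly greedy, while a high temperature makes it nearly uniform; the user-tunable parameter of the proposed method is described as controlling the same trade-off.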