Control of synaptic plasticity via the fusion of reinforcement learning and unsupervised learning in neural networks

Mohammad Modiri

From arXiv (draft version). arXiv admin note: substantial text overlap with arXiv:2303.07273.

The brain can learn to execute a wide variety of tasks quickly and efficiently. Nevertheless, most of the mechanisms that enable us to learn are unclear or highly complex. Recently, considerable effort has been made in neuroscience and artificial intelligence (AI) to understand and model the structures and mechanisms behind the brain's remarkable learning capability. In the current understanding of cognitive neuroscience, it is widely accepted that synaptic plasticity plays an essential role in this capability. Determining how individual synapses should change to improve behavior, known as the Credit Assignment Problem (CAP), is a fundamental challenge in both neuroscience and AI. Observations by neuroscientists confirm the role of two important mechanisms in synaptic plasticity: the error feedback system and unsupervised learning. Inspired by this, a new learning rule is proposed that fuses reinforcement learning (RL) and unsupervised learning (UL). In the proposed computational model, nonlinear optimal control theory is used to model the error feedback loop, projecting the output error onto the neurons' membrane potential (the neuron state), and an unsupervised learning rule based on the neurons' membrane potential or activity simulates the synaptic plasticity dynamics so that the output error is minimized.
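
The abstract does not give the model equations, so the following is only a toy sketch of the general idea (feedback control pushes the membrane potential toward an error-free state, and a local plasticity rule then makes synaptic input reproduce that state). All of it rests on assumptions that are not from the paper: a single layer of linear neurons with membrane potential v = W x + u, a fixed linear readout y = C v, a simple integral feedback controller standing in for the nonlinear optimal controller, and a local rule ΔW = η (v − W x) xᵀ standing in for the unsupervised plasticity rule. The teacher mapping T, the function names settle, plasticity, and open_loop_error, and all sizes and gains are hypothetical.

```python
# Toy sketch (hypothetical sizes): 3 inputs, 4 neurons, 2 outputs.
# Assumption: an integral controller replaces the paper's nonlinear optimal
# controller, and a delta-like local rule replaces its unsupervised rule.

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

C = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]      # fixed readout y = C v (full row rank)
C_T = transpose(C)
T = [[1.0, -2.0, 0.5],
     [0.0, 1.5, -1.0]]          # hypothetical teacher mapping x -> y*
W = [[0.0] * 3 for _ in range(4)]  # plastic input weights

def settle(W, x, y_target, gain=0.2, steps=50):
    """Integral feedback: inject a current u into the membrane potential,
    accumulating the output error projected back through C^T."""
    u = [0.0] * len(W)
    drive = matvec(W, x)         # synaptic input W x
    for _ in range(steps):
        v = [d + ui for d, ui in zip(drive, u)]
        e = [yt - y for yt, y in zip(y_target, matvec(C, v))]
        u = [ui + gain * fb for ui, fb in zip(u, matvec(C_T, e))]
    v = [d + ui for d, ui in zip(drive, u)]
    return v, drive

def plasticity(W, x, v, drive, lr=0.3):
    """Local rule: move synaptic drive W x toward the controlled membrane
    potential v; the mismatch v - W x equals the control current u."""
    for i in range(len(W)):
        mismatch = v[i] - drive[i]
        for j in range(len(x)):
            W[i][j] += lr * mismatch * x[j]

inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]

for epoch in range(200):
    for x in inputs:
        v, drive = settle(W, x, matvec(T, x))
        plasticity(W, x, v, drive)

def open_loop_error(W):
    """Largest output error with the controller switched off (u = 0)."""
    worst = 0.0
    for x in inputs:
        y = matvec(C, matvec(W, x))
        worst = max(worst, max(abs(a - b) for a, b in zip(matvec(T, x), y)))
    return worst

print(f"open-loop error after learning: {open_loop_error(W):.2e}")
```

Because v − W x equals the control current u, the weights change only while the controller has to intervene; once W x alone produces the target output, u, and with it plasticity, falls to zero.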
