Activation functions play a significant role in the performance of deep learning algorithms. In particular, the Swish activation function tends to outperform ReLU on deeper models, including deep reinforcement learning models, across challenging tasks. Despite this advantage, ReLU remains the preferred function, partly because it is computationally more efficient than Swish. Furthermore, in contrast to the fields of computer vision and natural language processing, the deep reinforcement learning and robotics communities have been slower to adopt new activation functions such as Swish, continuing instead to use more traditional functions like ReLU. To address these issues, we propose Swim, a general-purpose, efficient, and high-performing alternative to Swish; we provide an analysis of its properties and explain its advantage over Swish in terms of both reward achievement and efficiency. We focus on testing Swim on MuJoCo's continuous control locomotion tasks, since they exhibit more complex dynamics and therefore stand to benefit most from a high-performing and efficient activation function. We use the TD3 algorithm in conjunction with Swim and justify this choice in the context of the robot locomotion domain. We conclude that Swim is a state-of-the-art activation function for continuous control locomotion tasks and recommend using it with TD3 as a working framework.
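For concreteness, below is a minimal NumPy sketch (not from the paper) of the two baseline functions the abstract compares, with a rough timing loop illustrating why ReLU is regarded as the more efficient of the two. It assumes the standard Swish definition swish(x) = x · sigmoid(βx) with β = 1; Swim's own definition is not stated in the abstract, so it is not reproduced here.

```python
import timeit
import numpy as np

def relu(x):
    # Piecewise-linear ReLU: max(x, 0), a single cheap elementwise op.
    return np.maximum(x, 0.0)

def swish(x, beta=1.0):
    # Standard Swish/SiLU: x * sigmoid(beta * x). The sigmoid adds an
    # exp and a divide per element, the source of the efficiency gap.
    return x / (1.0 + np.exp(-beta * x))

# Rough, illustrative timing on a large activation tensor.
x = np.random.randn(1_000_000).astype(np.float32)
print("ReLU :", timeit.timeit(lambda: relu(x), number=100))
print("Swish:", timeit.timeit(lambda: swish(x), number=100))
```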