Automatic Noise Filtering with Dynamic Sparse Training in Deep Reinforcement Learning

Tomorrow's robots will need to distinguish useful information from noise when performing different tasks. A household robot for instance may continuously receive a plethora of information about the home, but needs to focus on just a small subset to successfully execute its current chore. Filtering distracting inputs that contain irrelevant data has received little attention in the reinforcement learning literature. To start resolving this, we formulate a problem setting in reinforcement learning called the $\textit{extremely noisy environment}$ (ENE), where up to $99\%$ of the input features are pure noise. Agents need to detect which features provide task-relevant information about the state of the environment. Consequently, we propose a new method termed $\textit{Automatic Noise Filtering}$ (ANF), which uses the principles of dynamic sparse training in synergy with various deep reinforcement learning algorithms. The sparse input layer learns to focus its connectivity on task-relevant features, such that ANF-SAC and ANF-TD3 outperform standard SAC and TD3 by a large margin, while using up to $95\%$ fewer weights. Furthermore, we devise a transfer learning setting for ENEs, by permuting all features of the environment after 1M timesteps to simulate the fact that other information sources can become relevant as the world evolves. Again, ANF surpasses the baselines in final performance and sample complexity. Our code is available at https://github.com/bramgrooten/automatic-noise-filtering
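
To make the setting concrete, the sketch below illustrates the three ingredients named in the abstract: building an ENE observation, permuting all features for the transfer setting, and one topology update of a sparse input layer. It is a minimal illustration and not the authors' implementation (see the linked repository for that). The function names are hypothetical. The choice of i.i.d. standard Gaussian noise appended after the real features is an assumption, as is the SET-style update rule (magnitude pruning, random regrowth, zero-initialised new weights); the abstract does not specify which dynamic sparse training scheme is used.

```python
import random


def make_ene_observation(state, noise_fraction, rng):
    """Append pure-noise features so that roughly `noise_fraction` of all
    features are noise (assumption: i.i.d. standard Gaussian noise)."""
    if not 0.0 <= noise_fraction < 1.0:
        raise ValueError("noise_fraction must be in [0, 1)")
    # Solve n_noise / (n_real + n_noise) = noise_fraction for n_noise.
    n_noise = round(len(state) * noise_fraction / (1.0 - noise_fraction))
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
    return list(state) + noise


def make_permutation(n_features, rng):
    """Draw one fixed permutation of feature indices (transfer setting)."""
    perm = list(range(n_features))
    rng.shuffle(perm)
    return perm


def permute_features(observation, perm):
    """Reorder an observation with a previously drawn permutation."""
    return [observation[i] for i in perm]


def prune_and_regrow(weights, mask, prune_fraction, rng):
    """One SET-style topology update on a flat list of input-layer weights.

    Drops the smallest-magnitude active connections and activates the same
    number of previously inactive ones, chosen at random (assumption: new
    connections start at zero). Returns new (weights, mask) lists.
    """
    weights, mask = list(weights), list(mask)
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    n_change = min(int(len(active) * prune_fraction), len(inactive))

    weakest = sorted(active, key=lambda i: abs(weights[i]))[:n_change]
    regrown = rng.sample(inactive, n_change)

    for i in weakest:
        mask[i] = False
        weights[i] = 0.0
    for i in regrown:
        mask[i] = True
        weights[i] = 0.0
    return weights, mask
```

Under these assumptions, a 17-dimensional state at a noise fraction of 0.99 grows to about 1700 input features, of which only the first 17 carry task-relevant information. In the transfer setting, the same permutation drawn once by `make_permutation` would be applied to every observation after the switch point, so the agent has to rediscover which input indices matter.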
