Improving exploration in policy gradient search: Application to symbolic optimization

Many machine learning strategies designed to automate mathematical tasks leverage neural networks to search large combinatorial spaces of mathematical symbols. In contrast to traditional evolutionary approaches, placing a neural network at the core of the search allows it to learn higher-level symbolic patterns, providing an informed direction to guide the search. When no labeled data is available, such networks can still be trained using reinforcement learning. However, we demonstrate that this approach can suffer from an early commitment phenomenon and from initialization bias, both of which limit exploration. We present two exploration methods to tackle these issues, building upon ideas of entropy regularization and distribution initialization. We show that these techniques can improve performance, increase sample efficiency, and lower the complexity of solutions for the task of symbolic regression.
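The abstract names entropy regularization without giving details, so the following is only a generic sketch of how an entropy bonus is commonly added to a policy gradient objective for a sampled token sequence. It is not the authors' method; the function names, the scalar baseline, and the `entropy_weight` parameter are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def regularized_objective(step_logits, actions, reward, baseline, entropy_weight):
    """Hypothetical per-sequence surrogate objective (to be maximized):
    (reward - baseline) * sum_t log pi(a_t) + entropy_weight * sum_t H(pi_t).
    """
    log_prob = 0.0
    total_entropy = 0.0
    for logits, action in zip(step_logits, actions):
        probs = softmax(logits)
        log_prob += math.log(probs[action])   # log-likelihood of the sampled token
        total_entropy += entropy(probs)       # bonus for keeping the policy spread out
    return (reward - baseline) * log_prob + entropy_weight * total_entropy

# Two decoding steps over a 4-symbol vocabulary, both with uniform logits.
uniform_steps = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
objective = regularized_objective(uniform_steps, actions=[0, 1],
                                  reward=1.0, baseline=0.0, entropy_weight=0.1)
```

A larger `entropy_weight` rewards distributions that stay closer to uniform, which is the usual way such a term discourages a policy from committing to a few symbols too early.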
