Hebbian meta-learning has recently shown promise in solving hard reinforcement learning problems, allowing agents to adapt, to some degree, to changes in the environment. However, because each synapse in these approaches can learn a very specific learning rule, the ability to generalize to very different situations is likely reduced. We hypothesize that limiting the number of Hebbian learning rules through a "genomic bottleneck" can act as a regularizer, leading to better generalization across changes to the environment. We test this hypothesis by decoupling the number of Hebbian learning rules from the number of synapses and systematically varying the number of learning rules. The results in this paper suggest that simultaneously learning the Hebbian learning rules and their assignment to synapses is a difficult optimization problem, leading to poor performance in the environments tested. However, research parallel to ours finds that it is indeed possible to reduce the number of learning rules by clustering similar rules together. How best to implement a "genomic bottleneck" algorithm thus remains an important research direction that warrants further investigation.
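The core idea of decoupling learning rules from synapses can be illustrated with a minimal sketch. The code below assumes the generalized ABCD Hebbian form common in this line of work, Δw = η(A·pre·post + B·pre + C·post + D); the variable names and the random rule assignment are illustrative, not the paper's actual implementation. In the unconstrained setting every synapse carries its own five coefficients, while the "genomic bottleneck" version stores only K distinct rules plus a per-synapse index into that small table:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 4, 3
n_syn = n_in * n_out

# Unconstrained Hebbian meta-learning: one (A, B, C, D, eta) rule per synapse.
per_synapse_rules = rng.standard_normal((n_syn, 5))

# "Genomic bottleneck": only K distinct rules; each synapse stores an index
# into this small rule table instead of its own coefficients. In the paper's
# setting, both the rules and the assignment would be optimized jointly.
K = 2
shared_rules = rng.standard_normal((K, 5))
assignment = rng.integers(0, K, size=n_syn)

def hebbian_update(w, pre, post, rules):
    """One step of the generalized ABCD Hebbian rule:
       dw = eta * (A * pre * post + B * pre + C * post + D),
       with one coefficient row per (flattened) synapse."""
    A, B, C, D, eta = rules.T
    pre_post = np.outer(post, pre).ravel()   # pre * post per synapse
    pre_term = np.tile(pre, n_out)           # pre activity per synapse
    post_term = np.repeat(post, n_in)        # post activity per synapse
    dw = eta * (A * pre_post + B * pre_term + C * post_term + D)
    return w + dw.reshape(n_out, n_in)

w = rng.standard_normal((n_out, n_in)) * 0.1
x = rng.standard_normal(n_in)
y = np.tanh(w @ x)

# Bottlenecked update: each synapse looks up its rule in the shared table.
w_bottleneck = hebbian_update(w, x, y, shared_rules[assignment])
# Unconstrained update: every synapse applies its own private rule.
w_full = hebbian_update(w, x, y, per_synapse_rules)
```

Note that the bottlenecked network only needs to encode 5·K rule parameters plus n_syn integer indices, rather than 5·n_syn real coefficients, which is the regularization effect the abstract hypothesizes about.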