Human culture relies on innovation: our ability to continuously explore how existing elements can be combined to create new ones. Innovation is not solitary; it relies on collective search and accumulation. Reinforcement learning (RL) approaches commonly assume that fully-connected groups are best suited for innovation. However, human laboratory and field studies have shown that hierarchical innovation is more robustly achieved by dynamic social network structures. In dynamic settings, humans oscillate between innovating individually or in small clusters and sharing outcomes with others. To our knowledge, the role of social network structure in innovation has not been systematically studied in RL. Here, we use a multi-level problem setting (WordCraft) with three different innovation tasks to test the hypothesis that social network structure affects the performance of distributed RL algorithms. We systematically design networks of DQNs that share experiences from their replay buffers in varying structures (fully-connected, small world, dynamic, ring) and introduce a set of behavioral and mnemonic metrics that extend the classical reward-focused evaluation framework of RL. Comparing the level of innovation achieved by different social network structures across different tasks shows three things. First, consistent with human findings, experience sharing within a dynamic structure achieves the highest level of innovation in tasks with a deceptive nature and large search spaces. Second, experience sharing is not as helpful when there is a single clear path to innovation. Third, the metrics we propose can help explain the success of different social network structures on different tasks, with the diversity of experiences at the individual and group levels providing crucial insights.
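The four network structures and the replay-buffer sharing described above can be illustrated with a small sketch. This is not the authors' implementation: the topology parameters (`k`, `p`, `cluster_size`, `merge_every`), the specific rule used for the dynamic network (random small clusters that are periodically merged into one fully-connected group), and the hypothetical `share_experiences` rule (each agent copies a random batch of transitions from each neighbor's replay buffer) are all assumptions chosen for clarity. The DQN learners themselves are omitted; replay buffers are plain `deque`s of arbitrary transition tuples.

```python
import random
from collections import deque


def fully_connected(n):
    """Every agent is a neighbor of every other agent."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def ring(n):
    """Each agent is linked only to its two immediate neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def small_world(n, k=4, p=0.1, rng=None):
    """Watts-Strogatz-style graph (assumed parameters): a ring lattice where each
    agent links to its k nearest neighbors, with each edge rewired with prob. p."""
    rng = rng or random.Random(0)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            if j in adj[i] and rng.random() < p:
                candidates = [m for m in range(n) if m != i and m not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    # Rewire edge i-j to i-new, keeping the graph undirected.
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def dynamic(n, step, cluster_size=2, merge_every=10, rng=None):
    """Hypothetical dynamic topology: agents work in small random clusters, and
    every `merge_every` steps the whole group is briefly fully connected."""
    if step % merge_every == 0:
        return fully_connected(n)
    rng = rng or random.Random(step)
    order = list(range(n))
    rng.shuffle(order)
    adj = {i: set() for i in range(n)}
    for start in range(0, n, cluster_size):
        cluster = order[start:start + cluster_size]
        for i in cluster:
            adj[i].update(m for m in cluster if m != i)
    return adj


def share_experiences(buffers, adj, batch=8, rng=None):
    """Hypothetical sharing rule: each agent copies up to `batch` random
    transitions from every neighbor's replay buffer into its own buffer."""
    rng = rng or random.Random(0)
    # Sample from snapshots so transitions received this round are not re-shared.
    snapshots = {i: list(buf) for i, buf in buffers.items()}
    for i, neighbors in adj.items():
        for j in neighbors:
            src = snapshots[j]
            for transition in rng.sample(src, min(batch, len(src))):
                buffers[i].append(transition)


if __name__ == "__main__":
    n = 6
    buffers = {i: deque([("agent", i, t) for t in range(20)], maxlen=1000)
               for i in range(n)}
    share_experiences(buffers, ring(n), batch=4)
    print(len(buffers[0]))  # 20 own + 4 from each of 2 ring neighbors = 28
```

Swapping the adjacency function is the only change needed to compare structures in this sketch; the dynamic topology is recomputed every step, whereas the other three are fixed.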