Patch foraging is one of the most heavily studied behavioral optimization challenges in biology. Despite its importance to biological intelligence, however, it remains understudied in artificial intelligence research. Patch foraging is especially amenable to study because it has a known optimal solution, one that may be difficult to discover with current deep reinforcement learning techniques. Here, we investigate deep reinforcement learning agents in an ecological patch foraging task. For the first time, we show that machine learning agents can learn to patch forage adaptively in patterns similar to those of biological foragers, and that they approach optimal patch foraging behavior when temporal discounting is taken into account. Finally, we show that internal dynamics emerge in these agents that resemble single-cell recordings from foraging non-human primates, complementing experimental and theoretical work on the neural mechanisms of biological foraging. This work suggests that agents interacting in complex environments with ecologically valid pressures arrive at common solutions, pointing to the emergence of foundational computations behind adaptive, intelligent behavior in both biological and artificial agents.