Decision making is a fundamental capability of living organisms and has recently gained importance in many engineering applications. Here, we consider a simple decision-making principle for identifying the optimal choice in multi-armed bandit (MAB) problems, which are fundamental in the context of reinforcement learning. We demonstrate that the identification mechanism of the method is well described by a competitive ecosystem model, namely the competitive Lotka--Volterra (LV) model. Based on the "winner-take-all" mechanism in the competitive LV model, we demonstrate that non-best choices are eliminated and only the best choice survives; the probability of failing to eliminate the non-best choices decreases exponentially as the choice trials are repeated. Furthermore, we apply a mean-field approximation to the proposed decision-making method and show that the method scales well, as $O(\log N)$ with respect to the number of choices $N$. These results offer a new perspective on optimal search capabilities in competitive systems.
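The winner-take-all behaviour of a competitive LV system can be illustrated with a minimal sketch. It is not the decision-making method of the paper: the function name `simulate_competitive_lv`, the growth rates, the uniform competition term, and the forward-Euler integrator are all assumptions chosen for illustration, with each growth rate $r_i$ loosely standing in for the quality of choice $i$. Under the assumed dynamics $\dot{x}_i = x_i (r_i - \sum_j x_j)$, the ratio $x_i / x_k$ evolves as $\exp((r_i - r_k) t)$, so every non-best population decays exponentially relative to the best one.

```python
def simulate_competitive_lv(growth_rates, x0=0.1, dt=0.01, steps=20000):
    """Forward-Euler integration of dx_i/dt = x_i * (r_i - sum_j x_j).

    Illustrative assumption: every species competes equally with every
    other one, and growth_rates[i] plays the role of the quality of choice i.
    Returns the populations after `steps` Euler steps of size `dt`.
    """
    x = [x0] * len(growth_rates)
    for _ in range(steps):
        total = sum(x)  # shared competition term felt by every species
        x = [xi + dt * xi * (r - total) for xi, r in zip(x, growth_rates)]
    return x


# Hypothetical example: three choices, the first has the largest growth rate.
final = simulate_competitive_lv([0.9, 0.8, 0.5])
print(final)
```

In this toy setting the population with the largest growth rate approaches its carrying capacity while the others shrink toward zero, which is the qualitative "only the best choice survives" picture described in the abstract.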