We introduce a model of graph-constrained dynamic choice with reinforcement modeled by positively $\alpha$-homogeneous rewards. We show that its empirical process, which can be written as a stochastic approximation recursion with Markov noise, has the same probability law as a certain vertex reinforced random walk. Thus the limiting differential equation that it tracks coincides with the forward Kolmogorov equation for the latter, which in turn is a scaled version of a special instance of replicator dynamics with potential. We use this equivalence to show that for $\alpha > 0$, the asymptotic outcome concentrates around the optimum in a certain limiting sense when `annealed' by letting $\alpha\uparrow\infty$ slowly.
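As a rough illustration of the objects named above (the notation here is schematic and chosen for exposition; it need not match the body of the paper): consider a walk $(X_n)$ on a finite graph with vertex set $V$, and let $x_n(v)$ denote the empirical frequency of visits to $v \in V$ up to time $n$. An $\alpha$-homogeneous reinforcement rule and the stochastic approximation recursion satisfied by the empirical process can then be written as
\[
P\bigl(X_{n+1} = v \mid \mathcal{F}_n\bigr) \;\propto\; x_n(v)^{\alpha}\,\mathbf{1}\{v \sim X_n\},
\qquad
x_{n+1} \;=\; x_n + \frac{1}{n+1}\bigl(e_{X_{n+1}} - x_n\bigr),
\]
where $e_v$ denotes the $v$-th coordinate vector and $v \sim X_n$ indicates adjacency in the graph. The limiting differential equation tracked by a recursion of this form is of replicator type,
\[
\dot{x}(v) \;=\; x(v)\Bigl(h_v(x) - \sum_{u \in V} x(u)\,h_u(x)\Bigr),
\]
with the map $h$ determined by the graph constraints and the reward; this is only a generic template for the dynamics referred to in the abstract, not the paper's exact equation.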