Counterfactual Regret Minimization (CFR) has achieved many impressive results in solving large-scale Imperfect Information Games (IIGs). Neural CFR is one of the promising techniques that can effectively reduce the computation and memory consumption of CFR by generalizing decision information across similar states. However, current neural CFR algorithms have to approximate the cumulative variables of the iterations with neural networks, which usually results in large estimation variance given the huge complexity of IIGs. Moreover, model-based sampling and inefficient training keep current neural CFR algorithms computationally expensive. In this paper, a new model-free neural CFR algorithm with bootstrap learning is proposed, in which a Recursive Substitute Value (RSV) network is trained to replace the cumulative variables in CFR. The RSV is defined recursively and can be estimated independently in every iteration using bootstrapping, so there is no longer any need to track or approximate the cumulative variables. Built on the RSV, the new neural CFR algorithm is model-free and trains more efficiently. Experimental results show that the new algorithm matches state-of-the-art neural CFR algorithms at a lower training cost.
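For context, tabular CFR derives each information set's strategy by regret matching over cumulative counterfactual regrets; these cumulative regrets (together with the cumulative strategy) are exactly the variables that neural CFR methods must otherwise store or approximate. Below is a minimal sketch of the regret-matching step, with illustrative variable names; it is an aid to reading the abstract, not the proposed RSV algorithm itself.

```python
import numpy as np

def regret_matching(cumulative_regrets: np.ndarray) -> np.ndarray:
    """Derive a strategy at one information set from cumulative regrets.

    Actions with positive cumulative regret are played in proportion to
    that regret; if no action has positive regret, play uniformly.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Example: regrets accumulated over iterations for three actions.
strategy = regret_matching(np.array([3.0, -1.0, 1.0]))
print(strategy)  # [0.75 0.   0.25]
```

In a neural setting, this per-infoset accumulation is what becomes problematic at scale, which motivates replacing the cumulative quantities with a recursively defined, bootstrapped estimate as described above.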