Regret minimization is a key component of many algorithms for finding Nash equilibria in imperfect-information games. To scale to games that cannot fit in memory, we can use search with value functions. However, calling the value functions repeatedly in search can be expensive. Therefore, it is desirable to minimize regret in the search tree as fast as possible. We propose to accelerate regret minimization by introducing a general ``learning not to regret'' framework, where we meta-learn the regret minimizer. The resulting algorithm is guaranteed to minimize regret in arbitrary settings and is (meta-)learned to converge fast on a selected distribution of games. Our experiments show that meta-learned algorithms converge substantially faster than prior regret minimization algorithms.
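For context, the kind of hand-designed regret minimizer that the meta-learned update replaces can be illustrated by classical regret matching at a single decision point. The sketch below is a minimal illustration only, not code from the paper; the function names and the use of per-action counterfactual utilities are assumptions.

\begin{verbatim}
import numpy as np

def regret_matching_strategy(cum_regret):
    # Map cumulative regrets to a strategy: positive part, normalized;
    # fall back to uniform when no action has positive regret.
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    if total > 0:
        return pos / total
    return np.full_like(cum_regret, 1.0 / len(cum_regret))

def regret_matching_step(cum_regret, action_utils):
    # One regret-matching update: play the current strategy, then add the
    # instantaneous regret (action utility minus expected utility) to the
    # cumulative regrets. action_utils holds this iteration's per-action
    # counterfactual utilities (an assumed interface for illustration).
    strategy = regret_matching_strategy(cum_regret)
    expected = strategy @ action_utils
    cum_regret = cum_regret + (action_utils - expected)
    return strategy, cum_regret
\end{verbatim}

In the meta-learned setting, a parameterized model takes the place of this fixed update rule while retaining regret guarantees in arbitrary settings.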