Previous work has separately addressed different forms of action, state, and action-state entropy regularization, as well as pure exploration and space occupation. These problems have become highly relevant for regularization, generalization, speeding up learning, and providing robust solutions. However, solutions to these problems are heterogeneous, ranging from convex to non-convex optimization and from unconstrained to constrained optimization. Here we provide a general dual function formalism that transforms the constrained optimization problem into an unconstrained convex one for any mixture of action and state entropies. The cases of pure action entropy and pure state entropy are understood as limits of the mixture.