Our world is full of asymmetries. Gravity and wind can make reaching a place easier than coming back. Social artifacts such as genealogy charts and citation graphs are inherently directed. In reinforcement learning and control, optimal goal-reaching strategies are rarely reversible (symmetric). Distance functions defined on such asymmetric structures are called quasimetrics. Despite their ubiquity, little research has been done on learning quasimetrics. Our theoretical analysis reveals that a common class of learning algorithms, including unconstrained multilayer perceptrons (MLPs), provably fails to learn a quasimetric consistent with training data. In contrast, our proposed Poisson Quasimetric Embedding (PQE) is the first quasimetric learning formulation that both is learnable with gradient-based optimization and enjoys strong performance guarantees. Experiments on random graphs, social graphs, and offline Q-learning demonstrate its effectiveness over many common baselines.
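To make the notion concrete, here is a minimal sketch (not from the paper, purely illustrative) of how a quasimetric arises in practice: shortest-path distances on a directed graph satisfy the identity axiom d(x, x) = 0 and the triangle inequality, but need not be symmetric. The toy graph and Floyd-Warshall routine below are assumptions chosen for illustration only.

```python
import math
from itertools import product

# Toy directed graph: a one-way cycle 0 -> 1 -> 2 -> 3 -> 0.
nodes = range(4)
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}

# Floyd-Warshall all-pairs shortest paths (k is the outermost loop).
d = {(i, j): (0.0 if i == j else edges.get((i, j), math.inf))
     for i, j in product(nodes, nodes)}
for k, i, j in product(nodes, repeat=3):
    d[i, j] = min(d[i, j], d[i, k] + d[k, j])

# The quasimetric axioms hold...
assert all(d[i, i] == 0.0 for i in nodes)
assert all(d[i, j] <= d[i, k] + d[k, j]
           for i, j, k in product(nodes, repeat=3))

# ...but symmetry does not: the cycle is one-way.
print(d[0, 1], d[1, 0])  # 1.0 vs. 3.0
```

A symmetric distance model (e.g. an embedding with a Euclidean metric on top) could never represent d(0, 1) = 1 and d(1, 0) = 3 simultaneously, which is why quasimetric-specific formulations such as PQE are needed.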