In many vision-based control problems, optimal controls can be inferred from the locations of objects in the scene. This information can be represented using feature points, that is, a list of spatial locations in learned feature maps of an input image. Previous work shows that feature points learned through unsupervised pre-training or human supervision can provide good features for control tasks. In this paper, we show that it is possible to learn efficient feature point representations end-to-end, without the need for unsupervised pre-training, decoders, or additional losses. Our proposed architecture consists of a differentiable feature point extractor that feeds the coordinates of the estimated feature points directly to a soft actor-critic agent. The proposed algorithm yields performance competitive with the state of the art on DeepMind Control Suite tasks.
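As an illustration of what a differentiable feature point extractor might compute, the sketch below uses a spatial soft-argmax, a common choice for this kind of layer. The abstract does not name the mechanism, so the use of spatial soft-argmax, the function names `spatial_soft_argmax` and `extract_feature_points`, the `temperature` parameter, and the [-1, 1] coordinate normalisation are all assumptions made for the example. It is written in plain Python for readability; in practice the same arithmetic would run in an automatic differentiation framework so that gradients flow back into the feature maps.

```python
import math


def _coords(n):
    """Evenly spaced coordinates in [-1, 1] for an axis of length n."""
    if n == 1:
        return [0.0]
    return [2.0 * k / (n - 1) - 1.0 for k in range(n)]


def spatial_soft_argmax(feature_map, temperature=1.0):
    """Expected (x, y) location of one 2D feature map (hypothetical helper).

    A softmax over all spatial positions gives a probability for each pixel;
    the feature point is the probability-weighted mean of pixel coordinates.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in feature_map)
    weights = [[math.exp((v - peak) / temperature) for v in row]
               for row in feature_map]
    total = sum(sum(row) for row in weights)
    xs, ys = _coords(width), _coords(height)
    x = sum(weights[i][j] * xs[j]
            for i in range(height) for j in range(width)) / total
    y = sum(weights[i][j] * ys[i]
            for i in range(height) for j in range(width)) / total
    return x, y


def extract_feature_points(feature_maps, temperature=1.0):
    """Flatten one (x, y) pair per feature map into a single coordinate list."""
    points = []
    for feature_map in feature_maps:
        points.extend(spatial_soft_argmax(feature_map, temperature))
    return points
```

Under these assumptions, the flat coordinate list returned by `extract_feature_points` plays the role of the low-dimensional state that is passed to the agent in place of the raw image.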