Large and diverse datasets have been the cornerstone of many impressive advances in artificial intelligence. Intelligent creatures, however, learn by interacting with their environment, which changes both the incoming sensory signals and the state of the environment. In this work, we aim to bring together the best of both worlds and propose an algorithm that exhibits exploratory behavior while utilizing large, diverse datasets. Our key idea is to leverage deep generative models pretrained on static datasets and introduce a dynamics model in their latent space. The transition dynamics simply mixes an action with a randomly sampled latent and applies an exponential moving average for temporal persistence; the resulting latent is decoded into an image by the pretrained generator. We then employ an unsupervised reinforcement learning algorithm to explore this environment and perform unsupervised representation learning on the collected data. We further leverage the temporal structure of this data to pair data points as a natural form of supervision for representation learning. Our experiments suggest that the learned representations transfer successfully to downstream tasks in both vision and reinforcement learning domains.
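The latent transition described above can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the mixing coefficient `beta`, the EMA coefficient `alpha`, the function name `latent_step`, and the latent dimensionality are all assumed for the sake of the example, and the pretrained generator is represented by a comment rather than an actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 128  # assumed latent size for illustration

def latent_step(z_prev, action, alpha=0.9, beta=0.5):
    """One step of the (hypothetical) latent dynamics:
    mix the action with a freshly sampled latent, then apply an
    exponential moving average for temporal persistence."""
    noise = rng.standard_normal(LATENT_DIM)
    # Mix the action vector with a randomly sampled latent.
    target = beta * action + (1.0 - beta) * noise
    # EMA over time keeps consecutive latents (and thus decoded
    # images) temporally persistent.
    z_next = alpha * z_prev + (1.0 - alpha) * target
    return z_next

z = rng.standard_normal(LATENT_DIM)
action = rng.standard_normal(LATENT_DIM)
z = latent_step(z, action)
# z would then be decoded to an image by the pretrained generator
```

Because consecutive latents are EMA-smoothed, frames decoded from them form temporally coherent trajectories, which is what lets nearby time steps be paired as natural supervision for representation learning.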