We present the first-prize solution to the NeurIPS 2021 AWS Deepracer Challenge. In this competition, the task was to train a reinforcement learning agent (i.e., an autonomous car) that learns to drive by interacting with its environment, a simulated track, taking actions in a given state to maximize the expected reward. The model was then tested on a real-world track with a miniature AWS Deepracer car. Our goal was to train a model that completes a lap as fast as possible without going off the track. The Deepracer Challenge is part of a series of embodied-intelligence competitions in the field of autonomous vehicles, called The AI Driving Olympics (AI-DO). The overall objective of the AI-DO is to provide accessible mechanisms for benchmarking progress in autonomy applied to the task of autonomous driving. The tricky part of this challenge was the sim2real transfer of the learned skills. To reduce the domain gap in the observation space, we applied Canny edge detection in addition to cropping out unnecessary background information. We modeled the problem as a behavioral cloning task and used an MLP-Mixer to optimize for runtime. By carefully filtering the training data, we ensured the model could handle control noise, which gave us a robust model capable of completing the track even when 50% of the commands were randomly changed. The overall runtime of the model was only 2-3 ms on a modern CPU.
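The observation preprocessing described above (cropping background rows, then edge detection) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' actual pipeline: a real implementation would call OpenCV's `cv2.Canny`, and the crop offset and thresholds (`crop_top`, `high`) here are illustrative assumptions.

```python
import numpy as np

def preprocess_observation(frame, crop_top=40, high=150):
    """Crop background rows and compute a binary edge map.

    Sketch of the crop + edge-detection preprocessing; a simple
    gradient-magnitude threshold stands in for full Canny edge
    detection (no Gaussian smoothing, non-maximum suppression,
    or hysteresis). All parameter values are illustrative.
    """
    # Convert RGB frames to grayscale; pass 2-D frames through.
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame.astype(float)
    # Drop the top rows (sky / background beyond the track).
    gray = gray[crop_top:, :]
    # Central-difference gradients (Sobel-style, simplified).
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    # Keep strong edges only, as a binary 0/255 image.
    return (mag >= high).astype(np.uint8) * 255
```

Feeding the agent such sparse edge maps instead of raw pixels removes most texture and lighting detail, which is exactly the information that differs between the simulator and the real camera.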