Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns with the discretization size, we effectively balance $Q$-value propagation, leading to more stable convergence. On three simulated robotic control problems, we empirically find that this simple approach outperforms na\"ive mixing by 50% on average.
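As a sketch of the mechanism (our notation; the paper's exact scaling rule may differ): let $\Delta t_i$ denote the control timestep of dataset $i$ and fix a target lookahead horizon $H$ in physical time. Choosing $N_i \approx H / \Delta t_i$ makes every $N$-step target bootstrap over roughly the same amount of physical time,
\[
G_t^{(N_i)} \;=\; \sum_{k=0}^{N_i-1} \gamma^{k}\, r_{t+k} \;+\; \gamma^{N_i} \max_{a} Q\big(s_{t+N_i}, a\big),
\qquad N_i\, \Delta t_i \approx H,
\]
so $Q$-values propagate backward at a comparable rate measured in seconds rather than in environment steps, regardless of the control frequency at which a given trajectory was collected.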