Perception models have progressed significantly, especially those trained on large-scale internet images. However, how to efficiently generalize these models to unseen embodied tasks remains insufficiently studied, even though it would benefit many applications (e.g., home robots). Unlike static perception methods trained on pre-collected images, an embodied agent can move around the environment and capture images of objects from any viewpoint. Efficiently learning an exploration policy and a collection method that gather informative training samples is therefore the key to this task. To this end, we first build a 3D semantic distribution map and use it to train the exploration policy in a self-supervised manner, introducing two rewards: semantic distribution disagreement and semantic distribution uncertainty. Because the map is generated from multi-view observations, it weakens the impact of misidentification from an unfamiliar viewpoint. The agent is then encouraged to explore objects whose semantic distributions differ across viewpoints or are uncertain. Given the explored informative trajectories, we select hard samples along them based on semantic distribution uncertainty, which removes unnecessary observations that can already be identified correctly. Experiments show that the perception model fine-tuned with our method outperforms baselines trained with other exploration policies. We further demonstrate the robustness of our method in real-robot experiments.
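To make the two reward signals and the hard-sample selection step more concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not give formulas, so every definition here is an assumption made for illustration: per-view class distributions for one map cell are fused by averaging, uncertainty is measured as normalized Shannon entropy, disagreement is the mean KL divergence from each view to the fused distribution, and the function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

EPS = 1e-8  # avoids log(0) for classes with zero probability


def fuse_views(view_probs):
    """Average per-view class distributions for one map cell (assumed fusion rule)."""
    fused = np.mean(np.asarray(view_probs, dtype=float), axis=0)
    return fused / fused.sum()


def uncertainty(dist):
    """Normalized Shannon entropy in [0, 1]; an assumed stand-in for the uncertainty reward."""
    dist = np.clip(np.asarray(dist, dtype=float), EPS, 1.0)
    return float(-(dist * np.log(dist)).sum() / np.log(len(dist)))


def disagreement(view_probs):
    """Mean KL divergence from each view to the fused distribution (assumed disagreement reward)."""
    p = np.clip(np.asarray(view_probs, dtype=float), EPS, 1.0)
    q = np.clip(fuse_views(p), EPS, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))


def select_hard_samples(frame_dists, threshold=0.5):
    """Return indices of frames whose predicted distribution is uncertain enough to keep."""
    return [i for i, d in enumerate(frame_dists) if uncertainty(d) > threshold]
```

Under these assumptions, two views that both predict class 0 with probability 0.9 yield a disagreement near zero, while views that predict different classes yield a clearly positive value. A frame with a confident prediction falls below the threshold and is dropped, which mirrors the stated goal of discarding observations that are already identified correctly.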