Semantic segmentation networks are usually pre-trained once and not updated during deployment. As a consequence, misclassifications commonly occur if the distribution of the training data deviates from the one encountered during the robot's operation. We propose to mitigate this problem by adapting the neural network to the robot's environment during deployment, without any need for external supervision. Leveraging complementary data representations, we generate a supervision signal by probabilistically accumulating consecutive 2D semantic predictions in a volumetric 3D map. We then train the network on renderings of the accumulated semantic map, effectively resolving ambiguities and enforcing multi-view consistency through the 3D representation. In contrast to scene adaptation methods, we aim to retain the previously learned knowledge, and therefore employ a continual learning strategy based on experience replay to adapt the network. Through extensive experimental evaluation, we show successful adaptation to real-world indoor scenes both on the ScanNet dataset and on in-house data recorded with an RGB-D sensor. Our method increases segmentation accuracy on average by 9.9% compared to the fixed pre-trained network, while retaining knowledge from the pre-training dataset.
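To make the accumulation step concrete, below is a minimal sketch of probabilistic semantic fusion in a voxel map, assuming per-pixel class probabilities from the network, depth images, and known camera poses for back-projection. The names (`VoxelMap`, `fuse_frame`, `render_pseudo_labels`) and the simple per-voxel Bayesian update with a uniform prior are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

class VoxelMap:
    """Accumulates per-class evidence for each voxel across frames."""

    def __init__(self, num_classes, voxel_size=0.05):
        self.num_classes = num_classes
        self.voxel_size = voxel_size
        # (i, j, k) -> accumulated log-probabilities over semantic classes.
        self.log_probs = {}

    def fuse_frame(self, points_world, pixel_probs):
        """Fuse one frame: each back-projected 3D point carries the class
        distribution predicted at its source pixel."""
        keys = np.floor(points_world / self.voxel_size).astype(int)
        log_p = np.log(np.clip(pixel_probs, 1e-6, 1.0))
        for key, lp in zip(map(tuple, keys), log_p):
            # Treat frames as independent measurements: summing
            # log-probabilities is a Bayesian update with a uniform prior.
            acc = self.log_probs.get(key, np.zeros(self.num_classes))
            self.log_probs[key] = acc + lp

    def label_of(self, point_world):
        key = tuple(np.floor(point_world / self.voxel_size).astype(int))
        acc = self.log_probs.get(key)
        return None if acc is None else int(np.argmax(acc))

def render_pseudo_labels(vmap, depth, K, T_wc):
    """Render the fused map into a view: back-project each pixel using its
    measured depth and look up the accumulated voxel label."""
    h, w = depth.shape
    labels = -np.ones((h, w), dtype=int)  # -1 marks unsupervised pixels
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    for v, u in zip(*np.nonzero(depth > 0)):
        z = depth[v, u]
        p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
        lab = vmap.label_of((T_wc @ p_cam)[:3])
        if lab is not None:
            labels[v, u] = lab
    return labels
```

Because every voxel aggregates predictions from many viewpoints, the rendered pseudo-labels are multi-view consistent by construction, which is what lets them serve as a training signal for the 2D network.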
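The retention side can be sketched as well: a minimal experience-replay training step, assuming a buffer of pre-training samples is mixed into each adaptation batch so the network trains jointly on rendered pseudo-labels and retained source data. The mixing ratio, buffer sampling, and `ignore_index` convention are illustrative choices, not the paper's exact recipe.

```python
import random
import torch
import torch.nn.functional as F

def replay_step(model, optimizer, scene_batch, replay_buffer, replay_ratio=0.5):
    """One adaptation step on pseudo-labelled scene data plus replayed
    pre-training samples, to counteract forgetting."""
    images, pseudo_labels = scene_batch  # labels rendered from the 3D map
    k = max(1, int(replay_ratio * len(images)))
    src_imgs, src_labels = zip(*random.sample(replay_buffer, k))
    imgs = torch.cat([images, torch.stack(src_imgs)])
    labels = torch.cat([pseudo_labels, torch.stack(src_labels)])

    optimizer.zero_grad()
    logits = model(imgs)
    # ignore_index=-1 skips pixels the map could not supervise.
    loss = F.cross_entropy(logits, labels, ignore_index=-1)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Mixing source samples into every batch keeps the loss landscape anchored to the pre-training distribution, which is how the method adapts to the deployment scene without sacrificing previously learned classes.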