In recent years, representation learning has been addressed with self-supervised learning methods. The input data is augmented into two distorted views, and an encoder learns representations that are invariant to the distortions -- cross-view prediction. Augmentation is one of the key components in cross-view self-supervised learning frameworks for learning visual representations. This paper presents ExAgt, a novel method that incorporates expert knowledge to augment traffic scenarios, improving the learnt representations without any human annotation. The expert-guided augmentations are generated in an automated fashion based on the infrastructure, the interactions between the EGO vehicle and the other traffic participants, and an ideal sensor model. The ExAgt method is applied in two state-of-the-art cross-view prediction methods, and the learnt representations are evaluated on downstream tasks such as classification and clustering. Results show that ExAgt improves representation learning compared to using only standard augmentations, and that it provides greater representation-space stability. The code is available at \url{https://github.com/lab176344/ExAgt}.
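To make the cross-view setup concrete, here is a minimal NumPy sketch of one training step, under stated assumptions: the `augment`, `encode`, and `cosine_sim` functions are illustrative stand-ins, not the paper's implementation; ExAgt would replace the noise augmentation with expert-guided scenario transformations, and the linear map with a deep encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, rng):
    # Stand-in stochastic augmentation (additive noise here; ExAgt
    # instead derives augmentations from infrastructure, interactions,
    # and a sensor model).
    return x + 0.1 * rng.normal(size=x.shape)

def encode(x, W):
    # Toy linear "encoder"; a real method would use a deep network.
    return x @ W

def cosine_sim(a, b):
    # Row-wise cosine similarity between two batches of embeddings.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

# One cross-view step: two distorted views of the same input are
# encoded, and the loss pushes their embeddings toward each other.
x = rng.normal(size=(4, 8))        # batch of 4 inputs
W = rng.normal(size=(8, 3))        # encoder weights
z1 = encode(augment(x, rng), W)    # view 1 embeddings
z2 = encode(augment(x, rng), W)    # view 2 embeddings
loss = -cosine_sim(z1, z2).mean()  # maximize cross-view agreement
```

In a full cross-view prediction framework this loss (or a contrastive variant of it) would be minimized by gradient descent over the encoder parameters, so that representations become invariant to the chosen distortions.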