In future 6G cellular networks, a joint communication and sensing protocol will allow the network to perceive the environment, opening the door to many new applications atop a unified communication-perception infrastructure. However, interpreting the sparse radio representation of sensing scenes is challenging, which limits the potential of these emergent systems. We propose to combine radio and vision to automatically learn a radio-only sensing model with minimal human intervention. Our goal is a radio sensing model that can be trained on millions of uncurated data points. To this end, we leverage recent advances in self-supervised learning and formulate a new label-free radio-visual co-learning scheme, whereby vision trains radio via cross-modal mutual information. We implement and evaluate our scheme using the standard linear classification benchmark, and report qualitative and quantitative performance metrics. In our evaluation, the representation learnt by radio-visual self-supervision works well for a downstream sensing demonstrator, and outperforms its fully-supervised counterpart when less labelled data is available. This indicates that self-supervised learning could be an important enabler for future scalable radio sensing systems.
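The abstract does not specify the exact training objective behind "vision trains radio via cross-modal mutual information". As a hedged illustration only, the sketch below shows one common way such a scheme is realised: a cross-modal InfoNCE-style contrastive loss, which maximises a lower bound on the mutual information between paired radio and vision embeddings. The encoder architectures, batch construction, and temperature value are assumptions, not details from the paper.

```python
# Hypothetical sketch of cross-modal contrastive co-learning (InfoNCE-style),
# where co-occurring radio/vision views of a scene form positive pairs and
# all other pairings in the batch act as negatives.
import torch
import torch.nn.functional as F

def cross_modal_infonce(radio_emb: torch.Tensor,
                        vision_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings (N x D)."""
    radio_emb = F.normalize(radio_emb, dim=-1)
    vision_emb = F.normalize(vision_emb, dim=-1)
    logits = radio_emb @ vision_emb.t() / temperature   # N x N similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Radio-to-vision and vision-to-radio retrieval terms.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage sketch (radio_encoder / vision_encoder are assumed placeholders):
# loss = cross_modal_infonce(radio_encoder(radio_batch), vision_encoder(image_batch))
# loss.backward()
```

After such co-training, the radio encoder alone can be frozen and probed with a linear classifier on a labelled downstream task, which is the linear classification benchmark the abstract refers to.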