A growing literature demonstrates the feasibility of using radio frequency (RF) signals to enable key computer vision tasks in the presence of occlusions and poor lighting. This work leverages the fact that RF signals traverse walls and other occlusions to deliver through-wall pose estimation, action recognition, scene captioning, and human re-identification. However, unlike RGB datasets, which can be labeled by human workers, RF signals are daunting to label because they are not human-interpretable. Unlabeled RF signals, on the other hand, are fairly easy to collect, so it would be highly beneficial to use such data to learn useful representations in an unsupervised manner. In this paper, we therefore explore the feasibility of adapting RGB-based unsupervised representation learning to RF signals. We show that although contrastive learning has emerged as the main technique for unsupervised representation learning from images and videos, such methods perform poorly when applied to sensing humans using RF signals. In contrast, predictive unsupervised learning methods learn high-quality representations that can be used for multiple downstream RF-based sensing tasks. Our empirical results show that this approach outperforms state-of-the-art RF-based human sensing methods on various tasks, opening the possibility of unsupervised representation learning from this novel modality.