Video capture is the most widely used source for human perception because it is intuitively understandable. However, a usable video capture typically requires favorable environmental conditions such as ample ambient light, an unobstructed view, and a proper camera angle. In contrast, wireless measurements are more ubiquitous and face fewer environmental constraints. In this paper, we propose CSI2Video, a novel cross-modal method that leverages only WiFi signals from commercial devices, together with a source of human identity information, to recover fine-grained surveillance video in real time. Specifically, we design two tailored deep neural networks to perform the cross-modal mapping and video generation tasks, respectively. An autoencoder-based structure extracts pose features from WiFi frames; the extracted pose features and identity information are then merged to generate synthetic surveillance video. Our solution produces realistic surveillance videos without any expensive wireless equipment and is ubiquitous, cheap, and real-time.
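The two-stage pipeline described above can be sketched as follows. This is a minimal illustration only, with hypothetical dimensions and randomly initialized weights standing in for the trained networks (the paper does not specify these values): an autoencoder-style bottleneck maps a CSI measurement to a pose feature, which is then concatenated with an identity embedding and decoded into a synthetic frame.

```python
import numpy as np

# Hypothetical dimensions (not specified in the paper): 90 CSI subcarrier
# amplitudes per WiFi frame, a 32-d pose feature, a 16-d identity
# embedding, and a 64x64 synthetic video frame.
CSI_DIM, POSE_DIM, ID_DIM, H, W = 90, 32, 16, 64, 64

rng = np.random.default_rng(0)

def encode_pose(csi_frame, w_enc):
    """Autoencoder bottleneck: map one CSI frame to a pose feature."""
    return np.tanh(w_enc @ csi_frame)

def generate_frame(pose_feat, id_embed, w_gen):
    """Merge pose and identity features into one synthetic frame."""
    z = np.concatenate([pose_feat, id_embed])
    return (w_gen @ z).reshape(H, W)

# Random weights stand in for the two trained networks.
w_enc = rng.standard_normal((POSE_DIM, CSI_DIM)) / np.sqrt(CSI_DIM)
w_gen = rng.standard_normal((H * W, POSE_DIM + ID_DIM)) / np.sqrt(POSE_DIM + ID_DIM)

csi = rng.standard_normal(CSI_DIM)       # one WiFi CSI measurement
identity = rng.standard_normal(ID_DIM)   # identity embedding

pose = encode_pose(csi, w_enc)
frame = generate_frame(pose, identity, w_gen)
print(pose.shape, frame.shape)
```

In the actual system the linear maps would be replaced by the paper's deep networks, and the identity embedding would come from the separate identity-information source.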