Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); reconstruction algorithms are then employed to recover the desired high-speed frames (dubbed the software decoder) when needed. In this paper, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlations in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, and these two blocks are connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.git
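The optical-encoder principle above (modulating each high-speed frame with a distinct mask and summing onto one 2D sensor) can be sketched as follows. This is a minimal NumPy illustration of the generic SCI forward model, not the paper's implementation; the array sizes and random binary masks are assumptions for demonstration.

```python
import numpy as np

# Assumed sizes for illustration: B high-speed frames of H x W pixels.
B, H, W = 8, 256, 256
rng = np.random.default_rng(0)

frames = rng.random((B, H, W))                  # high-speed scene frames X_k
masks = rng.integers(0, 2, size=(B, H, W))      # per-frame binary modulation masks C_k

# Optical encoder: modulate each frame by its mask, then sum all modulated
# frames into a single compressed 2D measurement Y = sum_k C_k * X_k.
measurement = np.sum(masks * frames, axis=0)

print(measurement.shape)  # (256, 256): one 2D snapshot encodes 8 frames
```

The software decoder (here, STFormer) solves the inverse problem: recovering the B frames from this single measurement given the known masks.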