Background subtraction is a fundamental task in computer vision with numerous real-world applications, ranging from object tracking to video surveillance. Dynamic backgrounds pose a significant challenge in this setting. Supervised deep learning-based techniques are currently considered state-of-the-art for this task. However, these methods require pixel-wise ground-truth labels, which can be time-consuming and expensive to obtain. In this work, we propose a weakly supervised framework that performs background subtraction without requiring per-pixel ground-truth labels. Our framework is trained on a moving-object-free sequence of images and comprises two networks. The first network is an autoencoder that generates background images and prepares dynamic background images for training the second network. The dynamic background images are obtained by thresholding the background-subtracted images. The second network is a U-Net that uses the same object-free video for training and the dynamic background images as pixel-wise ground-truth labels. During the test phase, the input images are processed by the autoencoder and the U-Net, which generate background and dynamic background images, respectively. The dynamic background image helps remove dynamic motion from the background-subtracted image, enabling us to obtain a foreground image that is free of dynamic artifacts. To demonstrate the effectiveness of our method, we conducted experiments on selected categories of the CDnet 2014 dataset and the I2R dataset. Our method outperformed all top-ranked unsupervised methods. We also achieved better results than one of the two existing weakly supervised methods, and our performance was similar to the other. Our proposed method is online, real-time, efficient, and requires minimal frame-level annotation, making it suitable for a wide range of real-world applications.
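The test-phase combination described above (threshold the background-subtracted image, then suppress pixels the U-Net marks as dynamic background) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function and parameter names are hypothetical, and the threshold value `tau` is an assumed placeholder, as the abstract does not specify one.

```python
import numpy as np

def extract_foreground(frame, background, dynamic_bg_mask, tau=25.0):
    """Combine the two network outputs into a foreground mask.

    frame          -- input grayscale image, shape (H, W)
    background     -- background image produced by the autoencoder
    dynamic_bg_mask-- binary mask from the U-Net (1 = dynamic background)
    tau            -- illustrative intensity threshold (assumed value)
    """
    # Background subtraction followed by thresholding.
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    subtracted = (diff > tau).astype(np.uint8)

    # Remove dynamic-background pixels to suppress dynamic artifacts.
    foreground = subtracted & (1 - dynamic_bg_mask)
    return foreground
```

A pixel survives into the final foreground only if it differs sufficiently from the generated background and is not flagged as dynamic background by the U-Net.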