We present a high-fidelity deep learning algorithm (HyperSeg) for interactive video segmentation that uses a convolutional network with context-aware skip connections and compressed hypercolumn image features combined with a convolutional tessellation procedure. To maintain high output fidelity, our model processes and renders all image features at high resolution, without downsampling or pooling. We achieve this consistent, high-grade fidelity efficiently through two chief means: (1) a statistically principled tensor decomposition procedure that modulates the number of hypercolumn features, and (2) a convolutional tessellation technique that renders these features at their native resolution. For improved pixel-level segmentation results, we introduce a boundary loss function; for improved temporal coherence in video data, we incorporate temporal image information into our model. Through experiments, we demonstrate the improved accuracy of our model over baseline models on interactive segmentation tasks with high-resolution video data. We also introduce a benchmark video segmentation dataset, the VFX Segmentation Dataset, which contains 27,046 high-resolution video frames, including greenscreen and various composited scenes with corresponding hand-crafted, pixel-level segmentations. Our work improves on state-of-the-art segmentation fidelity for high-resolution data and can be applied across a broad range of domains, including VFX pipelines and medical imaging.
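The abstract mentions a boundary loss for sharper pixel-level segmentations without giving its form. As a hedged illustration only (the paper's exact formulation is not stated here), one common realization is a per-pixel cross-entropy re-weighted toward a band around label boundaries; the function names, `width`, and `boundary_weight` parameters below are hypothetical:

```python
import numpy as np

def boundary_weight_map(mask, width=1, boundary_weight=5.0):
    """Per-pixel weight map that up-weights pixels near segmentation
    boundaries. Hypothetical sketch, not the paper's formulation."""
    # A pixel lies on the boundary if any 4-neighbour has a different label.
    boundary = np.zeros_like(mask, dtype=bool)
    boundary[:-1, :] |= mask[:-1, :] != mask[1:, :]
    boundary[1:, :] |= mask[1:, :] != mask[:-1, :]
    boundary[:, :-1] |= mask[:, :-1] != mask[:, 1:]
    boundary[:, 1:] |= mask[:, 1:] != mask[:, :-1]
    # Grow the boundary into a band by `width` 4-neighbour dilation passes.
    band = boundary.copy()
    for _ in range(width):
        grown = band.copy()
        grown[:-1, :] |= band[1:, :]
        grown[1:, :] |= band[:-1, :]
        grown[:, :-1] |= band[:, 1:]
        grown[:, 1:] |= band[:, :-1]
        band = grown
    return np.where(band, boundary_weight, 1.0)

def boundary_weighted_bce(pred, target, weights, eps=1e-7):
    """Binary cross-entropy averaged with boundary-heavy weights."""
    pred = np.clip(pred, eps, 1.0 - eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    return float((weights * bce).mean())
```

In this toy version, pixels inside the boundary band contribute `boundary_weight` times more to the loss than interior pixels, which pushes the network to resolve object edges precisely; a production variant would typically operate on GPU tensors rather than NumPy arrays.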