We propose Skip-Convolutions to leverage the large amount of redundancy in video streams and save computation. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the model prediction,~\eg foreground regions, or whether it can be safely skipped,~\eg background regions. These gates can either be implemented as an efficient network trained jointly with the convolution kernels, or can simply skip the residuals based on their magnitude. Gating functions can also incorporate block-wise sparsity structures, as required for efficient implementation on hardware platforms. By replacing all convolutions with Skip-Convolutions in two state-of-the-art architectures, namely EfficientDet and HRNet, we consistently reduce their computational cost by a factor of $3{\sim}4\times$ on two different tasks, without any accuracy drop. Extensive comparisons with existing model compression methods, as well as image- and video-efficiency methods, demonstrate that Skip-Convolutions set a new state-of-the-art by effectively exploiting the temporal redundancy in videos.
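To make the mechanism concrete, below is a minimal PyTorch sketch of a skip-convolution using the simpler, norm-based gate described above. It relies on the linearity of convolution, $\mathrm{conv}(x_t) = \mathrm{conv}(x_{t-1}) + \mathrm{conv}(x_t - x_{t-1})$, so each frame's output is obtained by updating the previous output with a convolution over the gated residual. The class name `SkipConv2d`, the `threshold` parameter, and the dense masking used as a stand-in for a truly sparse kernel are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipConv2d(nn.Conv2d):
    """Sketch of a skip-convolution with a norm-based gate (assumed API)."""

    def __init__(self, *args, threshold=0.1, **kwargs):
        super().__init__(*args, **kwargs)
        self.threshold = threshold   # hypothetical residual-magnitude threshold
        self.prev_in = None          # x_{t-1}
        self.prev_out = None         # conv(x_{t-1})

    def forward(self, x):
        if self.prev_in is None:
            # First frame: no reference yet, run a dense convolution.
            out = super().forward(x)
        else:
            residual = x - self.prev_in
            # Norm gate: keep positions whose channel-wise residual norm
            # exceeds the threshold; zeroed residuals contribute nothing
            # through the linear convolution below.
            gate = (residual.norm(dim=1, keepdim=True) > self.threshold).to(x.dtype)
            # Dense stand-in for the sparse update; an efficient kernel would
            # evaluate the convolution only at gated positions. The bias is
            # omitted here because prev_out already contains it.
            out = self.prev_out + F.conv2d(
                residual * gate, self.weight, None,
                self.stride, self.padding, self.dilation, self.groups)
        self.prev_in, self.prev_out = x.detach(), out.detach()
        return out

# Usage on a stream of frames (each a BxCxHxW tensor):
#   layer = SkipConv2d(3, 16, kernel_size=3, padding=1, threshold=0.05)
#   outputs = [layer(frame) for frame in video_frames]
```

A learned gate, as in the jointly trained variant mentioned above, would replace the fixed threshold with a small gating network producing the binary mask, and a structured variant would pool the gate over blocks of output positions to match hardware-friendly sparsity patterns.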