Research on visual signal compression has a long history, and deep learning has recently driven notable progress. Although existing end-to-end compression algorithms achieve better compression performance, they are still designed to maximize signal quality under rate-distortion optimization. In this paper, we show that both the design and the optimization of the network architecture can be further improved for compression towards machine vision. We propose an inverted bottleneck structure for end-to-end compression towards machine vision, which specifically accounts for efficient representation of semantic information. Moreover, we explore the capability of optimization by incorporating analytics accuracy into the optimization process, and we further pursue optimality through generalized rate-accuracy optimization performed in an iterative manner. We use object detection as a showcase for end-to-end compression towards machine vision, and extensive experiments show that the proposed scheme achieves significant BD-rate savings in terms of analysis performance. Because the scheme retains signal-level reconstruction, it also shows strong generalization capability towards other machine vision tasks.
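To make the idea of rate-accuracy optimization concrete, the following is a minimal sketch of a Lagrangian-style objective that trades estimated bitrate against a task loss instead of a pixel distortion. The abstract does not state the exact objective, so the form J = R + lambda * L_task, the function names, and all parameters here are illustrative assumptions, not the authors' formulation.

```python
import math


def rate_bits(likelihoods):
    """Estimated code length in bits: sum of -log2(p) over latent symbols.

    `likelihoods` is assumed to hold the entropy model's probability
    for each quantized latent symbol (hypothetical input).
    """
    return sum(-math.log2(p) for p in likelihoods)


def rate_accuracy_loss(likelihoods, task_loss, num_pixels, lam):
    """Assumed objective J = R + lam * L_task, with R in bits per pixel.

    `task_loss` stands in for an analytics loss (for example, a detection
    loss computed on the reconstructed image); `lam` sets the trade-off.
    """
    bpp = rate_bits(likelihoods) / num_pixels
    return bpp + lam * task_loss


# Toy usage: two latent symbols for a 3-pixel "image".
loss = rate_accuracy_loss([0.5, 0.25], task_loss=0.5, num_pixels=3, lam=2.0)
print(loss)
```

In a conventional rate-distortion setup the second term would be a signal distortion such as mean squared error; swapping in a task loss is what steers the codec towards the analysis performance that the BD-rate savings are measured against.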