Most available image data are stored in a compressed format, of which JPEG is the most widespread. To feed these data into a convolutional neural network (CNN), a preliminary decoding step is required to obtain RGB pixels, which demands a high computational load and memory usage. For this reason, the design of CNNs that process JPEG compressed data directly has gained attention in recent years. In most existing works, typical CNN architectures are adapted to learn from DCT coefficients rather than RGB pixels. Although effective, these architectural changes either raise the computational cost or neglect relevant information in the DCT inputs. In this paper, we examine different ways of speeding up CNNs designed for DCT inputs, exploiting learning strategies that reduce computational complexity while taking full advantage of the DCT inputs. Our experiments were conducted on the ImageNet dataset. The results show that learning how to combine all DCT inputs in a data-driven fashion is better than discarding some of them by hand, and that combining this strategy with a reduction in the number of layers effectively reduces computational cost while retaining accuracy.
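To make the central idea concrete, the following is a minimal sketch of a data-driven combination of all DCT inputs, rather than discarding some of them by hand. The details are assumptions for illustration, not the paper's exact architecture: JPEG 4:2:0 chroma subsampling, 64 DCT coefficients per 8x8 block and per component, nearest-neighbor upsampling of the chroma planes, and a 1x1 convolution as the learnable combiner.

```python
# Minimal sketch (assumptions: PyTorch, JPEG 4:2:0 layout) of a learnable,
# data-driven combination of all DCT inputs, instead of discarding the
# chroma coefficients by hand before a truncated CNN backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCTCombiner(nn.Module):
    def __init__(self, out_channels=192):
        super().__init__()
        # 64 DCT coefficients per 8x8 block and per component (Y, Cb, Cr);
        # a 1x1 convolution learns how to weight and mix all 192 of them.
        self.mix = nn.Conv2d(3 * 64, out_channels, kernel_size=1)

    def forward(self, y_dct, cb_dct, cr_dct):
        # y_dct:       (N, 64, H/8,  W/8)   luma DCT blocks
        # cb/cr_dct:   (N, 64, H/16, W/16)  chroma DCT blocks (4:2:0)
        cb_up = F.interpolate(cb_dct, size=y_dct.shape[-2:], mode="nearest")
        cr_up = F.interpolate(cr_dct, size=y_dct.shape[-2:], mode="nearest")
        x = torch.cat([y_dct, cb_up, cr_up], dim=1)   # keep every coefficient
        return self.mix(x)                            # learned combination

# Usage: the output already has 1/8 of the RGB spatial resolution, so the
# early (and most expensive) layers of a standard CNN can be removed.
combiner = DCTCombiner()
y  = torch.randn(2, 64, 28, 28)
cb = torch.randn(2, 64, 14, 14)
cr = torch.randn(2, 64, 14, 14)
features = combiner(y, cb, cr)   # shape: (2, 192, 28, 28)
```

Since the DCT block grid already reduces spatial resolution by a factor of 8, the learned combiner can feed directly into a backbone with fewer early layers, which is the source of the computational savings discussed in the abstract.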