Pre-training has driven numerous state-of-the-art results in high-level computer vision, yet few attempts have been made to investigate how pre-training acts in image processing systems. In this paper, we present an in-depth study of image pre-training. To conduct this study on solid ground with practical value in mind, we first propose a generic, cost-effective Transformer-based framework for image processing. It yields highly competitive performance across a range of low-level tasks despite constrained parameters and computational complexity. Then, based on this framework, we design a set of principled evaluation tools to comprehensively diagnose image pre-training across different tasks and uncover its effects on internal network representations. We find that pre-training plays strikingly different roles in low-level tasks. For example, pre-training introduces more local information into higher layers in super-resolution (SR), yielding significant performance gains, whereas it hardly affects internal feature representations in denoising, resulting in only limited gains. Further, we explore different pre-training methods, revealing that multi-task pre-training is more effective and data-efficient. All codes and models will be released at https://github.com/fenglinglwb/EDT.