High input resolution is critical for good performance in many computer vision applications. The computational complexity of CNNs, however, grows significantly as the input image size increases. Here, we show that it is almost always possible to modify a network so that it achieves higher accuracy at a higher input resolution while keeping the same number of parameters and/or FLOPS. The idea is similar to EfficientNet, but instead of optimizing network width, depth, and resolution simultaneously, we focus only on input resolution. This makes the search space much smaller, which is better suited to low computational budget regimes. More importantly, by controlling for the number of model parameters (and hence model capacity), we show that the additional accuracy gain is indeed due to the higher input resolution. A preliminary empirical investigation on the MNIST, Fashion MNIST, and CIFAR10 datasets demonstrates the effectiveness of the proposed approach.