Traditional CNN models are trained and tested on relatively low-resolution images (<300 px) and cannot be directly applied to large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that enables training existing CNN architectures on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that, instead of performing gradient-based updates on an entire image at once, a good solution can be reached by performing model updates on only small parts of the image at a time, ensuring that the majority of the image is covered over the course of iterations. PatchGD thus offers substantially better memory and compute efficiency when training models on large-scale images. We thoroughly evaluate PatchGD on two datasets, PANDA and UltraMNIST, with ResNet50 and MobileNetV2 models under different memory constraints. Our evaluation clearly shows that PatchGD is much more stable and efficient than standard gradient descent in handling large images, especially when compute memory is limited.
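To make the core idea concrete, the sketch below illustrates patch-wise model updates in PyTorch: only a small patch is forwarded and backpropagated at a time, with gradients accumulated across a few randomly sampled patches before each optimizer step. This is a minimal illustration of the patch-at-a-time update principle, not the authors' reference implementation; the function name, patch sampling scheme, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Sketch: update the model from small patches so that only patch-sized
# activations occupy GPU memory per step, while the sampled patches
# collectively cover most of the large input image over iterations.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(num_classes=2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def patchwise_update(image, label, patch_size=256, patches_per_step=4):
    """One gradient update built from a few randomly sampled patches.
    (Illustrative helper, not part of the original paper.)"""
    _, H, W = image.shape
    optimizer.zero_grad()
    for _ in range(patches_per_step):
        top = torch.randint(0, H - patch_size + 1, (1,)).item()
        left = torch.randint(0, W - patch_size + 1, (1,)).item()
        patch = image[:, top:top + patch_size, left:left + patch_size]
        logits = model(patch.unsqueeze(0).to(device))
        loss = criterion(logits, label.to(device)) / patches_per_step
        loss.backward()  # gradients accumulate across patches
    optimizer.step()

# Usage on a synthetic 4096x4096 image that would be costly to process whole.
image = torch.rand(3, 4096, 4096)
label = torch.tensor([1])
patchwise_update(image, label)
```

Because each forward/backward pass touches only a patch, peak activation memory is governed by the patch size rather than the full image resolution, which is what allows large images to be handled under tight memory budgets.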