Throughout the decades-long history of machine learning, recurrent neural networks (RNNs) have been used mainly for sequential data and time series, generally with 1D information. Even in the rare studies that apply them to 2D images, these networks are used merely to learn and generate data sequentially rather than for image recognition. In this study, we propose integrating an RNN as an additional layer when designing image recognition models. We also develop end-to-end multimodel ensembles that produce expert predictions by combining several models. In addition, we extend the training strategy so that our model performs comparably to leading models and can even match the state of the art on several challenging datasets (e.g., SVHN (0.99), CIFAR-100 (0.9027), and CIFAR-10 (0.9852)). Moreover, our model sets a new record on the Surrey dataset (0.949). The source code for the methods presented in this article is available at https://github.com/leonlha/e2e-3m and http://nguyenhuuphong.me.
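To make the two main ideas concrete (an RNN used as an extra layer on top of image features, and an ensemble that combines several models), here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the released implementation: the feature map is a random stand-in for the output of a convolutional backbone, each row of that map is treated as one time step, the recurrent layer is a basic tanh (Elman-style) RNN, and the ensemble simply averages class probabilities. All names (`simple_rnn`, `make_head`, `predict`, `ensemble_predict`) and all sizes are hypothetical.

```python
import math
import random


def simple_rnn(sequence, w_in, w_rec, bias):
    """Run a tanh (Elman-style) RNN over a list of feature vectors.

    Returns the final hidden state, which summarizes the whole sequence.
    """
    hidden = [0.0] * len(bias)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def dense(vector, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_head(feature_dim, hidden_dim, num_classes, seed):
    """Hypothetical untrained RNN + softmax head (random weights)."""
    rng = random.Random(seed)
    return {
        "w_in": random_matrix(hidden_dim, feature_dim, rng),
        "w_rec": random_matrix(hidden_dim, hidden_dim, rng),
        "b_h": [0.0] * hidden_dim,
        "w_out": random_matrix(num_classes, hidden_dim, rng),
        "b_out": [0.0] * num_classes,
    }


def predict(head, feature_map):
    """Treat each row of the feature map as one time step of the RNN."""
    hidden = simple_rnn(feature_map, head["w_in"], head["w_rec"], head["b_h"])
    return softmax(dense(hidden, head["w_out"], head["b_out"]))


def ensemble_predict(heads, feature_map):
    """Average the class probabilities of several models (an assumed rule)."""
    per_model = [predict(head, feature_map) for head in heads]
    count = len(per_model)
    return [sum(p[c] for p in per_model) / count
            for c in range(len(per_model[0]))]


# Stand-in for backbone features: 4 rows, each an 8-dimensional vector.
rng = random.Random(0)
feature_map = [[rng.random() for _ in range(8)] for _ in range(4)]

# Three members with different random initializations.
heads = [make_head(feature_dim=8, hidden_dim=6, num_classes=10, seed=s)
         for s in (1, 2, 3)]
probs = ensemble_predict(heads, feature_map)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Because the weights are random, the predicted class is meaningless here; the sketch only shows how 2D features can be read as a sequence by a recurrent layer and how per-model probabilities can be merged into a single prediction. In practice the backbone, recurrent layer, and classifier would be trained together, and the members of the ensemble would typically use different backbones.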