We present an end-to-end CNN architecture for fine-grained visual recognition called the Collaborative Convolutional Network (CoCoNet). The network applies a collaborative filter after the convolutional layers to represent an image as an optimal weighted collaboration of features learned from the training samples as a whole, rather than one sample at a time. This gives CoCoNet more power to encode the fine-grained nature of the data from limited samples in an end-to-end fashion. We study its performance in detail under 1-stage and 2-stage transfer learning and under different configurations with benchmark architectures such as AlexNet and VggNet. The ablation study shows that the proposed method outperforms its constituent parts considerably and consistently. CoCoNet also outperforms Bilinear-CNN (BCNN), a popular deep-learning baseline for fine-grained recognition, with statistical significance. The experiments address fine-grained species recognition, but the method is general enough to apply to other similar tasks. Lastly, we introduce a new public dataset of Indian endemic birds for fine-grained species recognition and report initial results on it. The training metadata and the new dataset are available through the corresponding author.
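The idea of representing an image as a weighted collaboration of training features can be illustrated with a small sketch. The abstract does not state the exact formulation, so the code below assumes the classical ridge-regularised collaborative representation (weights minimising ||y - Xa||^2 + lam ||a||^2, followed by a class-wise residual test). It is not the authors' end-to-end layer, and the function names, the `lam` parameter, and the feature layout are hypothetical choices made for illustration.

```python
import numpy as np


def collaborative_weights(train_feats, query_feat, lam=0.1):
    """Ridge-regularised weights over ALL training features (assumed formulation).

    train_feats: (d, n) array, one column per training sample.
    query_feat:  (d,) feature vector of the image to represent.
    Returns the (n,) vector a minimising ||y - X a||^2 + lam * ||a||^2.
    """
    X = np.asarray(train_feats, dtype=float)
    y = np.asarray(query_feat, dtype=float)
    # Closed-form solution of the normal equations (X^T X + lam I) a = X^T y.
    gram = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


def classify(train_feats, labels, query_feat, lam=0.1):
    """Pick the class whose training columns best reconstruct the query."""
    X = np.asarray(train_feats, dtype=float)
    y = np.asarray(query_feat, dtype=float)
    labels = np.asarray(labels)
    weights = collaborative_weights(X, y, lam)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct the query using only this class's share of the weights.
        recon = X[:, mask] @ weights[mask]
        residuals[c.item()] = float(np.linalg.norm(y - recon))
    return min(residuals, key=residuals.get), residuals


if __name__ == "__main__":
    # Toy features: two samples per class, purely illustrative values.
    X = np.array([[1.0, 0.9, 0.0, 0.1],
                  [0.0, 0.1, 1.0, 0.9],
                  [0.0, 0.0, 0.0, 0.0]])
    labels = [0, 0, 1, 1]
    query = np.array([1.0, 0.05, 0.0])
    predicted, residuals = classify(X, labels, query)
    print(predicted, residuals)
```

In this sketch the weights are computed jointly over every training column, which is one way to read "training samples as a whole rather than one at a time"; in a network, the features would presumably come from the convolutional layers rather than from a hand-written array.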