Feature extraction has always been a critical component of the computer vision field. More recently, state-of-the-art computer vision algorithms have incorporated Deep Neural Networks (DNNs) in feature extraction roles, creating Deep Convolutional Activation Features (DeCAF). The transferability of DNN knowledge domains has enabled the wide use of pretrained DNN feature extraction for applications with novel object classes, especially those with limited training data. This study analyzes the general discriminability of novel object visual appearances encoded into the DeCAF spaces of six of the leading visual recognition DNN architectures. The results of this study characterize the Mahalanobis distances and cosine similarities between DeCAF object manifolds across two visual object tracking benchmark data sets. The backgrounds surrounding each object are also included as object classes in the manifold analysis, providing a wider range of novel classes. This study found that different network architectures emphasize different features, a difference that must be considered in the network selection process. These results are generated from the VOT2015 and UAV123 benchmark data sets; however, the proposed methods can be applied to efficiently compare estimated network performance characteristics for any labeled visual data set.
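As a minimal sketch of the manifold comparison described above, the snippet below computes a Mahalanobis distance and a cosine similarity between the centroids of two feature-vector sets. The feature matrices, the pooled covariance estimate, and the 256-dimensional feature size are illustrative assumptions, not the paper's implementation; a real DeCAF layer would typically be higher dimensional.

```python
import numpy as np

def mahalanobis_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Mahalanobis distance between the centroids of two DeCAF feature
    manifolds, using a pooled covariance estimate over both sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    pooled_cov = np.cov(np.vstack([feats_a, feats_b]), rowvar=False)
    diff = mu_a - mu_b
    # Pseudo-inverse guards against a singular covariance in high dimensions.
    return float(np.sqrt(diff @ np.linalg.pinv(pooled_cov) @ diff))

def cosine_similarity(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Cosine similarity between the two manifold centroids."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    return float(mu_a @ mu_b / (np.linalg.norm(mu_a) * np.linalg.norm(mu_b)))

# Hypothetical example: compare an object class against its background class,
# each encoded as rows of DeCAF feature vectors (here random stand-ins).
rng = np.random.default_rng(0)
feats_obj = rng.normal(0.0, 1.0, size=(200, 256))  # tracked object patches
feats_bkg = rng.normal(0.5, 1.0, size=(200, 256))  # surrounding background
print(mahalanobis_distance(feats_obj, feats_bkg))
print(cosine_similarity(feats_obj, feats_bkg))
```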