Texture analysis is a classical yet challenging task in computer vision to which deep neural networks are actively being applied. Most approaches build feature aggregation modules around a pre-trained backbone and then fine-tune the new architecture on specific texture recognition tasks. Here we propose a new method named \textbf{R}andom encoding of \textbf{A}ggregated \textbf{D}eep \textbf{A}ctivation \textbf{M}aps (RADAM), which extracts rich texture representations without ever changing the backbone. The technique consists of encoding the output at different depths of a pre-trained deep convolutional network using a Randomized Autoencoder (RAE). The RAE is trained locally for each image using a closed-form solution, and its decoder weights are used to compose a one-dimensional texture representation that is fed into a linear SVM. This means that no fine-tuning or backpropagation is needed. We evaluate RADAM on several texture benchmarks and achieve state-of-the-art results under different computational budgets. Our results suggest that pre-trained backbones may not require additional fine-tuning for texture recognition if their learned representations are better encoded.
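The RAE step described above can be sketched as follows. This is a minimal illustration only, assuming a random fixed encoder and a ridge-regression closed-form decoder; the feature dimensions, the `tanh` nonlinearity, and the regularization constant are hypothetical choices, not the authors' exact pipeline.

```python
import numpy as np

def rae_texture_descriptor(X, q=64, lam=1e-3, seed=0):
    """Sketch of a Randomized Autoencoder descriptor for one image.

    X : (n, d) array of aggregated deep activation vectors for the image
        (e.g., spatial positions pooled across backbone depths).
    Returns a 1-D descriptor built from the closed-form decoder weights.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W_enc = rng.standard_normal((d, q))   # random encoder, never trained
    H = np.tanh(X @ W_enc)                # hidden representation
    # Closed-form decoder: argmin_W ||H W - X||^2 + lam ||W||^2
    W_dec = np.linalg.solve(H.T @ H + lam * np.eye(q), H.T @ X)  # (q, d)
    return W_dec.ravel()                  # flatten to a 1-D texture vector
```

The resulting vector would then be the input to a linear SVM; since the decoder is obtained in closed form per image, no backpropagation is involved.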