Spatial pooling is an important step in computer vision systems such as Convolutional Neural Networks and the Bag-of-Words method. The purpose of spatial pooling is to combine neighbouring descriptors into a single descriptor for a given region (local or global). The resulting vector must be as discriminative as possible; in other words, it must retain relevant information while discarding irrelevant and confusing details. Maximum and average are the most common aggregation functions used in the pooling step. To improve the aggregation of relevant information without degrading its discriminative power for image classification, we introduce a simple but effective scheme based on Ordered Weighted Average (OWA) aggregation operators. We present a method to learn the weights of the OWA aggregation operator in a Bag-of-Words framework and in Convolutional Neural Networks, and provide an extensive evaluation showing that OWA-based pooling outperforms classical aggregation operators.
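As an illustration only, the following minimal NumPy sketch shows OWA pooling under the standard OWA definition: the values in a pooling region are sorted in decreasing order and combined with a weight vector that is non-negative and sums to one. Max pooling and average pooling fall out as special cases (all weight on the first rank, or uniform weights). The function name `owa_pool`, the example activations, and the weight vectors are hypothetical; the sketch does not reproduce the weight-learning procedure described above.

```python
import numpy as np


def owa_pool(x, w):
    """Ordered Weighted Average over the last axis of x (illustrative sketch).

    x: array of shape (..., n) with the n values of one pooling region.
    w: array of n non-negative weights that sum to 1.
    Weights are attached to rank positions (largest, second largest, ...),
    not to spatial positions, so the result ignores the order of inputs.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if w.shape != (x.shape[-1],):
        raise ValueError("need exactly one weight per pooled value")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("OWA weights must be non-negative and sum to 1")
    x_sorted = -np.sort(-x, axis=-1)  # sort each region in decreasing order
    return x_sorted @ w               # rank-weighted sum over the last axis


# Hypothetical activations of one pooling region.
region = np.array([0.2, 0.9, 0.4, 0.1])
w_max = np.array([1.0, 0.0, 0.0, 0.0])   # all weight on rank 1 -> max pooling
w_avg = np.full(4, 0.25)                 # uniform weights -> average pooling
w_top2 = np.array([0.5, 0.5, 0.0, 0.0])  # in between: mean of the two largest

print(owa_pool(region, w_max))   # same as region.max()
print(owa_pool(region, w_avg))   # same as region.mean()
print(owa_pool(region, w_top2))  # mean of the two largest values

# Pooling a (locations x dimensions) code matrix, one OWA value per dimension.
codes = np.array([[0.1, 0.7],
                  [0.5, 0.2],
                  [0.3, 0.3]])
pooled = owa_pool(codes.T, np.array([0.6, 0.3, 0.1]))
print(pooled)
```

Because the weights act on sorted values, every OWA operator is invariant to permutations within the region, just like max and average; choosing (or learning) the weight vector moves the operator between those two extremes.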