Extracting class activation maps (CAM) is arguably the most standard step in generating pseudo masks for weakly-supervised semantic segmentation (WSSS). Yet, we find that the crux of unsatisfactory pseudo masks lies in the binary cross-entropy loss (BCE) widely used in CAM. Specifically, due to the sum-over-class pooling nature of BCE, each pixel in CAM may be responsive to multiple classes co-occurring in the same receptive field. As a result, given a class, its hot CAM pixels may wrongly invade the area belonging to other classes, or its non-hot pixels may actually be part of the class. To this end, we introduce an embarrassingly simple yet surprisingly effective method: reactivating the BCE-converged CAM by using softmax cross-entropy loss (SCE), dubbed \textbf{ReCAM}. Given an image, we use CAM to extract the feature pixels of each single class and use them, together with the class label, to learn another fully-connected layer (after the backbone) with SCE. Once this layer has converged, we extract ReCAM in the same way as CAM. Thanks to the contrastive nature of SCE, the pixel response is disentangled into different classes, so less mask ambiguity is expected. Evaluation on both PASCAL VOC and MS~COCO shows that ReCAM not only generates high-quality masks but also supports plug-and-play use in any CAM variant with little overhead.
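The pipeline above can be sketched in plain Python on a toy feature map. This is an illustration under stated assumptions, not the authors' implementation. Features are lists of per-pixel vectors. The backbone and the converged BCE classifier (`w_cam`) are kept frozen. "Extracting the feature pixels of each single class" is assumed here to mean a CAM-weighted average of pixel features. The new fully-connected layer (`w_new`) is trained with plain gradient descent. Every function name (`class_map`, `masked_feature`, `recam_step`, ...) is hypothetical.

```python
import math

# Toy representation (assumption): an image's backbone output is a list of
# per-pixel feature vectors (H*W entries of length D); a classifier is a list
# of per-class weight vectors (K entries of length D).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def class_map(features, weights, c):
    """CAM-style map for class c: ReLU(w_c . f(x)) per pixel, max-normalised to [0, 1]."""
    raw = [max(0.0, dot(weights[c], f)) for f in features]
    peak = max(raw)
    return [r / peak if peak > 0 else 0.0 for r in raw]

def bce_loss(logits, labels):
    """Multi-label BCE: an independent sigmoid per class, summed over classes."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sce_loss(logits, c):
    """Single-label softmax cross-entropy: classes compete for probability mass."""
    return -math.log(softmax(logits)[c])

def masked_feature(features, cam_c):
    """Hypothetical masking: CAM-weighted average of the pixel features."""
    denom = sum(cam_c) or 1.0
    return [sum(a * f[d] for a, f in zip(cam_c, features)) / denom
            for d in range(len(features[0]))]

def recam_step(features, w_cam, w_new, labels, lr=0.1):
    """One gradient step on the new FC layer w_new (updated in place) with SCE.
    Returns the mean SCE loss over the image's labelled classes, before the update."""
    present = [c for c, y in enumerate(labels) if y]
    if not present:
        return 0.0
    grads = [[0.0] * len(w_k) for w_k in w_new]
    loss = 0.0
    for c in present:
        # Class-specific feature: pixels weighted by the frozen CAM of class c.
        x = masked_feature(features, class_map(features, w_cam, c))
        probs = softmax([dot(w_k, x) for w_k in w_new])
        loss -= math.log(probs[c])
        # dSCE/dw_k = (p_k - [k == c]) * x: the labelled class is pulled up and
        # every other class is pushed down (the contrastive effect of SCE).
        for k in range(len(w_new)):
            g = probs[k] - (1.0 if k == c else 0.0)
            for d, x_d in enumerate(x):
                grads[k][d] += g * x_d / len(present)
    for w_k, g_k in zip(w_new, grads):
        for d in range(len(w_k)):
            w_k[d] -= lr * g_k[d]
    return loss / len(present)

# Toy usage: 4 pixels, D = 2, K = 2 classes, both classes present in the image.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
w_cam = [[1.0, 0.2], [0.2, 1.0]]   # stands in for a converged BCE classifier
w_new = [[0.0, 0.0], [0.0, 0.0]]   # the extra FC layer trained with SCE
labels = [1, 1]
first = recam_step(features, w_cam, w_new, labels)
for _ in range(200):
    last = recam_step(features, w_cam, w_new, labels)
# ReCAM is read out exactly like CAM, only with the new layer's weights.
recam_0 = class_map(features, w_new, 0)
```

The `bce_loss` helper is included only for contrast: each class gets its own sigmoid, so one pixel feature can score high for several classes at once, whereas the softmax in `sce_loss` and `recam_step` makes the classes compete. Reading out ReCAM reuses `class_map` unchanged, which mirrors extracting ReCAM "in the same way as CAM".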