We consider the problem of improving the quality of human instance segmentation masks for a given test image using keypoint estimation. We compare two alternative approaches. The first is a test-time adaptation (TTA) method, in which we allow the segmentation network's weights to be modified at test time using a single unlabeled test image, without assuming test-time access to the labeled source dataset. More specifically, our TTA method uses the keypoint estimates as pseudo-labels and backpropagates the resulting loss to adjust the backbone weights. The second is a training-time generalization (TTG) method, in which we permit offline access to the labeled source dataset but not test-time modification of the weights. Furthermore, we do not assume the availability of any images from, or knowledge about, the target domain. Our TTG method augments the backbone features with those generated by the keypoint head and feeds the aggregate vector to the mask head. Through a comprehensive set of ablations, we evaluate both approaches and identify several factors limiting the TTA gains. In particular, we show that in the absence of a significant domain shift, TTA may hurt performance and TTG shows only a small gain, whereas for a large domain shift, TTA gains are smaller and depend on the heuristics used, while TTG gains are larger and robust to architectural choices.
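The two mechanisms described above can be illustrated with a toy NumPy sketch. It is not the authors' implementation. It assumes linear layers in place of the real backbone, keypoint head, and mask head, a binary cross-entropy self-training loss on hardened keypoint predictions, and keypoint logits standing in for the keypoint-head features. All names (`tta_step`, `ttg_mask_logits`, `w_backbone`, `w_kpt`, `w_mask`), shapes, and the learning rate are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y):
    """Binary cross-entropy summed over keypoint channels."""
    eps = 1e-12
    return float(-np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def ttg_mask_logits(backbone_feat, keypoint_feat, w_mask):
    """TTG-style fusion (assumed form): concatenate both feature vectors,
    then apply a linear mask head to the aggregate vector."""
    fused = np.concatenate([backbone_feat, keypoint_feat])
    return w_mask @ fused


def tta_step(w_backbone, w_kpt, x, lr=0.01):
    """One TTA-style update of the backbone on a single unlabeled input x.
    The keypoint head stays frozen; only the backbone weights change."""
    feat = w_backbone @ x                # toy backbone features
    p = sigmoid(w_kpt @ feat)            # keypoint confidences
    pseudo = (p > 0.5).astype(float)     # hardened predictions used as fixed pseudo-labels
    loss_before = bce(p, pseudo)
    # d(BCE)/d(logits) = p - pseudo; chain through the keypoint head to the backbone.
    grad_feat = w_kpt.T @ (p - pseudo)
    grad_w = np.outer(grad_feat, x)
    w_new = w_backbone - lr * grad_w
    loss_after = bce(sigmoid(w_kpt @ (w_new @ x)), pseudo)
    return w_new, loss_before, loss_after


# Hypothetical toy dimensions: 8-d input, 4-d backbone features, 3 keypoints.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
w_backbone = rng.normal(scale=0.5, size=(4, 8))
w_kpt = rng.normal(scale=0.5, size=(3, 4))
w_mask = rng.normal(scale=0.5, size=(1, 4 + 3))

# TTA: adapt the backbone on the single test input.
w_adapted, loss_before, loss_after = tta_step(w_backbone, w_kpt, x)

# TTG: weights stay fixed; the mask head sees backbone and keypoint features together.
feat = w_backbone @ x
mask_logit = ttg_mask_logits(feat, w_kpt @ feat, w_mask)
```

In this sketch the TTA step can only sharpen whatever the keypoint head already predicts, which hints at why such gains may depend on heuristics such as the pseudo-label threshold. The TTG path leaves the weights untouched at test time and only changes what the mask head receives as input.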