Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently, and have shown promising results for generalization beyond the training distribution in image classification. In our work, we find evidence that these losses alone can be insufficient for instance segmentation unless architectural inductive biases are also taken into account. For image segmentation, recent slot-centric generative models break this dependence on supervision by attempting to segment scenes into entities in a self-supervised manner, through pixel reconstruction. Drawing on these two lines of work, we propose Slot-TTA, a semi-supervised instance segmentation model equipped with a slot-centric inductive bias, which is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives. We show that test-time adaptation in Slot-TTA greatly improves instance segmentation in out-of-distribution scenes. We evaluate Slot-TTA on several 3D and 2D scene instance segmentation benchmarks and show substantial out-of-distribution performance improvements over state-of-the-art supervised feed-forward detectors and self-supervised test-time adaptation methods.
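The per-scene loop described above (start from the trained parameters, take gradient steps on a self-supervised reconstruction loss for one test scene, then read the segmentation off the adapted slots) can be illustrated with a deliberately tiny toy. This is a minimal sketch under strong assumptions, not the Slot-TTA architecture. The hypothetical "model" has no encoder or decoder. Its only parameters are K slot feature vectors, and each pixel is reconstructed by its closest slot. Because that assignment is hard, the loop behaves like gradient-descent k-means. The supervised training stage is replaced by hand-picked "trained" slots. All names (`assign`, `reconstruction_loss`, `adapt_at_test_time`, `steps`, `lr`) are illustrative.

```python
import copy
from typing import List, Tuple

# Hypothetical toy model (NOT the Slot-TTA architecture): the only parameters
# are K slot feature vectors; each pixel is reconstructed by its closest slot.
Pixel = List[float]         # one feature vector per pixel (e.g. intensity or RGB)
Slots = List[List[float]]   # K slot vectors


def sq_dist(a: Pixel, b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(pixels: List[Pixel], slots: Slots) -> List[int]:
    """Segmentation: each pixel goes to the slot that reconstructs it best."""
    return [min(range(len(slots)), key=lambda k: sq_dist(p, slots[k])) for p in pixels]


def reconstruction_loss(pixels: List[Pixel], slots: Slots) -> float:
    """Self-supervised objective: mean squared error of the slot-wise reconstruction."""
    masks = assign(pixels, slots)
    return sum(sq_dist(p, slots[k]) for p, k in zip(pixels, masks)) / len(pixels)


def loss_gradient(pixels: List[Pixel], slots: Slots) -> Slots:
    """Gradient of reconstruction_loss w.r.t. the slots, hard assignment held fixed."""
    n = len(pixels)
    grad = [[0.0] * len(s) for s in slots]
    for p, k in zip(pixels, assign(pixels, slots)):
        for d, value in enumerate(p):
            grad[k][d] += -2.0 * (value - slots[k][d]) / n
    return grad


def adapt_at_test_time(
    pixels: List[Pixel], trained_slots: Slots, steps: int = 50, lr: float = 0.5
) -> Tuple[Slots, List[int]]:
    """Adapt a fresh copy of the trained parameters to ONE scene, then segment it."""
    slots = copy.deepcopy(trained_slots)  # every scene restarts from the trained weights
    for _ in range(steps):
        grad = loss_gradient(pixels, slots)
        for k, g_k in enumerate(grad):
            for d, g in enumerate(g_k):
                slots[k][d] -= lr * g  # plain gradient descent on the reconstruction loss
    return slots, assign(pixels, slots)


if __name__ == "__main__":
    trained = [[0.0], [1.0]]  # toy "training": dark backgrounds, bright objects
    # Out-of-distribution scene with shifted brightness:
    # the first 4 pixels are background, the last 3 belong to the object.
    scene = [[0.35], [0.40], [0.45], [0.55], [0.85], [0.90], [0.95]]
    print(assign(scene, trained))  # before adaptation: [0, 0, 0, 1, 1, 1, 1]
    adapted, masks = adapt_at_test_time(scene, trained)
    print(masks)                   # after adaptation:  [0, 0, 0, 0, 1, 1, 1]
```

The `copy.deepcopy` call is what makes the adaptation per-scene: every test example starts from the same trained parameters, so updates made for one scene never carry over to the next. In the toy scene, the brightness shift causes the frozen slots to mislabel the pixel at 0.55 as object. A few adaptation steps move the slots toward the scene's own statistics, and that pixel is reassigned to the background.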