In this paper, we introduce source domain subset sampling (SDSS) as a new perspective on semi-supervised domain adaptation. We propose performing domain adaptation by sampling and exploiting only a meaningful subset of the source data for training. Our key assumption is that the full source domain data may contain samples that are unhelpful for adaptation, so domain adaptation can benefit from a subset of source data composed solely of helpful and relevant samples. The proposed method subsamples the full source data to generate a small-scale, meaningful subset. As a result, training time is reduced and performance is improved with our subsampled source data. To further verify the scalability of our method, we construct a new dataset called Ocean Ship, which comprises 500 real and 200K synthetic sample images with ground-truth labels. SDSS achieves state-of-the-art performance on the GTA5 to Cityscapes and SYNTHIA to Cityscapes public benchmarks, and a 9.13 mIoU improvement over a baseline model on our Ocean Ship dataset.
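To make the idea of training on a subset of the source data concrete, the following is a minimal sketch, not the SDSS procedure itself. The abstract does not state how sample relevance is measured or how large the subset is, so the per-sample `scores`, the `keep_ratio` parameter, and the function name `select_source_subset` are all assumptions introduced purely for illustration.

```python
from typing import List, Sequence


def select_source_subset(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring source samples.

    Hypothetical illustration: `scores` stands in for some per-sample
    relevance measure and `keep_ratio` for the subset size; neither is
    specified in the abstract.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one sample whenever the source set is non-empty.
    n_keep = max(1, round(len(scores) * keep_ratio))
    # Rank indices by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the kept indices in dataset order for easy slicing.
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    example_scores = [0.1, 0.9, 0.4, 0.8, 0.2]  # made-up relevance scores
    print(select_source_subset(example_scores, keep_ratio=0.4))
```

Under these assumptions, a training loop would iterate only over the returned indices rather than over the full source set, which is where a reduction in training time would come from.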