Unsupervised domain adaptation (UDA) adapts a model trained on one domain to a novel domain using only unlabeled data. Many studies have been conducted, especially for semantic segmentation, owing to its high annotation cost. Existing studies adhere to the basic assumption that no labeled sample is available for the new domain. However, this assumption has several issues. First, it is unrealistic, considering the standard practice of machine learning of verifying a model's performance before deployment; this verification requires labeled data. Second, any UDA method has a few hyper-parameters, and tuning them requires a certain amount of labeled data. To rectify this misalignment with reality, we rethink UDA from a data-centric point of view. Specifically, we start from the assumption that we do have access to a minimal amount of labeled data. We then ask how many labeled samples are necessary to find satisfactory hyper-parameters for existing UDA methods. Moreover, how well does it work to use the same data to train the model directly, e.g., by finetuning? We conduct experiments to answer these questions on the popular scenarios {GTA5, SYNTHIA}$\rightarrow$Cityscapes. Our findings are as follows: i) for some UDA methods, good hyper-parameters can be found with only a few labeled samples (i.e., images), e.g., five, but this does not hold for others; and ii) finetuning outperforms most existing UDA methods with only ten labeled images.
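To make the model-selection question concrete, below is a minimal sketch of how hyper-parameters of a UDA method could be chosen with only a handful of labeled target images: train once per candidate configuration and keep the one with the highest mIoU on the small labeled set. The names `train_uda`, `predict`, `lambda_adv`, and the grid values are hypothetical placeholders, not the actual experimental code of the paper.

```python
import itertools
import numpy as np

def miou(pred, gt, num_classes=19):
    """Mean IoU over classes present in the prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

def select_hyperparams(train_uda, predict, labeled_images, labeled_masks, grid):
    """Pick the configuration whose trained model scores the highest
    average mIoU on the small labeled target set (e.g., 5-10 images)."""
    best_score, best_cfg = -1.0, None
    for cfg in grid:
        model = train_uda(**cfg)  # train a UDA model with this configuration
        scores = [miou(predict(model, img), gt)
                  for img, gt in zip(labeled_images, labeled_masks)]
        score = float(np.mean(scores))
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score

# Hypothetical grid over two typical UDA hyper-parameters
# (adversarial-loss weight and learning rate).
grid = [dict(lambda_adv=a, lr=lr)
        for a, lr in itertools.product([1e-3, 1e-2], [2.5e-4, 1e-4])]
```

The same handful of labeled images that serves as the validation set here could instead be used to finetune the model directly, which is the comparison the paper's second question investigates.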