Changes to hyperparameters can have a dramatic effect on model accuracy, so hyperparameter tuning plays an important role in optimizing machine-learning models. An integral part of the hyperparameter-tuning process is the evaluation of model checkpoints, which is done via "validators". In a supervised setting, validators evaluate checkpoints by computing accuracy on a labeled validation set. In an unsupervised setting, the validation set has no labels, so computing accuracy is impossible, and validators must estimate accuracy instead. But what is the best approach to estimating accuracy? In this paper, we consider this question in the context of unsupervised domain adaptation (UDA). Specifically, we propose three new validators, and we compare and rank them against five existing validators on a large dataset of 1,000,000 checkpoints. Extensive experimental results show that two of our proposed validators achieve state-of-the-art performance in various settings. Finally, we find that in many cases, the state-of-the-art is obtained by a simple baseline method. To the best of our knowledge, this is the largest empirical study of UDA validators to date. Code is available at https://www.github.com/KevinMusgrave/powerful-benchmarker.
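To make the supervised-vs-unsupervised distinction concrete, the sketch below contrasts a supervised validator (accuracy on labeled data) with a simple label-free confidence score based on prediction entropy. This is an illustrative baseline of the general idea only, not one of the paper's proposed validators; the function names and the entropy heuristic are assumptions for the example.

```python
import numpy as np

def labeled_accuracy(logits, labels):
    # Supervised validator: fraction of correct argmax predictions.
    return float(np.mean(np.argmax(logits, axis=1) == labels))

def entropy_score(logits):
    # Label-free validator (illustrative heuristic, not the paper's method):
    # negative mean prediction entropy. Higher values mean more confident
    # predictions, which is often used as a rough proxy for accuracy.
    z = logits - logits.max(axis=1, keepdims=True)   # stabilized softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ent = -np.sum(p * np.log(p + 1e-12), axis=1)
    return float(-ent.mean())

# Rank two hypothetical checkpoints without access to target labels.
rng = np.random.default_rng(0)
confident = rng.normal(size=(100, 10)) * 5.0   # peaked softmax outputs
uncertain = rng.normal(size=(100, 10)) * 0.1   # near-uniform softmax outputs
assert entropy_score(confident) > entropy_score(uncertain)
```

A benchmark like the one in the paper then asks how well such label-free scores correlate with true accuracy across many checkpoints.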