Paper Title
Three New Validators and a Large-Scale Benchmark Ranking for Unsupervised Domain Adaptation
Paper Authors
Paper Abstract
Changes to hyperparameters can have a dramatic effect on model accuracy. Thus, the tuning of hyperparameters plays an important role in optimizing machine-learning models. An integral part of the hyperparameter-tuning process is the evaluation of model checkpoints, which is done through the use of "validators". In a supervised setting, these validators evaluate checkpoints by computing accuracy on a validation set that has labels. In contrast, in an unsupervised setting, the validation set has no such labels. Without any labels, it is impossible to compute accuracy, so validators must estimate accuracy instead. But what is the best approach to estimating accuracy? In this paper, we consider this question in the context of unsupervised domain adaptation (UDA). Specifically, we propose three new validators, and we compare and rank them against five other existing validators on a large dataset of 1,000,000 checkpoints. Extensive experimental results show that two of our proposed validators achieve state-of-the-art performance in various settings. Finally, we find that in many cases, the state-of-the-art is obtained by a simple baseline method. To the best of our knowledge, this is the largest empirical study of UDA validators to date. Code is available at https://www.github.com/KevinMusgrave/powerful-benchmarker.
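To make the role of a label-free validator concrete, below is a minimal Python sketch of one simple heuristic of the kind such studies typically include as a baseline: scoring a checkpoint by the negative mean entropy of its predictions on unlabeled target-domain data. The function names and the higher-is-better convention are illustrative assumptions, not the paper's API or its proposed validators.

import numpy as np

def entropy_score(probs):
    # probs: (N, C) array of softmax outputs on unlabeled target data.
    # Returns the negative mean prediction entropy; higher means the
    # model is more confident, which this heuristic treats as better.
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return -float(entropy.mean())

def rank_checkpoints(checkpoint_probs):
    # checkpoint_probs: mapping from checkpoint name to its softmax
    # outputs. Returns checkpoint names sorted best-first by score.
    return sorted(checkpoint_probs,
                  key=lambda name: entropy_score(checkpoint_probs[name]),
                  reverse=True)

# Toy usage: two simulated checkpoints, one more confident than the other.
rng = np.random.default_rng(0)
confident = rng.dirichlet(alpha=[10, 1, 1], size=100)  # peaked predictions
uncertain = rng.dirichlet(alpha=[1, 1, 1], size=100)   # diffuse predictions
print(rank_checkpoints({"ckpt_a": uncertain, "ckpt_b": confident}))
# Prints ['ckpt_b', 'ckpt_a']: the confident checkpoint scores higher.

Note that confidence is only a proxy for accuracy: a model can be confidently wrong, which is precisely why the paper benchmarks many validators against ground-truth target accuracy rather than trusting any single heuristic.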