While hyperparameter optimization (HPO) is known to greatly impact learning algorithm performance, it is often treated as an empirical afterthought. Recent empirical work has highlighted the risk of this second-rate treatment of HPO, showing that inconsistent performance results, arising from the choice of hyperparameter subspace to search, are a widespread problem in ML research. When comparing two algorithms J and K, searching one subspace can yield the conclusion that J outperforms K, whereas searching another can entail the opposite result. In short, your choice of hyperparameters can deceive you. We provide a theoretical complement to this prior work: we analytically characterize this problem, which we term hyperparameter deception, and show that grid search is inherently deceptive. We prove a defense with guarantees against deception, and demonstrate a defense in practice.
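To make the deception phenomenon concrete, the following is a minimal, hypothetical sketch in Python. The performance functions for J and K are synthetic stand-ins (not results or methods from the paper): each algorithm peaks in a different region of learning-rate space, so grid search over one subspace concludes J wins, while grid search over another concludes K wins.

```python
# Hypothetical illustration of hyperparameter deception: the same two
# algorithms, compared via grid search over two different hyperparameter
# subspaces, yield opposite conclusions. Performance curves are synthetic.

def perf_J(lr):
    # Algorithm J: peaks at small learning rates (best near lr = 0.01).
    return 0.90 - 10.0 * (lr - 0.01) ** 2

def perf_K(lr):
    # Algorithm K: peaks at larger learning rates (best near lr = 0.5).
    return 0.85 - 0.5 * (lr - 0.5) ** 2

def grid_search(perf, grid):
    # Return the best score found over the given hyperparameter grid.
    return max(perf(lr) for lr in grid)

subspace_A = [0.005, 0.01, 0.02]  # covers J's sweet spot
subspace_B = [0.3, 0.5, 0.7]      # covers K's sweet spot

for name, grid in [("A", subspace_A), ("B", subspace_B)]:
    j, k = grid_search(perf_J, grid), grid_search(perf_K, grid)
    winner = "J" if j > k else "K"
    print(f"subspace {name}: J={j:.3f}, K={k:.3f} -> conclude {winner} outperforms")

# subspace A: J=0.900, K=0.735 -> conclude J outperforms
# subspace B: J=0.059, K=0.850 -> conclude K outperforms
```

Both runs are faithful executions of grid search; only the choice of subspace differs, yet the two experiments support contradictory conclusions about which algorithm is better.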