The performance of deep neural networks can be highly sensitive to the choice of a variety of meta-parameters, such as optimizer parameters and model hyperparameters. Tuning these well, however, often requires extensive and costly experimentation. Bayesian optimization (BO) is a principled approach to solve such expensive hyperparameter tuning problems efficiently. Key to the performance of BO is specifying and refining a distribution over functions, which is used to reason about the optima of the underlying function being optimized. In this work, we consider the scenario where we have data from similar functions that allows us to specify a tighter distribution a priori. Specifically, we focus on the common but potentially costly task of tuning optimizer parameters for training neural networks. Building on the meta BO method from Wang et al. (2018), we develop practical improvements that (a) boost its performance by leveraging tuning results on multiple tasks without requiring observations for the same meta-parameter points across all tasks, and (b) retain its regret bound for a special case of our method. As a result, we provide a coherent BO solution for iterative optimization of continuous optimizer parameters. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
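The Bayesian optimization loop the abstract refers to can be illustrated with a minimal, self-contained sketch: a Gaussian-process surrogate is fit to the evaluations seen so far, and an acquisition rule picks the next hyperparameter to try. The 1-D toy objective (standing in for expensive model training), the RBF kernel, the UCB acquisition rule, and the grid below are all illustrative assumptions, not the paper's actual method.

```python
import math

# Hedged sketch of GP-based Bayesian optimization on a 1-D toy problem.
# All modeling choices here (kernel, lengthscale, UCB acquisition) are
# illustrative assumptions, not the method described in the abstract.

def rbf(a, b, length=0.3):
    # Squared-exponential kernel on scalars.
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting for small dense systems.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x, noise=1e-4):
    # Posterior mean and variance of a zero-mean GP at test point x.
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [rbf(xi, x) for xi in xs]
    alpha = solve(K, ys)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve(K, k)
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mean, var

def objective(lr):
    # Stand-in for an expensive training run; maximized at lr = 0.4.
    return -(lr - 0.4) ** 2

def bayes_opt(n_iter=10):
    grid = [i / 100 for i in range(101)]
    xs = [0.1, 0.9]                      # small initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        def ucb(x):
            # Upper-confidence-bound acquisition: mean + 2 * stddev.
            m, v = gp_posterior(xs, ys, x)
            return m + 2.0 * math.sqrt(v)
        x_next = max((g for g in grid if g not in xs), key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    return xs[ys.index(max(ys))]        # best hyperparameter found

print(bayes_opt())
```

A meta-BO method in the spirit of the abstract would replace the zero-mean GP prior with one learned from tuning data on related tasks, so the search starts from a tighter distribution over functions.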