Supervised learning typically optimizes the expected-value risk functional of the loss, but in many settings we want to optimize other risk functionals. With full-batch gradient descent, this is done by taking gradients of the risk functional of interest, such as the Conditional Value at Risk (CVaR), which ignores some quantile of extreme losses. Deep learning, however, almost always relies on mini-batch gradient descent, and the lack of unbiased estimators for many risk functionals makes the right optimization procedure unclear. In this work, we introduce a meta-learning-based method that learns an interpretable mini-batch risk functional during model training, in a single shot. When optimizing for various risk functionals, the learned mini-batch risk functionals reduce risk by up to 10% over hand-engineered mini-batch risk functionals. In a setting where the right risk functional is unknown a priori, our method improves over the baseline by 14% relative (~9% absolute). We analyze the learned mini-batch risk functionals at different points in training and find that they learn a curriculum (including warm-up periods), and that their final form can differ surprisingly from the underlying risk functional they optimize for.
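To make the mini-batch estimation problem concrete, below is a minimal sketch (not from the paper) of a plug-in CVaR estimate computed on a single mini-batch of per-example losses, following the convention above in which the most extreme losses are ignored. The function name `minibatch_cvar`, the PyTorch implementation, and the default `alpha` are all our assumptions for illustration; the key point is that this plug-in quantity is, in general, a biased estimator of the population CVaR, which is the difficulty the paper's learned mini-batch risk functionals address.

```python
import torch

def minibatch_cvar(losses: torch.Tensor, alpha: float = 0.95) -> torch.Tensor:
    """Plug-in CVaR estimate on one mini-batch of per-example losses.

    Per the convention in the text, losses beyond the alpha-quantile
    (the extreme tail) are discarded and the mean of the rest is
    returned. On a finite mini-batch this is a biased estimator of
    the population CVaR.
    """
    q = torch.quantile(losses, alpha)   # empirical alpha-quantile of the batch
    kept = losses[losses <= q]          # drop the extreme tail above the quantile
    return kept.mean()                  # average the retained losses

# Hypothetical usage inside a training step, in place of losses.mean():
#   loss = minibatch_cvar(per_example_losses, alpha=0.95)
#   loss.backward()
```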