This paper focuses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994). This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance Concrete Dropout (Gal et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes that the data are generated from a normal distribution whose mean and variance are functions of the input. The network outputs a mean and a variance estimate and optimizes its parameters by minimizing the negative log-likelihood. In this paper, we discuss two points. Firstly, the convergence difficulties reported in recent work can be prevented relatively easily by following the recommendation of the original authors to use a warm-up period, during which only the mean is optimized with the variance held fixed. This recommendation is often ignored in practice; we experimentally demonstrate how essential this step is. We also examine whether keeping the mean estimate fixed after the warm-up leads to different results than estimating both the mean and the variance simultaneously after the warm-up, and we observe no substantial difference. Secondly, we propose a novel improvement of the MVE network: separate regularization of the mean and the variance estimate. We demonstrate, both on toy examples and on a number of UCI regression benchmark data sets, that following the original recommendations and applying the novel separate regularization can lead to significant improvements.
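To make the training procedure concrete, the following is a minimal sketch of the Gaussian negative log-likelihood objective and the recommended warm-up schedule, reduced to a toy model with a constant mean and log-variance fitted by hand-derived gradient descent. The model, learning rate, and iteration counts are illustrative assumptions, not the paper's actual network architecture:

```python
import numpy as np

# Toy data: samples from a normal distribution with unknown mean and variance.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=0.5, size=1000)

# Illustrative "model": a constant mean and log-variance (a real MVE network
# would output both as functions of the input x).
mu, log_var = 0.0, 0.0

def nll(mu, log_var, y):
    """Gaussian negative log-likelihood (up to an additive constant)."""
    var = np.exp(log_var)
    return 0.5 * np.mean(log_var + (y - mu) ** 2 / var)

lr = 0.1

# Warm-up phase: optimize only the mean with the variance held fixed.
# With fixed variance the NLL reduces to a scaled mean squared error.
for _ in range(200):
    grad_mu = -np.mean(y - mu) / np.exp(log_var)
    mu -= lr * grad_mu

# Main phase: optimize mean and log-variance jointly via the full NLL.
for _ in range(500):
    var = np.exp(log_var)
    grad_mu = -np.mean(y - mu) / var
    grad_lv = 0.5 * np.mean(1.0 - (y - mu) ** 2 / var)
    mu -= lr * grad_mu
    log_var -= lr * grad_lv
```

Without the warm-up, a poor early mean estimate inflates the residuals, which the variance head then absorbs, flattening the gradients on the mean; fixing the variance first avoids this failure mode. Separate regularization would correspond to applying distinct penalty weights to the parameters feeding the mean and variance outputs.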