Tighter Variational Bounds are Not Necessarily Better. A Research Report on Implementation, Ablation Study, and Extensions

This report explains, implements and extends the work presented in "Tighter Variational Bounds are Not Necessarily Better" (Rainforth et al., 2018). We provide theoretical and empirical evidence that increasing the number of importance samples $K$ in the importance weighted autoencoder (IWAE) (Burda et al., 2016) degrades the signal-to-noise ratio (SNR) of the gradient estimator for the inference network and thereby affects the full learning process. In other words, even though increasing $K$ decreases the standard deviation of the gradients, it reduces the magnitude of the true gradient faster, so the relative variance of the gradient updates increases. Extensive experiments are performed to understand the role of $K$. These experiments suggest that tighter variational bounds are beneficial for the generative network, whereas looser bounds are preferable for the inference network. With these insights, three methods are implemented and studied: the partially importance weighted autoencoder (PIWAE), the multiply importance weighted autoencoder (MIWAE) and the combination importance weighted autoencoder (CIWAE). Each of these three methods includes IWAE as a special case but uses the importance weights in a different way to ensure a higher SNR of the gradient estimators. The efficacy of these algorithms is tested on multiple datasets, including MNIST and Omniglot. Finally, we demonstrate that the three IWAE variants produce approximate posterior distributions that are much closer to the true posterior than those of the IWAE, while matching the performance of the IWAE generative network, or potentially outperforming it in the case of PIWAE.
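To make the relationship between these objectives concrete, here is a minimal sketch of how the ELBO, IWAE, MIWAE and CIWAE estimates could be computed from a set of log importance weights. It follows the definitions in Rainforth et al. (2018) as we understand them and is not taken from the report's own code. The function names, the `beta` parameter name and the example weights are illustrative assumptions. PIWAE is omitted because it is not a single scalar objective: in the original paper it trains the inference network on a MIWAE target and the generative network on an IWAE target.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def miwae_bound(log_w):
    """MIWAE estimate from M rows of K log importance weights each.

    Averages M independent K-sample IWAE estimates.
    """
    return sum(log_mean_exp(row) for row in log_w) / len(log_w)


def iwae_bound(log_w_flat):
    """K-sample IWAE estimate, i.e. MIWAE with M = 1."""
    return miwae_bound([log_w_flat])


def elbo(log_w_flat):
    """Standard VAE ELBO estimate, i.e. MIWAE with K = 1."""
    return miwae_bound([[v] for v in log_w_flat])


def ciwae_bound(log_w_flat, beta):
    """CIWAE estimate: convex combination of the ELBO and IWAE estimates.

    beta = 0 recovers IWAE and beta = 1 recovers the ELBO.
    """
    return beta * elbo(log_w_flat) + (1.0 - beta) * iwae_bound(log_w_flat)


# Hypothetical log importance weights, log w_k = log p(x, z_k) - log q(z_k | x).
log_w = [-1.0, -2.0, -0.5, -3.0]

elbo_value = elbo(log_w)
iwae_value = iwae_bound(log_w)
# Same four samples arranged as M = 2 groups of K = 2.
miwae_value = miwae_bound([log_w[:2], log_w[2:]])
ciwae_value = ciwae_bound(log_w, beta=0.5)
```

For a fixed budget of samples, the MIWAE estimate lies between the ELBO and the IWAE estimate, which mirrors the trade-off described above: a looser bound for the inference network in exchange for a higher gradient SNR.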

