This paper revisits a remarkably simple yet strikingly effective training paradigm, Deep Mutual Learning (DML). We observe that its effectiveness is closely tied to its excellent generalization quality. In this paper, we interpret the performance improvement of DML from a novel perspective: it is approximately a Bayesian posterior sampling procedure. This interpretation also lays the foundation for applying the R\'{e}nyi divergence to improve the original DML, as the R\'{e}nyi divergence introduces control over the variance of the prior (in the context of DML). We therefore propose R\'{e}nyi Divergence Deep Mutual Learning (RDML). Our empirical results demonstrate the advantage of combining DML with the R\'{e}nyi divergence: the flexible control it affords further improves DML toward learning better-generalized models.
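For reference, a standard form of the R\'{e}nyi divergence of order $\alpha$ is sketched below; the densities $p$ and $q$ are generic placeholders rather than notation taken from the paper:
\[
D_{\alpha}(p \,\|\, q) \;=\; \frac{1}{\alpha - 1} \log \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx, \qquad \alpha > 0,\ \alpha \neq 1,
\]
with $\lim_{\alpha \to 1} D_{\alpha}(p \,\|\, q) = \mathrm{KL}(p \,\|\, q)$, so the Kullback--Leibler divergence used by the original DML is recovered as a limiting case; the order $\alpha$ is the additional knob behind the flexible control mentioned above.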