Machine learning models commonly minimize an empirical expectation. As a result, trained models typically perform well on the majority of the data, but their performance may deteriorate in less dense regions of the dataset. This issue also arises in generative modeling: a generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. The task of covering all such modes is known as complete mode coverage. We propose a sampling procedure based on ridge leverage scores that significantly improves mode coverage compared to standard methods and can easily be combined with any GAN. Ridge leverage scores are computed either with an explicit feature map, associated with the next-to-last layer of a GAN discriminator or of a pre-trained network, or with an implicit feature map corresponding to a Gaussian kernel. Multiple evaluations against recent approaches to complete mode coverage show a clear improvement when using the proposed sampling strategy.
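To make the idea concrete, the following is a minimal sketch of ridge leverage score sampling, not the authors' implementation. It assumes one common definition of the scores: for an explicit feature matrix $\Phi$, $\ell_i = \varphi_i^\top (\Phi^\top \Phi + \lambda I)^{-1} \varphi_i$, and for a kernel matrix $K$, $\ell_i = [K (K + \lambda I)^{-1}]_{ii}$. The abstract does not state the normalization of $\lambda$, so other conventions may apply. The function names, the regularization and bandwidth values, and the toy two-cluster data are all hypothetical; in particular, the raw 2-D points stand in for features that would come from a discriminator or a pre-trained network.

```python
import numpy as np


def explicit_ridge_leverage_scores(features, reg=1e-3):
    """Scores phi_i^T (Phi^T Phi + reg * I)^{-1} phi_i for an (n, d) feature matrix.

    `features` is assumed to hold one feature vector per row, e.g. activations
    of a network's next-to-last layer (here: any array of that shape).
    """
    dim = features.shape[1]
    regularized_gram = features.T @ features + reg * np.eye(dim)
    solved = np.linalg.solve(regularized_gram, features.T)  # shape (d, n)
    # Row-wise quadratic form, without building the full (n, n) matrix.
    return np.einsum("ij,ji->i", features, solved)


def gaussian_kernel_ridge_leverage_scores(points, bandwidth=1.0, reg=1e-3):
    """Scores diag(K (K + reg * I)^{-1}) for a Gaussian kernel matrix K."""
    sq_norms = np.sum(points**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against rounding below zero
    kernel = np.exp(-sq_dists / (2.0 * bandwidth**2))
    n = kernel.shape[0]
    # K and (K + reg * I)^{-1} commute, so solving gives the same diagonal.
    return np.diag(np.linalg.solve(kernel + reg * np.eye(n), kernel))


def sampling_probabilities(scores):
    """Normalize nonnegative scores into a sampling distribution."""
    return scores / scores.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy imbalanced data: a frequent mode and a rare mode (assumed setup).
    majority = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
    minority = rng.normal(loc=8.0, scale=1.0, size=(50, 2))
    data = np.vstack([majority, minority])

    probs = sampling_probabilities(explicit_ridge_leverage_scores(data))
    batch = rng.choice(len(data), size=64, replace=True, p=probs)

    print("uniform mass on rare mode:", 50 / len(data))
    print("leverage-score mass on rare mode:", probs[950:].sum())
    print("rare-mode points in sampled batch:", int(np.sum(batch >= 950)))
```

Sampling minibatch indices with these probabilities, instead of uniformly, is one way such scores could be plugged into an existing GAN training loop; the abstract itself does not specify how the sampled points are consumed.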