Machine learning models are commonly trained by minimizing an empirical expectation. As a result, trained models typically perform well on the majority of the data, but performance may deteriorate in less dense regions of the dataset. This issue also arises in generative modeling: a generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. Covering all such modes is known as the complete mode coverage problem. We propose a sampling procedure based on ridge leverage scores (RLSs) that significantly improves mode coverage compared to standard methods and can easily be combined with any GAN. RLSs are computed using either an explicit feature map, associated with the penultimate layer of a GAN discriminator or of a pre-trained network, or an implicit feature map corresponding to a Gaussian kernel. Multiple evaluations against recent approaches to complete mode coverage show a clear improvement when using the proposed sampling strategy.
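To make the role of the ridge leverage scores concrete, the sketch below computes them in both settings the abstract mentions: from an explicit feature matrix and from a Gaussian kernel matrix. It assumes one common convention, in which the score of point i is the i-th diagonal entry of K (K + n·reg·I)^{-1}; the function names, the `reg` and `bandwidth` parameters, the toy data, and the choice to sample with probability proportional to the scores are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def gaussian_kernel(X, bandwidth):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def rls_from_kernel(K, reg):
    """Implicit feature map: diagonal of K (K + n*reg*I)^{-1}."""
    n = K.shape[0]
    # K and (K + n*reg*I)^{-1} commute, so solving the system gives the same matrix.
    M = np.linalg.solve(K + n * reg * np.eye(n), K)
    return np.diag(M).copy()


def rls_from_features(Phi, reg):
    """Explicit feature map: phi_i^T (Phi^T Phi + n*reg*I)^{-1} phi_i."""
    n, d = Phi.shape
    C = Phi.T @ Phi + n * reg * np.eye(d)
    sol = np.linalg.solve(C, Phi.T)  # shape (d, n)
    return np.einsum("ij,ji->i", Phi, sol)


# Toy data (hypothetical): a dense mode of 95 points and a rare mode of 5 points.
rng = np.random.default_rng(0)
majority = rng.normal(loc=0.0, scale=0.1, size=(95, 2))
minority = rng.normal(loc=5.0, scale=0.1, size=(5, 2))
X = np.vstack([majority, minority])

scores = rls_from_kernel(gaussian_kernel(X, bandwidth=1.0), reg=1e-2)

# Assumption: draw training points with probability proportional to their score.
probs = scores / scores.sum()
batch_idx = rng.choice(len(X), size=32, replace=True, p=probs)
```

In this toy setup the points in the rare mode receive larger scores than those in the dense mode, so they are drawn more often than under uniform sampling. With real data, the rows of `Phi` would be feature vectors extracted from a discriminator or a pre-trained network rather than raw coordinates.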