Defining and accurately measuring generalization in generative models remains an ongoing challenge and a topic of active research within the machine learning community. This is in contrast to discriminative models, where generalization has a clear definition: the model's classification accuracy on unseen data. In this work, we construct a simple and unambiguous approach to evaluate the generalization capabilities of generative models. Using the sample-based generalization metrics proposed here, any generative model, from state-of-the-art classical generative models such as GANs to quantum models such as Quantum Circuit Born Machines, can be evaluated on equal footing within a concrete, well-defined framework. In contrast to other sample-based metrics for probing generalization, we leverage constrained optimization problems (e.g., cardinality-constrained problems) and use the corresponding discrete datasets to define metrics that unambiguously measure both the quality of the samples and the model's ability to generate data beyond the training set yet still within the valid solution space. Additionally, our metrics can diagnose trainability issues such as mode collapse and overfitting, as we illustrate when comparing GANs to quantum-inspired models built out of tensor networks. Our simulation results show that our quantum-inspired models have up to a $68\times$ enhancement in generating unseen unique and valid samples compared to GANs, and a ratio of 61:2 for generating samples with better quality than those observed in the training set. We foresee these metrics as valuable tools for rigorously defining practical quantum advantage in the domain of generative modeling.
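To make the idea concrete, the following is a minimal sketch of how such sample-based metrics could be computed for a cardinality-constrained problem, where a bitstring is valid when exactly $k$ bits are set. The function and metric names (`fidelity`, `exploration`, `unique_unseen`) are illustrative labels chosen here, not terminology fixed by the text:

```python
def is_valid(bits, k):
    """A sample satisfies the cardinality constraint if exactly k bits are set."""
    return sum(bits) == k

def generalization_metrics(generated, train, k):
    """Sample-based generalization metrics (names are illustrative):
    - fidelity: fraction of generated samples that are valid solutions
    - exploration: fraction of valid generated samples not seen in training
    - unique_unseen: number of distinct valid samples outside the training set
    """
    valid_samples = [g for g in generated if is_valid(g, k)]
    unseen = [g for g in valid_samples if g not in train]
    fidelity = len(valid_samples) / len(generated) if generated else 0.0
    exploration = len(unseen) / len(valid_samples) if valid_samples else 0.0
    unique_unseen = len(set(unseen))
    return fidelity, exploration, unique_unseen

# Toy usage: n = 4 bits, constraint k = 2 (six valid bitstrings in total).
train = {(1, 1, 0, 0), (1, 0, 1, 0)}
generated = [(0, 0, 1, 1), (1, 1, 0, 0), (1, 0, 0, 0)]
fidelity, exploration, unique_unseen = generalization_metrics(generated, train, k=2)
```

In this toy run, two of the three generated samples are valid, and only one of those lies outside the training set, so a model that merely memorizes `train` would score zero on `exploration` regardless of its `fidelity` — exactly the failure mode (overfitting) the metrics are meant to expose.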