In this work we propose a principled evaluation framework for model-based optimisation that measures how well a generative model can extrapolate. We achieve this by interpreting the training and validation splits as draws from their respective `truncated' ground truth distributions, where examples in the validation set have scores much larger than those in the training set. Model selection is performed on the validation set with respect to some prescribed validation metric. A major research question, however, is which validation metric correlates best with the expected score of generated candidates under the ground truth oracle; progress on this question can translate into large economic gains, since evaluating the ground truth oracle in the real world is expensive. Using our framework, we compare various validation metrics for generative adversarial networks. We also discuss the limitations of our framework with respect to existing datasets, and how progress can be made to mitigate them.
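To make the split construction concrete, below is a minimal sketch (not the paper's implementation) of how one might realise the `truncated' training and validation distributions: candidates are partitioned by oracle-score quantiles so that every validation example scores strictly higher than every training example. The function name `truncated_splits` and the threshold parameters `gamma_train` and `gamma_valid` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def truncated_splits(y, gamma_train=0.8, gamma_valid=1.0):
    """Return (train, valid) index arrays over oracle scores y.

    Hypothetical sketch: training examples are drawn from the region
    y <= tau_train, and validation examples from tau_train < y <= tau_valid,
    mimicking draws from truncated ground truth distributions. The quantile
    levels gamma_train and gamma_valid are assumed hyperparameters.
    """
    tau_train = np.quantile(y, gamma_train)
    tau_valid = np.quantile(y, gamma_valid)
    train_idx = np.flatnonzero(y <= tau_train)
    valid_idx = np.flatnonzero((y > tau_train) & (y <= tau_valid))
    return train_idx, valid_idx

# Usage on synthetic oracle scores.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)  # stand-in for ground truth oracle scores
train_idx, valid_idx = truncated_splits(y)
# Every validation score exceeds every training score.
assert y[valid_idx].min() > y[train_idx].max()
```

Under this construction, a model selected on the validation set is rewarded for assigning mass to score regions it never saw during training, which is the sense of extrapolation the framework aims to measure.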