Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design

Deep generative models, such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion Models, and Transformers, have shown great promise in a variety of applications, including image and speech synthesis, natural language processing, and drug discovery. However, when these models are applied to engineering design problems, evaluating their performance can be challenging, because traditional statistical metrics based on likelihood may not fully capture the requirements of engineering applications. This paper doubles as a review and a practical guide to evaluation metrics for deep generative models (DGMs) in engineering design. We first summarize well-accepted 'classic' evaluation metrics for deep generative models that are grounded in machine learning theory and typical computer science applications. Using case studies, we then highlight why these metrics seldom translate well to design problems, yet see frequent use due to the lack of established alternatives. Next, we curate a set of design-specific metrics that have been proposed across different research communities and can be used for evaluating deep generative models. These metrics focus on unique requirements in design and engineering, such as constraint satisfaction, functional performance, novelty, and conditioning. We structure our review and discussion as a set of practical selection criteria and usage guidelines. Throughout our discussion, we apply the metrics to models trained on simple two-dimensional example problems. Finally, to illustrate the selection process and classic usage of the presented metrics, we evaluate three deep generative models on a multifaceted bicycle frame design problem, considering performance target achievement, design novelty, and geometric constraints. We publicly release the code for the datasets, models, and metrics used throughout the paper at decode.mit.edu/projects/metrics/.
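
The abstract contrasts 'classic' statistical-similarity metrics with design-specific ones such as constraint satisfaction and novelty. The minimal sketch below illustrates that contrast on a toy two-dimensional problem. It is not the authors' released code. It assumes three simple formulations chosen for illustration:

- A biased RBF-kernel estimate of squared Maximum Mean Discrepancy (MMD) as the statistical similarity score.
- The fraction of samples satisfying an inequality constraint `g(x) <= 0` as the constraint satisfaction score.
- The mean nearest-neighbor distance to the training set as a crude novelty proxy.

The dataset, the two "generators", the constraint `max_height`, and all function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]  # a 2-D design vector (x, y)


def rbf_kernel(a: Point, b: Point, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two 2-D points."""
    sq_dist = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: a purely statistical similarity score (0 = indistinguishable)."""

    def mean_kernel(p: Sequence[Point], q: Sequence[Point]) -> float:
        return sum(rbf_kernel(a, b, bandwidth) for a in p for b in q) / (len(p) * len(q))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def constraint_satisfaction_rate(samples: Sequence[Point], constraint: Callable[[Point], float]) -> float:
    """Fraction of samples that are feasible, using the convention constraint(x) <= 0."""
    return sum(1 for x in samples if constraint(x) <= 0.0) / len(samples)


def mean_novelty(samples: Sequence[Point], dataset: Sequence[Point]) -> float:
    """Average Euclidean distance from each sample to its nearest training design."""
    return sum(min(math.dist(x, d) for d in dataset) for x in samples) / len(samples)


# Hypothetical toy data: four training designs and two "generators".
dataset = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
copycat = list(dataset)               # reproduces the training data exactly
explorer = [(0.5, 0.5), (0.5, 2.0)]   # proposes new designs, one of them infeasible


def max_height(x: Point) -> float:
    """Hypothetical constraint: a design is feasible when its y-coordinate is at most 1.5."""
    return x[1] - 1.5


print(mmd_squared(copycat, dataset))                        # prints 0.0
print(mean_novelty(copycat, dataset))                       # prints 0.0
print(constraint_satisfaction_rate(explorer, max_height))   # prints 0.5
```

In this toy setting, the copycat generator gets a perfect MMD score while contributing zero novelty. The MMD score also says nothing about whether any generated design is feasible. This is one simple way to see why statistical similarity alone may be an incomplete yardstick for design tasks.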
