There has been a massive increase in research interest towards applying data-driven methods to problems in mechanics. While traditional machine learning (ML) methods have enabled many breakthroughs, they rely on the assumption that the training (observed) data and testing (unseen) data are independent and identically distributed (i.i.d.). Thus, traditional ML approaches often break down when applied to real-world mechanics problems with unknown test environments and data distribution shifts. In contrast, out-of-distribution (OOD) generalization assumes that the test distribution may shift (i.e., violate the i.i.d. assumption). To date, multiple methods have been proposed to improve the OOD generalization of ML methods. However, because of the lack of benchmark datasets for OOD regression problems, the efficacy of these OOD methods on regression problems, which dominate the mechanics field, remains unknown. To address this, we investigate the performance of OOD generalization methods for regression problems in mechanics. Specifically, we identify three OOD problems: covariate shift, mechanism shift, and sampling bias. For each problem, we create two benchmark examples that extend the Mechanical MNIST dataset collection, and we investigate the performance of popular OOD generalization methods on these mechanics-specific regression problems. Our numerical experiments show that, while in most cases the OOD generalization algorithms perform better than traditional ML methods on these OOD problems, there is a compelling need to develop more robust OOD generalization methods that are effective across multiple OOD scenarios. Overall, we expect that this study, as well as the associated open-access benchmark datasets, will enable further development of OOD generalization methods for mechanics-specific regression problems.