Discovering a meaningful symbolic expression that explains experimental data is a fundamental challenge in many scientific fields. We present a novel, open-source computational framework called Scientist-Machine Equation Detector (SciMED), which integrates domain-specific scientific knowledge, through a scientist-in-the-loop approach, with state-of-the-art symbolic regression (SR) methods. SciMED combines a wrapper selection method based on a genetic algorithm with automatic machine learning and two levels of SR methods. We test SciMED on five configurations of a settling sphere, with and without aerodynamic non-linear drag force, and with excessive noise in the measurements. We show that SciMED is sufficiently robust to discover the correct, physically meaningful symbolic expressions from the data, and we demonstrate how the integration of domain knowledge enhances its performance. Our results indicate better performance on these tasks than state-of-the-art SR software packages, even in cases where no knowledge is integrated. Moreover, we demonstrate how SciMED can alert the user about possible missing features, unlike the majority of current SR systems.
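To make the idea of a genetic-algorithm wrapper selection concrete, the following is a minimal sketch and not SciMED's implementation. It assumes synthetic data for a sphere settling in the low-Reynolds-number (Stokes) regime, where the terminal velocity is (2/9) * delta_rho * g * r^2 / mu, plus two irrelevant columns. The evaluator (a log-linear least-squares fit), the per-feature penalty, and all names such as `fitness`, `evolve`, `junk_a`, and `junk_b` are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): Stokes terminal velocity of a small sphere,
# v = (2/9) * delta_rho * g * r**2 / mu, plus two irrelevant columns.
n, g = 300, 9.81
r = rng.uniform(1e-4, 1e-3, n)             # sphere radius [m]
delta_rho = rng.uniform(100.0, 2000.0, n)  # density difference [kg/m^3]
mu = rng.uniform(1e-3, 1e-1, n)            # fluid viscosity [Pa s]
junk_a = rng.uniform(1.0, 10.0, n)         # irrelevant feature
junk_b = rng.uniform(1.0, 10.0, n)         # irrelevant feature
v = (2.0 / 9.0) * delta_rho * g * r**2 / mu
v *= np.exp(rng.normal(0.0, 0.05, n))      # multiplicative measurement noise

names = ["r", "delta_rho", "mu", "junk_a", "junk_b"]
X = np.log(np.column_stack([r, delta_rho, mu, junk_a, junk_b]))
y = np.log(v)
train, valid = np.arange(n) < 200, np.arange(n) >= 200


def fitness(mask):
    """Validation MSE of a least-squares fit on the selected columns,
    plus a small per-feature penalty (lower is better)."""
    if not mask.any():
        return np.inf
    A_train = np.column_stack([X[train][:, mask], np.ones(train.sum())])
    A_valid = np.column_stack([X[valid][:, mask], np.ones(valid.sum())])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    mse = np.mean((A_valid @ coef - y[valid]) ** 2)
    return mse + 0.01 * mask.sum()


def evolve(n_features, pop_size=20, generations=40, mutation_rate=0.1):
    """Evolve boolean feature masks and return the best one found."""
    pop = rng.random((pop_size, n_features)) < 0.5
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argmin(scores)].copy()
        # Tournament selection: each parent is the better of two random picks.
        i, j = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where((scores[i] <= scores[j])[:, None], pop[i], pop[j])
        # Uniform crossover between each parent and a shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        swap = rng.random(parents.shape) < 0.5
        children = np.where(swap, parents, partners)
        # Bit-flip mutation.
        children ^= rng.random(children.shape) < mutation_rate
        children[0] = elite  # elitism: keep the best mask unchanged
        pop = children
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmin(scores)]


best = evolve(len(names))
selected = [name for name, keep in zip(names, best) if keep]
print(selected)
```

In this toy setting the wrapper scores each candidate subset by how well a downstream model predicts held-out data, so columns that do not help prediction are penalized away. A real pipeline would presumably swap the least-squares evaluator for a stronger learner before handing the surviving features to an SR stage.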