Discovering a meaningful symbolic expression that explains experimental data is a fundamental challenge in many scientific fields. We present a novel, open-source computational framework called Scientist-Machine Equation Detector (SciMed), which integrates domain knowledge from the scientific discipline, through a scientist-in-the-loop approach, with state-of-the-art symbolic regression (SR) methods. SciMed combines a wrapper feature-selection method based on a genetic algorithm with automated machine learning (AutoML) and two levels of SR methods. We test SciMed on five configurations of a settling sphere, with and without non-linear aerodynamic drag, and with high noise levels in the measurements. We show that SciMed is robust enough to discover the correct, physically meaningful symbolic expressions from the data, and we demonstrate how integrating domain knowledge enhances its performance. Our results indicate better performance on these tasks than state-of-the-art SR software packages, even in cases where no domain knowledge is integrated. Moreover, we demonstrate how SciMed can alert the user to possibly missing features, unlike the majority of current SR systems.
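The abstract does not specify how SciMed's genetic-algorithm wrapper is configured, so the sketch below is only a generic illustration of GA-based wrapper feature selection, not SciMed's implementation. It assumes a hypothetical fitness function (held-out R² of an ordinary least-squares fit on the selected features, minus a small per-feature penalty), truncation selection with elitism, one-point crossover and bit-flip mutation. The names `fitness` and `ga_select` and all parameter values are illustrative assumptions.

```python
import numpy as np


def fitness(mask, X, y, penalty=0.01):
    """Hypothetical wrapper fitness: held-out R^2 of a least-squares model
    trained on the selected columns, minus a penalty per selected feature."""
    if not mask.any():
        return -np.inf  # an empty feature subset cannot explain anything
    n = len(y)
    split = n * 2 // 3  # first two thirds for fitting, rest for scoring
    Xs = X[:, mask]
    A_train = np.column_stack([Xs[:split], np.ones(split)])  # add intercept
    coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
    A_test = np.column_stack([Xs[split:], np.ones(n - split)])
    resid = y[split:] - A_test @ coef
    ss_res = resid @ resid
    ss_tot = ((y[split:] - y[split:].mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot - penalty * mask.sum()


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve boolean feature masks; the model is 'wrapped' inside fitness()."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]  # assumes at least two candidate features
    pop = rng.random((pop_size, n_feat)) < 0.5  # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]  # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)  # one-point crossover position
            child = np.concatenate([a[:cut], b[cut:]])
            child = child ^ (rng.random(n_feat) < p_mut)  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[np.argmax(scores)]


# Usage on synthetic data: only features 0 and 2 drive the target.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * data_rng.normal(size=300)
best = ga_select(X, y)
print("selected feature indices:", np.flatnonzero(best))
```

Because the parents survive each generation unchanged, the best mask found so far is never lost; the per-feature penalty is one simple way to discourage the search from keeping irrelevant inputs that only marginally change the score.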