Many real-world systems can be described by mathematical formulas that are human-comprehensible, easy to analyze, and helpful in explaining the system's behaviour. Symbolic regression is a method that generates nonlinear models from data in the form of analytic expressions. Historically, symbolic regression has been predominantly realized using genetic programming, a method that iteratively evolves a population of candidate solutions, with new candidates produced by the genetic operators of crossover and mutation. This gradient-free evolutionary approach suffers from several deficiencies: it does not scale well with the number of variables and samples in the training data, models tend to grow in size and complexity without an adequate gain in accuracy, and it is hard to fine-tune the inner model coefficients using genetic operators alone. Recently, neural networks have been applied to learn the whole analytic formula, i.e., its structure as well as its coefficients, by means of gradient-based optimization algorithms. We propose a novel neural network-based symbolic regression method that constructs physically plausible models from limited training data and prior knowledge about the system. The method employs an adaptive weighting scheme to deal effectively with multiple loss function terms and an epoch-wise learning process to reduce the chance of getting stuck in poor local optima. Furthermore, we propose a parameter-free method for choosing the model with the best interpolation and extrapolation performance out of all models generated throughout the learning process. We experimentally evaluate the approach on four systems: the TurtleBot 2 mobile robot, a magnetic manipulation system, the equivalent resistance of two resistors in parallel, and an anti-lock braking system. The results show the potential of the method to find sparse and accurate models that comply with the prior knowledge provided.
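As a rough illustration of how a loss with multiple terms can combine data fit with prior knowledge, the sketch below uses the parallel-resistor benchmark, whose ground truth is R = R1 R2 / (R1 + R2). It is not the method proposed here: the symmetry constraint, the fixed `weight` (standing in for the adaptive weighting scheme), the function names, and the asymmetric candidate model are all assumptions made for the example.

```python
import random


def parallel_resistance(r1, r2):
    """Ground truth: equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def data_loss(model, samples):
    """Mean squared error over measured (r1, r2, r) triples."""
    return sum((model(r1, r2) - r) ** 2 for r1, r2, r in samples) / len(samples)


def symmetry_penalty(model, points):
    """Assumed prior-knowledge term: swapping the resistors must not change the output."""
    return sum((model(r1, r2) - model(r2, r1)) ** 2 for r1, r2 in points) / len(points)


def total_loss(model, samples, points, weight=1.0):
    """Data term plus weighted constraint term (fixed weight, a simplification)."""
    return data_loss(model, samples) + weight * symmetry_penalty(model, points)


rng = random.Random(0)

# Limited training data, drawn from a narrow range of resistances.
inputs = [(rng.uniform(1, 10), rng.uniform(1, 10)) for _ in range(20)]
samples = [(r1, r2, parallel_resistance(r1, r2)) for r1, r2 in inputs]

# Constraint points need no measured output, so they can cover a wider range.
points = [(rng.uniform(1, 100), rng.uniform(1, 100)) for _ in range(20)]


def asymmetric_candidate(r1, r2):
    """Hypothetical candidate model that violates the symmetry prior."""
    return 0.25 * r1 + 0.2 * r2


loss_true = total_loss(parallel_resistance, samples, points)
loss_asym = total_loss(asymmetric_candidate, samples, points)
print(loss_true, loss_asym)
```

Because the constraint term is evaluated on inputs alone, it can penalize a candidate outside the region covered by the training data, which is one way prior knowledge can support extrapolation when measurements are scarce.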