There are many problems in physics, biology, and other natural sciences in which symbolic regression can provide valuable insights and discover new laws of nature. Widely used deep neural networks do not provide interpretable solutions, whereas symbolic expressions give a clear relation between the observations and the target variable. However, at the moment, there is no dominant solution for the symbolic regression task, and we aim to reduce this gap with our algorithm. In this work, we propose a novel deep learning framework for symbolic expression generation via a variational autoencoder (VAE). In a nutshell, we suggest using a VAE to generate mathematical expressions, and our training strategy forces the generated formulas to fit a given dataset. Our framework allows encoding a priori knowledge about the formulas into fast-check predicates that speed up the optimization process. We compare our method to modern symbolic regression benchmarks and show that it outperforms the competitors under noisy conditions. The recovery rate of SEGVAE is 65% on the Nguyen dataset with a noise level of 10%, which is better than the previously reported state of the art by 20%. We demonstrate that this value depends on the dataset and can be even higher.
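The fast-check predicates mentioned above can be thought of as cheap structural tests applied to candidate formulas before the expensive step of fitting them to the dataset. The sketch below is purely illustrative — the predicate names and the string representation of expressions are assumptions, not details taken from the paper:

```python
# Hypothetical sketch of fast-check predicates: cheap structural filters
# applied to candidate expressions before any data fitting is attempted.
# The predicates below (has_variable, max_length) are illustrative examples
# of a priori knowledge, not the actual predicates used by SEGVAE.

def has_variable(expr: str) -> bool:
    """Reject constant-only formulas: the target must depend on x."""
    return "x" in expr

def max_length(limit: int):
    """Reject overly long formulas, a simple complexity prior."""
    return lambda expr: len(expr) <= limit

def passes_predicates(expr: str, predicates) -> bool:
    """An expression survives only if every predicate holds."""
    return all(p(expr) for p in predicates)

candidates = ["sin(x) + x**2", "3.14159", "x*x*x*x*x*x*x*x*x*x*x*x*x"]
preds = [has_variable, max_length(20)]
kept = [c for c in candidates if passes_predicates(c, preds)]
# "3.14159" fails has_variable; the long product fails max_length
```

Because such predicates are string- or tree-level checks, they discard implausible candidates in microseconds, so the costly dataset-fitting loss is only evaluated on expressions that already satisfy the encoded prior knowledge.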