Symbolic regression, the task of extracting mathematical expressions from observed data $\{ \vx_i, y_i \}$, plays a crucial role in scientific discovery. Despite their promising performance, most existing methods conduct symbolic regression in an \textit{offline} setting: they treat the observed data points as given, simply sampled from uniform distributions, and do not explore the expressive potential of the data. In real-world scientific problems, however, the data used for symbolic regression are usually obtained actively by running experiments, which is an \textit{online} setting. How to obtain informative data that facilitate the symbolic regression process is therefore an important problem that remains challenging. In this paper, we propose QUOSR, a \textbf{qu}ery-based framework for \textbf{o}nline \textbf{s}ymbolic \textbf{r}egression that automatically obtains informative data in an iterative manner. Specifically, at each step, QUOSR receives the historical data points, generates new $\vx$, and then queries the symbolic expression to get the corresponding $y$; the resulting $(\vx, y)$ is added as new data. This process repeats until the maximum number of query steps is reached. To make the generated data points informative, we implement the framework with a neural network and train it by maximizing the mutual information between the generated data points and the target expression. Through comprehensive experiments, we show that QUOSR can facilitate modern symbolic regression methods by generating informative data.
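The iterative query process described above can be illustrated with a minimal Python sketch. This is not the QUOSR implementation: the names `target_expression`, `random_policy`, and `run_queries` are hypothetical, the target expression is an arbitrary example, one point is generated per step for simplicity, and the query policy is a uniform random placeholder standing in for the trained neural network. Only the loop structure (receive the history, generate $\vx$, query for $y$, append, stop at the step limit) follows the text.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[List[float], float]  # one (x, y) data point


def target_expression(x: List[float]) -> float:
    # Hypothetical ground-truth expression; it stands in for the experiment
    # that returns y for a queried x.
    return x[0] ** 2 + 2.0 * x[1]


def random_policy(history: List[Point], dim: int = 2) -> List[float]:
    # Placeholder policy (an assumption): it ignores the history and samples
    # uniformly. QUOSR instead uses a neural network conditioned on the history.
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


def run_queries(
    oracle: Callable[[List[float]], float],
    policy: Callable[[List[Point]], List[float]],
    max_steps: int,
) -> List[Point]:
    history: List[Point] = []
    for _ in range(max_steps):
        x = policy(history)      # generate a new x from the historical data
        y = oracle(x)            # query the expression for the matching y
        history.append((x, y))   # (x, y) becomes a new data point
    return history


data = run_queries(target_expression, random_policy, max_steps=5)
```

The collected `data` would then be passed to any downstream symbolic regression method; swapping `random_policy` for a learned policy is the step that the mutual-information training objective addresses.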