PySR is an open-source library for practical symbolic regression, a type of machine learning that aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed backend, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm built around a unique evolve-simplify-optimize loop, which is designed to optimize unknown scalar constants in newly discovered empirical expressions. PySR's backend is the highly optimized Julia library SymbolicRegression.jl, which can also be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
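To make the evolve-simplify-optimize idea concrete, the following is a toy sketch in pure Python of a multi-population loop of that shape. It is not PySR's implementation and does not use PySR's API: every function name here (`mutate`, `simplify`, `optimize_constants`, `search`) is hypothetical, the tuple-based expression encoding is an assumption made for brevity, random hill climbing stands in for a proper numerical optimizer, and the migration rule (copying the current best expression into each population) is only one simple possibility.

```python
import random

# Expressions are nested tuples: ("x",), ("const", c), or (op, left, right).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def evaluate(expr, x):
    if expr[0] == "x":
        return x
    if expr[0] == "const":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

def loss(expr, xs, ys):
    """Mean squared error of expr on the data."""
    errors = [evaluate(expr, x) - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / len(errors)

def size(expr):
    return 1 if expr[0] in ("x", "const") else 1 + size(expr[1]) + size(expr[2])

def random_leaf(rng):
    return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2.0, 2.0))

def mutate(expr, rng):
    """Evolve step: replace this node with a leaf, grow an operator, or recurse."""
    if expr[0] in ("x", "const") or rng.random() < 0.3:
        if rng.random() < 0.5:
            return random_leaf(rng)
        return (rng.choice(list(OPS)), expr, random_leaf(rng))
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def simplify(expr):
    """Simplify step: fold operator nodes whose children are both constants."""
    if expr[0] in ("x", "const"):
        return expr
    op, left, right = expr[0], simplify(expr[1]), simplify(expr[2])
    if left[0] == "const" and right[0] == "const":
        return ("const", OPS[op](left[1], right[1]))
    return (op, left, right)

def get_constants(expr):
    if expr[0] == "const":
        return [expr[1]]
    if expr[0] == "x":
        return []
    return get_constants(expr[1]) + get_constants(expr[2])

def set_constants(expr, values):
    """Rebuild expr, taking new constants from the iterator `values` left to right."""
    if expr[0] == "const":
        return ("const", next(values))
    if expr[0] == "x":
        return expr
    left = set_constants(expr[1], values)
    right = set_constants(expr[2], values)
    return (expr[0], left, right)

def optimize_constants(expr, xs, ys, rng, steps=30):
    """Optimize step: random hill climbing on the constants (never increases the loss)."""
    if not get_constants(expr):
        return expr
    best, best_loss = expr, loss(expr, xs, ys)
    for _ in range(steps):
        trial_values = [c + rng.gauss(0.0, 0.1) for c in get_constants(best)]
        trial = set_constants(best, iter(trial_values))
        trial_loss = loss(trial, xs, ys)
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return best

def search(xs, ys, n_populations=3, pop_size=10, iterations=40, max_size=9, seed=0):
    rng = random.Random(seed)
    score = lambda e: loss(e, xs, ys)
    populations = [[random_leaf(rng) for _ in range(pop_size)]
                   for _ in range(n_populations)]
    for _ in range(iterations):
        for pop in populations:
            parent = min(rng.sample(pop, 3), key=score)   # tournament selection
            child = simplify(mutate(parent, rng))         # evolve, then simplify
            if size(child) > max_size:
                continue
            child = optimize_constants(child, xs, ys, rng)  # then optimize
            worst = max(range(pop_size), key=lambda i: score(pop[i]))
            if score(child) < score(pop[worst]):
                pop[worst] = child
        # Migration: copy the best expression found so far into every population.
        best = min((e for pop in populations for e in pop), key=score)
        for pop in populations:
            pop[rng.randrange(pop_size)] = best
    return min((e for pop in populations for e in pop), key=score)

# Usage: synthetic data from y = 2.5 * x^2 - 1 with "unknown" constants 2.5 and -1.
xs = [i / 4 for i in range(-8, 9)]
ys = [2.5 * x * x - 1.0 for x in xs]
best = search(xs, ys, seed=1)
print(best, loss(best, xs, ys))
```

Each population here evolves independently between migrations, which is what would make such a scheme straightforward to distribute across cores. The sketch runs everything in a single process and makes no attempt at the operator fusion, automatic differentiation, or cluster distribution described above.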