The need to recognize or approximate functions in terms of elementary functions and operations arises in many areas: experimental mathematics, numerical analysis, computer algebra systems, model building, machine learning, approximation, and data compression. One of the most underestimated methods is symbolic regression. In this article, a reductionist approach is applied, reducing the full problem to constant functions, i.e., pure numbers (decimal, floating-point). However, existing solutions are plagued by a lack of solid criteria for distinguishing between a random formula that matches the decimal expansion approximately or literally and a probable "exact" (best) expression match in the sense of Occam's razor. In particular, convincing STOP criteria for the search were never developed. In this article, such criteria, working in a statistical sense, are provided. The recognition process can be viewed as (1) an enumeration of all formulas in order of increasing Kolmogorov complexity K, (2) a random process with an appropriate statistical distribution, or (3) compression of a decimal string. All three approaches are remarkably consistent and provide essentially the same limit for the practical depth of search: the count of unique formulas tested must not exceed 1/sigma, where sigma is the relative numerical error of the target constant. Beyond that, further search is pointless because, in view of approach (1), the number of equivalent expressions within the error bounds grows exponentially; in view of (2), the probability of a random match approaches 1; and in view of (3), the compression ratio becomes much smaller than 1.
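
The 1/sigma stopping rule can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the tiny alphabet of constants and operators, the use of leaf count as a stand-in for Kolmogorov complexity, deduplication of formulas by numerical value, and the simplification that each tested formula matches by accident independently with probability sigma.

```python
import itertools
import math

# Hypothetical, deliberately tiny alphabet; a real search would use a richer one.
CONSTANTS = {"1": 1.0, "2": 2.0, "pi": math.pi, "e": math.e}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def search_budget(sigma):
    """Largest number of unique formulas worth testing: 1/sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return int(1.0 / sigma)


def random_match_probability(n_tested, sigma):
    """Chance of at least one accidental match among n_tested formulas.

    Assumes independent matches with probability sigma each
    (an illustrative simplification, not the article's distribution).
    """
    return 1.0 - (1.0 - sigma) ** n_tested


def enumerate_formulas():
    """Yield (text, value) pairs in order of increasing leaf count."""
    by_size = {1: list(CONSTANTS.items())}
    yield from by_size[1]
    for size in itertools.count(2):
        level = []
        for left_size in range(1, size):
            for left_text, left_value in by_size[left_size]:
                for right_text, right_value in by_size[size - left_size]:
                    for symbol, op in BINARY.items():
                        try:
                            value = op(left_value, right_value)
                        except ZeroDivisionError:
                            continue
                        item = (f"({left_text}{symbol}{right_text})", value)
                        level.append(item)
                        yield item
        by_size[size] = level


def recognize(target, sigma):
    """Return the first formula within relative error sigma, or None on STOP."""
    budget = search_budget(sigma)
    seen = set()
    for text, value in enumerate_formulas():
        if not math.isfinite(value):
            continue
        key = round(value, 12)  # treat numerically equal formulas as one
        if key in seen:
            continue
        seen.add(key)
        if len(seen) > budget:
            return None  # STOP: more than 1/sigma unique formulas tested
        if abs(value - target) <= sigma * abs(target):
            return text
    return None
```

Under these assumptions, once the number of tested formulas reaches 1/sigma the accidental-match probability is already about 1 - 1/e, roughly 0.63, which is one way to see why a match found beyond that point carries little evidential weight.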