Probabilistic context-free grammars have a long record of use as generative models in machine learning and symbolic regression. When used for symbolic regression, they generate algebraic expressions. We define the latter as equivalence classes of strings derived by a grammar and address the problem of calculating the probability of deriving a given expression with a given grammar. We show that the problem is undecidable in general. We then present specific grammars for generating linear, polynomial, and rational expressions, for which algorithms for calculating the probability of a given expression exist. For those grammars, we design algorithms for calculating the exact probability and for approximating it efficiently to arbitrary precision.
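
To make the notion of an expression's probability concrete, the following minimal sketch sums string probabilities over one equivalence class. Everything in it is an illustrative assumption rather than material from the paper: the toy grammar, the rule probabilities `P_GROW` and `P_X`, the function names, and the choice to treat two sums as the same expression when they contain the same terms in a different order. The toy grammar is unambiguous, so each string has exactly one derivation and the class probability is a finite sum.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Hypothetical toy PCFG (an assumption for illustration, not from the paper):
#   E -> E + V  [P_GROW]      E -> V  [1 - P_GROW]
#   V -> x      [P_X]         V -> y  [1 - P_X]
P_GROW = Fraction(1, 3)
P_X = Fraction(1, 4)


def string_probability(terms):
    """Probability of deriving the string terms[0] + terms[1] + ... ."""
    n = len(terms)
    if n < 1:
        raise ValueError("a derived string has at least one term")
    # n - 1 applications of E -> E + V, then one application of E -> V.
    prob = P_GROW ** (n - 1) * (1 - P_GROW)
    for term in terms:
        prob *= P_X if term == "x" else 1 - P_X
    return prob


def expression_probability_bruteforce(num_x, num_y):
    """Sum string probabilities over every ordering of the given terms."""
    n = num_x + num_y
    total = Fraction(0)
    for terms in product("xy", repeat=n):
        if terms.count("x") == num_x:
            total += string_probability(terms)
    return total


def expression_probability_closed_form(num_x, num_y):
    """Same quantity: (number of orderings) * (probability of one ordering)."""
    n = num_x + num_y
    one_ordering = string_probability(("x",) * num_x + ("y",) * num_y)
    return comb(n, num_x) * one_ordering


if __name__ == "__main__":
    print(string_probability(("x", "y")))            # the single string x + y
    print(expression_probability_closed_form(1, 1))  # the class {x + y, y + x}
```

Under these assumptions the string `x + y` has probability 1/24, while the expression it represents has probability 1/12, because `y + x` belongs to the same class. Richer equivalences, such as cancellation of terms, can make a class infinite, which is where exact closed forms or approximations to a chosen precision become necessary.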