The hierarchical and recursive expressive capability of rooted trees is applicable to represent statistical models in various areas, such as data compression, image processing, and machine learning. On the other hand, such hierarchical expressive capability causes a problem in tree selection to avoid overfitting. One unified approach to solve this is a Bayesian approach, on which the rooted tree is regarded as a random variable and a direct loss function can be assumed on the selected model or the predicted value for a new data point. However, all the previous studies on this approach are based on the probability distribution on full trees, to the best of our knowledge. In this paper, we propose a generalized probability distribution for any rooted trees in which only the maximum number of child nodes and the maximum depth are fixed. Furthermore, we derive recursive methods to evaluate the characteristics of the probability distribution without any approximations.
翻译:根树的分层和递归表达能力适用于代表各领域的统计模型,例如数据压缩、图像处理和机器学习。另一方面,这种分层表达能力在树的选择方面造成问题,以避免过度匹配。一种统一的方法是巴伊西亚办法,将根树视为随机变数,并可根据选定的模型或新数据点的预测值假设直接损失函数。然而,以前关于这个方法的所有研究都以全树的概率分布为基础,以我们所了解的最佳程度为基础。在本文件中,我们提出了任何根树的通用概率分布,其中只有最大数量的儿童节点和最大深度是固定的。此外,我们得出循环方法,在没有近似的情况下评估概率分布的特点。