The recursive, hierarchical structure of full rooted trees makes them well suited to representing statistical models in areas such as data compression, image processing, and machine learning. In most of these applications, the full rooted tree is not treated as a random variable, which makes model selection that avoids overfitting problematic. One solution is to assume a prior distribution on the full rooted trees, which enables optimal model selection based on Bayes decision theory. For example, assigning a low prior probability to a complex model discourages the maximum a posteriori estimator from selecting it. Furthermore, we can average over all the models, weighted by their posterior probabilities. In this paper, we propose a probability distribution on a set of full rooted trees. Its parametric representation is suitable for calculating properties of the distribution, such as the mode, expectation, and posterior distribution, using recursive functions. Although such distributions have been proposed in previous studies, they are tied to specific applications. Therefore, we extract their mathematically essential components and derive new, generalized methods for calculating the expectation, posterior distribution, and other quantities.
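To make the idea of recursive calculation concrete, here is a minimal sketch of one simple prior on full k-ary rooted trees. It is an illustrative assumption, not necessarily the parametrization proposed in the paper: every node above a maximum depth becomes an inner node with a single shared probability `g`, and nodes at the maximum depth are always leaves. The names `tree_prob`, `enumerate_trees`, `count_leaves`, and `expected_leaves` are hypothetical. A tree is encoded as a nested tuple, with `()` for a leaf and a tuple of k children for an inner node.

```python
from itertools import product


def tree_prob(tree, g, max_depth, depth=0):
    """Prior probability of `tree` under the assumed toy prior (split prob. g)."""
    if depth == max_depth:
        # Nodes at the maximum depth are leaves with probability 1.
        return 1.0 if tree == () else 0.0
    if tree == ():
        return 1.0 - g  # this node stopped splitting
    p = g  # this node split into children
    for child in tree:
        p *= tree_prob(child, g, max_depth, depth + 1)
    return p


def enumerate_trees(k, max_depth, depth=0):
    """Yield every full k-ary tree whose depth is at most max_depth."""
    yield ()
    if depth < max_depth:
        subtrees = list(enumerate_trees(k, max_depth, depth + 1))
        for children in product(subtrees, repeat=k):
            yield children


def count_leaves(tree):
    """Number of leaf nodes in a nested-tuple tree."""
    return 1 if tree == () else sum(count_leaves(c) for c in tree)


def expected_leaves(k, g, max_depth, depth=0):
    """Expected number of leaves, computed recursively without enumeration."""
    if depth == max_depth:
        return 1.0
    return (1.0 - g) + g * k * expected_leaves(k, g, max_depth, depth + 1)


if __name__ == "__main__":
    k, g, max_depth = 2, 0.3, 2
    trees = list(enumerate_trees(k, max_depth))
    total = sum(tree_prob(t, g, max_depth) for t in trees)
    brute = sum(tree_prob(t, g, max_depth) * count_leaves(t) for t in trees)
    print(len(trees), total, brute, expected_leaves(k, g, max_depth))
```

In this toy setting the recursion in `expected_leaves` visits one node per depth level, whereas the brute-force check has to enumerate every tree, a set that grows very quickly with the maximum depth. That gap is the kind of saving a recursive parametric representation is meant to provide.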