Bayesian inference provides a uniquely rigorous approach to obtaining principled justification for uncertainty in predictions, yet it is difficult to articulate suitably general prior belief in the machine learning context, where computational architectures are pure abstractions subject to frequent modifications by practitioners attempting to improve results. Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures that formalizes Occam's Razor; we prefer simple and sufficient explanations. Our universal hyperprior assigns plausibility to prior descriptions, encoded as sequences of symbols, by expanding on the core relationships between program length, Kolmogorov complexity, and Solomonoff's algorithmic probability. We then cast learning as information minimization over our composite change in belief when an architecture is specified, training data are observed, and model parameters are inferred. By distinguishing model complexity from prediction information, our framework also quantifies the phenomenon of memorization. Although our theory is general, it is most critical when datasets are limited, e.g., small or skewed. We develop novel algorithms for polynomial regression and random forests that are suitable for such data, as demonstrated by our experiments. Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation, thus addressing a fundamental challenge: how to obtain predictions from data efficiently.
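To make the core relationship concrete, the standard textbook statement (not necessarily the authors' exact notation) is as follows: Solomonoff's algorithmic probability weights every program p that causes a universal machine U to output x by its length \ell(p) in bits,

\[ M(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-\ell(p)}, \]

so the shortest such program, whose length is the prefix Kolmogorov complexity K(x), dominates the sum, giving M(x) \geq 2^{-K(x)} and, by the coding theorem, -\log_2 M(x) = K(x) + O(1). A hyperprior of this form automatically favors shorter prior descriptions, which is the formal content of the Occam's Razor principle invoked above.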
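As a minimal sketch of the information-minimization view, assuming a standard two-part minimum-description-length decomposition (the paper's composite change in belief may be organized differently), learning selects the parameters \theta that minimize the total code length for the training data D,

\[ \min_{\theta}\; \Big[ \underbrace{L(\theta)}_{\text{model complexity}} \;+\; \underbrace{L(D \mid \theta)}_{\text{prediction information}} \Big]. \]

On this reading, memorization appears as description length charged to \theta that does not correspondingly reduce L(D \mid \theta); keeping the two terms separate is what allows the phenomenon to be quantified.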