Algorithmic statistics has two different (and almost orthogonal) motivations. From the philosophical point of view, it tries to formalize how statistics works and why some statistical models are better than others. Once this notion of a "good model" is introduced, a natural question arises: is it possible that some piece of data has no good model? If so, how often do such bad ("non-stochastic") data appear "in real life"? Another, more technical motivation comes from algorithmic information theory. This theory introduces a notion of complexity of a finite object (the amount of information in that object); it assigns to every object a number, called its algorithmic complexity (or Kolmogorov complexity). Algorithmic statistics provides a more fine-grained classification: for each finite object a curve is defined that characterizes its behavior. It turns out that several different definitions give (approximately) the same curve. In this survey we try to provide an exposition of the main results in the field (including full proofs for the most important ones), as well as some historical comments. We assume that the reader is familiar with the main notions of algorithmic information (Kolmogorov complexity) theory.