If $A$ and $B$ are sets such that $A \subset B$, generalisation may be understood as the inference from $A$ of a hypothesis sufficient to construct $B$. One might infer any number of hypotheses from $A$, yet only some of those may generalise to $B$. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a proxy for intelligence). We examine this in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance (measured in terms of the probability of a hypothesis generalising). We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then there is no choice of proxy that performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In other words, weakness is the Pareto-optimal choice of proxy. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between $1.1$ and $5$ times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind's Apperception Engine is able to generalise effectively.