If $A$ and $B$ are sets such that $A \subset B$, generalisation may be understood as the inference from $A$ of a hypothesis sufficient to construct $B$. One might infer any number of hypotheses from $A$, yet only some of those may generalise to $B$. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a proxy for intelligence). We examine this in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance (measured as the probability of a hypothesis generalising). We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then there is no choice of proxy that performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In other words, weakness is the Pareto optimal choice of proxy. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between $1.1$ and $5$ times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind's Apperception Engine is able to generalise effectively.
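The counting argument behind the claim can be illustrated with a minimal Python sketch. This is a toy model under simplifying assumptions of our own, not the paper's formalism or its binary-arithmetic experiment: situations are 3-bit tuples, a hypothesis is the extension of a small Boolean formula (the names `SITUATIONS` and `FORMULAS` and the formula language are illustrative), a hypothesis is consistent with the observed set $A$ iff $A \subseteq H$, and it generalises to a task $B$ iff $B \subseteq H$. Weakness is $|H|$; formula length is a crude stand-in for minimum description length.

```python
# A minimal sketch of the counting argument, under simplifying assumptions.
from itertools import product
import random

SITUATIONS = list(product((0, 1), repeat=3))  # all 3-bit situations

# A toy hypothesis language over variables a, b, c. Description length is
# the formula's character count (a crude stand-in for MDL).
FORMULAS = [
    "a and b and c",  # strongest: extension size 1
    "a and b",        # extension size 2
    "a",              # extension size 4
    "a or b",         # extension size 6
    "a or b or c",    # weakest here: extension size 7
]

def extension(formula):
    """The set of situations satisfying the formula (the hypothesis H)."""
    return {s for s in SITUATIONS if eval(formula, {}, dict(zip("abc", s)))}

random.seed(0)
wins = {"weakness": 0, "mdl": 0}
trials = 10_000
for _ in range(trials):
    # Sample a task B uniformly from 3-element sets of situations, and
    # reveal a 2-element subset A to the learner.
    B = set(random.sample(SITUATIONS, 3))
    A = set(random.sample(sorted(B), 2))
    consistent = [f for f in FORMULAS if A <= extension(f)]
    if not consistent:
        continue  # no hypothesis in this toy language fits A
    weakest = max(consistent, key=lambda f: len(extension(f)))
    shortest = min(consistent, key=len)
    wins["weakness"] += B <= extension(weakest)
    wins["mdl"] += B <= extension(shortest)

print(wins)
```

Because the weakest consistent hypothesis has the largest extension, it is consistent with the greatest number of possible completions $B$ of $A$; under a uniform distribution over tasks it therefore generalises at least as often as any other consistent pick, which is what the simulation's counts reflect.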