When large datasets are mined to predict new data, the limitations of the principles underlying statistical machine learning pose a serious challenge, not only in the face of the Big Data deluge but also to the traditional assumption that data-generating processes are biased toward low algorithmic complexity. Even assuming an underlying algorithmic-informational bias toward simplicity in finite dataset generators, we show that current approaches to machine learning (including deep learning and any formal-theoretic hybrid of top-down AI and statistical machine learning) can always be deceived, naturally or artificially, by sufficiently large datasets. In particular, we demonstrate that for every learning algorithm (with or without access to a formal theory) there is a dataset size above which the algorithmic probability of an unpredictable deceiver is an upper bound, up to a multiplicative constant that depends only on the learning algorithm, for the algorithmic probability of any other larger dataset. In other words, a very large and complex dataset that deceives a learning algorithm into a "simplicity bubble" is, up to that constant, at least as likely to occur as any other particular non-deceiving dataset. Such deceiving datasets guarantee that any prediction made by the learning algorithm will unpredictably diverge from the high-algorithmic-complexity globally optimal solution and converge instead toward a low-algorithmic-complexity locally optimal solution, which the learning algorithm nonetheless deems global. We discuss the framework and the additional empirical conditions that must be met to circumvent this deceptive phenomenon, moving away from statistical machine learning toward a stronger type of machine learning grounded in, and motivated by, the intrinsic power of algorithmic information theory and computability theory.
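The central bound can be restated with standard definitions from algorithmic information theory. What follows is a hedged sketch of one plausible formalization, not the authors' theorem statement. The symbols $\mathbf{m}$, $U$, $K$, $A$, $N_A$, $c_A$, $x$, and $d$ are introduced here for illustration. $\mathbf{m}(x)$ is the algorithmic probability of a string $x$ relative to a prefix-free universal Turing machine $U$. By the coding theorem it is tied to the prefix Kolmogorov complexity $K(x)$. On this reading, for a learning algorithm $A$ there are a size threshold $N_A$, a constant $c_A > 0$ depending only on $A$, and a deceiving dataset $d$ such that every other dataset $x$ with $|x| \ge N_A$ satisfies the left-hand inequality below. Taking $-\log_2$ of both sides and applying the coding theorem then yields the right-hand form:

```latex
\mathbf{m}(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|},
\qquad
K(x) = -\log_2 \mathbf{m}(x) + O(1)

\mathbf{m}(x) \le c_A \, \mathbf{m}(d)
\;\Longrightarrow\;
-\log_2 \mathbf{m}(d) \le -\log_2 \mathbf{m}(x) + \log_2 c_A
\;\Longrightarrow\;
K(d) \le K(x) + \log_2 c_A + O(1)
```

Put another way, beyond $N_A$ the deceiver is no more algorithmically complex than any other large dataset, up to an additive constant that depends only on $A$. It is therefore no less probable under a simplicity-biased generator.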