The phenomenon of stochastic separability was discovered and has been used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities. In high-dimensional datasets, under broad assumptions, each point can be separated from the rest of the set by a simple and robust Fisher's discriminant (that is, each point is Fisher separable). Errors or clusters of errors can therefore be separated from the rest of the data. The ability to correct an AI system also opens up the possibility of an attack on it: high dimensionality induces vulnerabilities caused by the same stochastic separability that holds the key to understanding the fundamentals of robustness and adaptivity in high-dimensional data-driven AI. To manage errors and analyze vulnerabilities, stochastic separation theorems should evaluate the probability that a dataset will be Fisher separable in a given dimensionality and for a given class of distributions. Explicit and optimal estimates of these separation probabilities are required, and this problem is solved in the present work. General stochastic separation theorems with optimal probability estimates are obtained for important classes of distributions: log-concave distributions, their convex combinations, and product distributions. The standard i.i.d. assumption is significantly relaxed. These theorems and estimates can be used both for the correction of high-dimensional data-driven AI systems and for the analysis of their vulnerabilities. A third area of application is the emergence of memories in ensembles of neurons, the phenomena of grandmother cells and sparse coding in the brain, and the explanation of the unexpected effectiveness of small neural ensembles in the high-dimensional brain.
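The separability claim can be illustrated numerically with a minimal sketch. It assumes the definition commonly used in the stochastic separation literature, which is not spelled out in this abstract: after centering and normalization, a point x is Fisher separable from a point y with threshold alpha if (x, y) <= alpha (x, x). The function names, the threshold alpha = 0.8, the sample size, and the choice of a uniform distribution on a cube (one example of a product distribution) are all hypothetical choices made for illustration. Per-coordinate scaling is used in place of full whitening, which is a simplification that is reasonable here only because the coordinates are independent.

```python
import numpy as np


def fisher_separable_fraction(points, alpha=0.8):
    """Fraction of points x with <x, y> <= alpha * <x, x> for every other point y.

    Assumption: this inequality is the Fisher separability criterion,
    applied after centering and per-coordinate scaling (not full whitening).
    """
    centered = points - points.mean(axis=0)
    scaled = centered / centered.std(axis=0)

    gram = scaled @ scaled.T              # all pairwise inner products
    self_products = np.diag(gram).copy()  # <x, x> for each point
    np.fill_diagonal(gram, -np.inf)       # do not compare a point with itself

    worst_case = gram.max(axis=1)         # largest <x, y> over all y != x
    return float(np.mean(worst_case <= alpha * self_products))


def demo(n_points=1000, dims=(2, 200), seed=0):
    """Compare a low and a high dimension on uniform samples from a cube."""
    rng = np.random.default_rng(seed)
    return {
        d: fisher_separable_fraction(rng.uniform(-1.0, 1.0, size=(n_points, d)))
        for d in dims
    }


if __name__ == "__main__":
    for dim, fraction in demo().items():
        print(f"dimension {dim}: separable fraction {fraction:.3f}")
```

In this setup the separable fraction should be small in dimension 2 and close to one in dimension 200. The sketch only estimates the fraction empirically for one distribution; it does not reproduce the probability bounds derived in the work.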