The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e., singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point across multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.