Experimental sciences have come to depend heavily on our ability to organize and interpret high-dimensional datasets. Natural laws, conservation principles, and interdependencies among observed variables impose geometric structure, with fewer degrees of freedom, on such datasets. We introduce the frameworks of semiclassical and microlocal analysis to data analysis and develop a novel yet natural uncertainty principle for extracting fine-scale features of this geometric structure; the principle depends crucially on data-driven approximations to the quantum mechanical processes underlying geometric optics. This leads to the first tractable algorithm for approximating wave dynamics and geodesics on data manifolds, with rigorous probabilistic convergence rates under the manifold hypothesis. We demonstrate our algorithm on real-world datasets, including an analysis of population mobility information during the COVID-19 pandemic, where we achieve a four-fold improvement in dimensionality reduction over the existing state of the art and reveal anomalous behavior exhibited by less than 1.2% of the entire dataset. Our work initiates the study of data-driven quantum dynamics for analyzing datasets, and we outline several directions for future research.