Algebraic Machine Learning with an Application to Chemistry

As datasets used in scientific applications become more complex, studying the geometry and topology of data has become an increasingly prevalent part of the data analysis process, as seen, for example, in the growing interest in topological tools such as persistent homology. However, topological tools are inherently limited to providing only coarse information about the underlying space of the data. More geometric approaches, on the other hand, rely predominantly on the manifold hypothesis, which asserts that the underlying space is a smooth manifold. This assumption fails for many physical models in which the underlying space contains singularities. In this paper we develop a machine learning pipeline that captures fine-grained geometric information without relying on any smoothness assumptions. Our approach works within the framework of algebraic geometry and algebraic varieties instead of differential geometry and smooth manifolds. Under the variety hypothesis, the learning problem becomes that of finding the underlying variety from sample data. We cast this learning problem as a Maximum A Posteriori optimization problem, which we solve by means of an eigenvalue computation. Having found the underlying variety, we explore the use of Gröbner bases and numerical methods to reveal information about its geometry. In particular, we propose a heuristic for numerically detecting points lying near the singular locus of the underlying variety.

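To make the idea of learning a variety through an eigenvalue computation concrete, the following is a minimal sketch and not the pipeline of the paper. It assumes a single defining polynomial of bounded degree and replaces the Maximum A Posteriori objective, whose prior the abstract does not specify, with an unregularized least-squares fit: the coefficient vector is taken to be the unit eigenvector of the monomial Gram matrix with the smallest eigenvalue. The singularity score used here, the norm of the gradient of the fitted polynomial, is a stand-in motivated by the Jacobian criterion and is not claimed to be the heuristic proposed by the authors. The nodal cubic y^2 = x^2(x + 1) serves as toy data, and all function names are hypothetical.

```python
import itertools
import numpy as np


def monomial_exponents(n_vars, max_degree):
    """All exponent tuples of total degree <= max_degree."""
    return [e for e in itertools.product(range(max_degree + 1), repeat=n_vars)
            if sum(e) <= max_degree]


def monomial_features(points, exponents):
    """Evaluate every monomial at every point: shape (n_points, n_monomials)."""
    points = np.asarray(points, dtype=float)
    return np.stack([np.prod(points ** np.array(e), axis=1) for e in exponents],
                    axis=1)


def fit_polynomial(points, max_degree):
    """Least-squares fit of one polynomial that vanishes on the samples.

    Assumption: flat prior, so the estimate reduces to the eigenvector of the
    Gram matrix with the smallest eigenvalue (unit-norm coefficients).
    """
    points = np.asarray(points, dtype=float)
    exponents = monomial_exponents(points.shape[1], max_degree)
    features = monomial_features(points, exponents)
    gram = features.T @ features / len(points)
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    return exponents, eigvecs[:, 0], eigvals


def gradient(points, exponents, coeffs):
    """Gradient of the fitted polynomial at each point: shape (n_points, n_vars)."""
    points = np.asarray(points, dtype=float)
    grad = np.zeros_like(points)
    for e, c in zip(exponents, coeffs):
        for i, power in enumerate(e):
            if power == 0:
                continue  # this monomial does not depend on variable i
            reduced = np.array(e)
            reduced[i] -= 1
            grad[:, i] += c * power * np.prod(points ** reduced, axis=1)
    return grad


# Toy data: the nodal cubic y^2 = x^2 (x + 1), parametrized by t.
# Its only singular point is the origin, reached at t = -1 and t = 1.
t = np.linspace(-1.5, 1.5, 200)
samples = np.column_stack([t**2 - 1, t * (t**2 - 1)])

exponents, coeffs, eigvals = fit_polynomial(samples, max_degree=3)

# Hypothetical singularity score: a small gradient norm suggests that the
# sample lies near the singular locus of the fitted curve.
grad_norms = np.linalg.norm(gradient(samples, exponents, coeffs), axis=1)
most_singular = samples[np.argmin(grad_norms)]
print("sample with smallest gradient norm:", most_singular)
```

On noisy data the smallest eigenvalue is no longer close to zero, and both the choice of degree and the threshold on the gradient norm become modelling decisions that this sketch leaves open.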