Dimensionality reduction algorithms are standard tools in a researcher's toolbox. They are frequently used to augment downstream tasks in machine learning and data science, and also serve as exploratory methods for understanding complex phenomena. For instance, dimensionality reduction is commonly used in biology and neuroscience to understand data collected from biological subjects. However, dimensionality reduction techniques are limited by the von Neumann architectures on which they execute. Specifically, data-intensive algorithms such as dimensionality reduction often require memory that is simultaneously fast, high capacity, and persistent, a combination that hardware has historically been unable to provide. In this paper, we present a re-implementation of an existing dimensionality reduction technique, Geometric Multi-Resolution Analysis (GMRA), accelerated via a novel persistent memory technology called Memory Centric Active Storage (MCAS). Our implementation uses a specialized component of MCAS called PyMM that provides native support for Python data types, including NumPy arrays and PyTorch tensors. We compare our PyMM implementation against a DRAM implementation and show that when the data fits in DRAM, PyMM offers competitive runtimes; when the data does not fit in DRAM, our PyMM implementation is still able to process it.