Python has become a standard language for scientific computing, with rapidly growing support for machine learning and data analysis modules and increasing use with big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a highly composable, unified data model with strong performance, built to handle big data quickly and efficiently. In this work we present D4M.py, an implementation of D4M in Python. D4M.py implements all foundational functionality of D4M and includes Accumulo and SQL database support via Graphulo. We describe the mathematical background and motivation, explain the approaches taken for its fundamental functions and building blocks, and present performance results comparing D4M.py with D4M-MATLAB and D4M.jl.