The analysis of high-dimensional sparse data is becoming increasingly popular in many important domains. However, real-world sparse tensors are challenging to process due to their irregular shapes and data distributions. We propose the Adaptive Linearized Tensor Order (ALTO) format, a novel mode-agnostic (general) representation that keeps neighboring nonzero elements in the multi-dimensional space close to each other in memory. To generate the indexing metadata, ALTO uses an adaptive bit encoding scheme that trades off index computations for lower memory usage and more effective use of memory bandwidth. Moreover, by decoupling its sparse representation from the irregular spatial distribution of nonzero elements, ALTO eliminates workload imbalance and greatly reduces the synchronization overhead of tensor computations. As a result, the parallel performance of ALTO-based tensor operations becomes a function of their inherent data reuse. Across a range of tensor datasets, ALTO outperforms an oracle that selects the best state-of-the-art format for each dataset when used in key tensor decomposition operations. Specifically, ALTO achieves a geometric mean speedup of 8X over the best mode-agnostic (coordinate and hierarchical coordinate) formats, while delivering a geometric mean compression ratio of 4.3X relative to the best mode-specific (compressed sparse fiber) formats.
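To make the idea of a linearized index with adaptive bit encoding more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that each mode receives only as many bits as its dimension size requires and that bits are interleaved round-robin across modes starting from the least significant bit. The abstract does not specify the actual bit ordering, and the function names (`mode_bits`, `bit_layout`, `linearize`, `delinearize`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch only: the interleaving order below is an assumption,
# not the bit layout defined by the ALTO paper.


def mode_bits(dims: Sequence[int]) -> List[int]:
    """Bits needed to index each mode, i.e. ceil(log2(size)) for size >= 1."""
    return [(d - 1).bit_length() for d in dims]


def bit_layout(dims: Sequence[int]) -> List[Tuple[int, int]]:
    """Map each bit position of the linearized index to (mode, bit within mode).

    Bits are assigned round-robin over the modes, from the least significant
    bit upward; a mode that has run out of bits is skipped, so short modes do
    not waste space in the linearized index.
    """
    bits = mode_bits(dims)
    layout = []
    for level in range(max(bits, default=0)):
        for mode, nbits in enumerate(bits):
            if level < nbits:
                layout.append((mode, level))
    return layout


def linearize(coord: Sequence[int], layout: Sequence[Tuple[int, int]]) -> int:
    """Pack a multi-dimensional coordinate into a single integer index."""
    index = 0
    for pos, (mode, bit) in enumerate(layout):
        index |= ((coord[mode] >> bit) & 1) << pos
    return index


def delinearize(index: int, layout: Sequence[Tuple[int, int]], nmodes: int) -> Tuple[int, ...]:
    """Recover the original coordinate from a linearized index."""
    coord = [0] * nmodes
    for pos, (mode, bit) in enumerate(layout):
        coord[mode] |= ((index >> pos) & 1) << bit
    return tuple(coord)


if __name__ == "__main__":
    dims = (4, 2, 8)                      # a small 3-mode tensor
    layout = bit_layout(dims)             # 2 + 1 + 3 = 6 bits in total
    nonzeros = [((3, 1, 5), 0.5), ((0, 0, 1), 2.0), ((3, 1, 4), 1.5)]
    # Sorting by the linearized index places nonzeros that share low-order
    # coordinate bits next to each other in memory.
    ordered = sorted((linearize(c, layout), v) for c, v in nonzeros)
    for idx, val in ordered:
        print(idx, delinearize(idx, layout, len(dims)), val)
```

In this sketch a 4 x 2 x 8 tensor needs only 6 index bits per nonzero instead of one full index word per mode, at the cost of bit manipulation when a coordinate has to be recovered. This is the kind of trade between index computation and memory usage that the abstract describes.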