Exponential growth in the amount of data generated by the Internet of Things (IoT) currently poses significant challenges for data communication, storage, and analytics, and leads to high costs for organisations hoping to leverage their data. Novel techniques are therefore needed to holistically improve the efficiency of data storage and analytics in IoT systems. The emerging compression technique Generalized Deduplication (GD) has been shown to deliver high compression and to enable analytics directly on compressed data with low storage and memory requirements. In this paper, we propose a new GD-based data compression algorithm, GreedyGD, that is designed for analytics. Compared to existing versions of GD, GreedyGD enables more reliable analytics with less data, while running 11.2x faster and delivering even better compression.