OpenZL: A Graph-Based Model for Compression

Yann Collet, Nick Terrell, W. Felix Handte, Danielle Rozenblit, Victor Zhang, Kevin Zhang, Yaelle Goldschlag, Jennifer Lee, Elliot Gorokhovsky, Yonatan Komornik, Daniel Riegel, Stan Angelov, Nadav Rotem

Research techniques in the last decade have improved lossless compression ratios by significantly increasing processing time. These techniques have remained obscure because production systems require high throughput and low resource utilization. In practice, application-specific compression algorithms that leverage knowledge of the data structure and semantics are more popular. Application-specific compressor systems outperform even the best generic compressors, but these techniques have some drawbacks. Application-specific compressors are inherently limited in applicability, have high development costs, and are difficult to maintain and deploy. In this work, we show that these challenges can be overcome with a new compression strategy. We propose the "graph model" of compression, a new theoretical framework for representing compression as a directed acyclic graph of modular codecs. OpenZL compresses data into a self-describing wire format, any configuration of which can be decompressed by a universal decoder. OpenZL's design enables rapid development of tailored compressors with minimal code; its universal decoder eliminates deployment lag; and its investment in a well-vetted standard component library minimizes security risks. Experimental results demonstrate that OpenZL achieves superior compression ratios and speeds compared to state-of-the-art general-purpose compressors on a variety of real-world datasets. Internal deployments at Meta have also shown consistent improvements in size and/or speed, with development timelines reduced from months to days. OpenZL thus represents a significant advance in practical, scalable, and maintainable data compression for modern data-intensive applications.
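The graph idea in the abstract can be illustrated with a toy sketch. This is not OpenZL's API or wire format: the codec names (`split2`, `delta`, `zlib`), the JSON header, and the length-prefixed layout are all assumptions made for illustration, and the graph is restricted to a tree (the simplest kind of directed acyclic graph). The point it tries to show is that when the frame records which codecs were applied, a single decoder can undo any configuration without being told about it in advance.

```python
import json
import zlib

# Toy codecs (hypothetical, for illustration only). Each encoder maps one
# byte stream to a list of output streams; each decoder inverts that.
def _split2_enc(data):
    return [data[0::2], data[1::2]]          # deinterleave even/odd bytes

def _split2_dec(streams):
    a, b = streams
    out = bytearray(len(a) + len(b))
    out[0::2] = a
    out[1::2] = b
    return bytes(out)

def _delta_enc(data):
    prev, out = 0, bytearray()
    for byte in data:
        out.append((byte - prev) & 0xFF)     # difference modulo 256
        prev = byte
    return [bytes(out)]

def _delta_dec(streams):
    prev, out = 0, bytearray()
    for diff in streams[0]:
        prev = (prev + diff) & 0xFF
        out.append(prev)
    return bytes(out)

CODECS = {
    "split2": (_split2_enc, _split2_dec),
    "delta": (_delta_enc, _delta_dec),
    "zlib": (lambda d: [zlib.compress(d)], lambda s: zlib.decompress(s[0])),
}

# A graph node is either None ("store this stream as-is") or
# [codec_name, [one child node per output stream of that codec]].
def compress(data, graph):
    leaves = []

    def run(stream, node):
        if node is None:
            leaves.append(stream)
            return
        name, children = node
        for out, child in zip(CODECS[name][0](stream), children):
            run(out, child)

    run(data, graph)
    header = json.dumps(graph).encode()      # the frame describes its own graph
    frame = len(header).to_bytes(4, "big") + header
    for leaf in leaves:
        frame += len(leaf).to_bytes(4, "big") + leaf
    return frame

def decompress(frame):
    """Universal decoder: needs only the frame, not the compressor's config."""
    hlen = int.from_bytes(frame[:4], "big")
    graph = json.loads(frame[4:4 + hlen])
    pos, leaves = 4 + hlen, []
    while pos < len(frame):
        n = int.from_bytes(frame[pos:pos + 4], "big")
        leaves.append(frame[pos + 4:pos + 4 + n])
        pos += 4 + n
    stored = iter(leaves)

    def rebuild(node):
        if node is None:
            return next(stored)
        name, children = node
        return CODECS[name][1]([rebuild(child) for child in children])

    return rebuild(graph)

# Two interleaved fields: a counter (suits delta) and a constant.
data = bytes(x for i in range(1000) for x in (i % 256, 7))
graph = ["split2", [["delta", [["zlib", [None]]]], ["zlib", [None]]]]
frame = compress(data, graph)
assert decompress(frame) == data
```

Swapping in a different graph, for example `["zlib", [None]]`, changes how the data is compressed but requires no change to `decompress`, which is the property the abstract attributes to the universal decoder.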

