This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and quantization of random variables taking values in general measurable spaces such as manifolds and fractal sets. Manifold structures are prevalent in data science, e.g., in compressed sensing, machine learning, image processing, and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially require only the existence of reference measures satisfying certain regularity conditions expressed in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in i) manifolds, namely, hyperspheres and Grassmannians, and ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.