We describe a measure quantization procedure, i.e., an algorithm that finds the best approximation of a target probability law (and, more generally, of a signed measure of finite variation) by a sum of Q Dirac masses, Q being the quantization parameter. The procedure minimizes the statistical distance between the original measure and its quantized version; the distance is built from a negative definite kernel and, if necessary, can be computed on the fly and fed to a stochastic optimization algorithm (such as SGD, Adam, ...). We investigate theoretically the fundamental question of the existence of an optimal measure quantizer and identify the kernel properties required to guarantee suitable behavior. We test the procedure, called HEMQ, on several databases: multi-dimensional Gaussian mixtures, Wiener space cubature, Italian wine cultivars and the MNIST image database. The results indicate that the HEMQ algorithm is robust and versatile and that, for the class of Huber-energy kernels, it matches the expected intuitive behavior.
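To make the procedure described in the abstract concrete, here is a minimal sketch of quantizing an empirical measure by stochastic gradient descent on a kernel-based squared distance. It is an illustration under stated assumptions, not the HEMQ implementation: the kernel form h(x, y) = sqrt(|x - y|^2 + a^2) - a, the smoothing parameter `a`, the uniform weights 1/Q on the Dirac masses, the plain SGD update, and all function names and hyperparameters are choices made here for illustration and are not given in the text above.

```python
import numpy as np


def huber_energy_kernel(x, y, a=0.1):
    """Pairwise h(x, y) = sqrt(|x - y|^2 + a^2) - a for rows of x (n, d) and y (m, d).

    Assumed kernel form; `a` is a hypothetical smoothing parameter.
    """
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(-1) + a ** 2) - a


def squared_distance(x, y, a=0.1):
    """Squared distance between the uniform empirical measures on x and y:
    mean h(x, y) - 0.5 * mean h(x, x') - 0.5 * mean h(y, y')."""
    return (huber_energy_kernel(x, y, a).mean()
            - 0.5 * huber_energy_kernel(x, x, a).mean()
            - 0.5 * huber_energy_kernel(y, y, a).mean())


def grad_wrt_points(x, y, a=0.1):
    """Gradient of squared_distance(x, y) with respect to the points y, shape (Q, d)."""
    n, q = x.shape[0], y.shape[0]
    dyx = y[:, None, :] - x[None, :, :]  # (Q, n, d)
    attract = (dyx / np.sqrt((dyx ** 2).sum(-1, keepdims=True) + a ** 2)).sum(1) / (n * q)
    dyy = y[:, None, :] - y[None, :, :]  # (Q, Q, d)
    repel = (dyy / np.sqrt((dyy ** 2).sum(-1, keepdims=True) + a ** 2)).sum(1) / q ** 2
    return attract - repel


def quantize(samples, y0, a=0.1, lr=0.5, n_steps=1000, batch=128, seed=0):
    """Plain SGD on the Q points; each step uses a fresh minibatch of the target."""
    rng = np.random.default_rng(seed)
    y = y0.copy()
    for _ in range(n_steps):
        idx = rng.integers(0, len(samples), size=batch)
        y -= lr * grad_wrt_points(samples[idx], y, a)
    return y


# Toy target: 500 samples from a two-component Gaussian mixture in 2D.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
target = centers[rng.integers(0, 2, size=500)] + 0.5 * rng.normal(size=(500, 2))

# Q = 5 points, deliberately initialized far from the target.
y0 = 5.0 + 0.1 * rng.normal(size=(5, 2))
y = quantize(target, y0)

d0 = squared_distance(target, y0)
d1 = squared_distance(target, y)
print("squared distance before:", d0)
print("squared distance after: ", d1)
```

The gradient has two parts: a term that pulls each quantization point toward the target samples and a term that pushes the quantization points away from each other, which is what spreads them over the support of the target. Because only a minibatch of target samples is needed at each step, the distance never has to be evaluated on the full dataset during optimization; swapping the update for Adam would only change the line that modifies `y`.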