Learning and compression are driven by the common aim of identifying and exploiting statistical regularities in data, which opens the door to fertile collaboration between these areas. A promising family of compression techniques for learning scenarios is normalised maximum likelihood (NML) coding, which provides strong guarantees for the compression of small datasets, in contrast to more popular estimators whose guarantees hold only in the asymptotic limit. Here we consider an NML-based decision strategy for supervised classification problems, and show that it attains heuristic PAC learning when applied to a wide variety of models. Furthermore, we show that the misclassification rate of our method is upper bounded by the maximal leakage, a recently proposed metric that quantifies the potential of data leakage in privacy-sensitive scenarios.
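To make the NML idea concrete, here is a minimal sketch for the simplest possible model class, a Bernoulli source, rather than the classifiers studied in the paper; all function names are illustrative. The NML (Shtarkov) distribution assigns each sequence its maximised likelihood divided by a normaliser, and the log of that normaliser is the worst-case regret of the code, which is finite for any fixed sample size n — this is the sense in which NML gives non-asymptotic guarantees.

```python
from math import comb, log2

def max_likelihood(k, n):
    # likelihood of a Bernoulli sequence with k ones in n trials,
    # maximised over the parameter (the maximiser is p = k/n)
    if k in (0, n):
        return 1.0
    p = k / n
    return p ** k * (1 - p) ** (n - k)

def shtarkov_sum(n):
    # NML normaliser: sum of the maximised likelihood over all 2^n
    # sequences, grouped by their number of ones
    return sum(comb(n, k) * max_likelihood(k, n) for k in range(n + 1))

def nml_prob(k, n):
    # NML probability of one particular sequence with k ones
    return max_likelihood(k, n) / shtarkov_sum(n)

n = 10
# worst-case regret of the NML code over all length-n sequences
regret = log2(shtarkov_sum(n))
# sanity check: the NML probabilities form a distribution
total = sum(comb(n, k) * nml_prob(k, n) for k in range(n + 1))
```

The regret `log2(shtarkov_sum(n))` holds uniformly over every sequence of length n, not just in the limit of large n, which is the property the abstract contrasts with asymptotic estimators.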