This paper investigates the theoretical problem of maintaining linear separability of the data-generating distribution under linear compression. While it has long been known that linear separability may be maintained by linear transformations that approximately preserve the inner products between the domain points, how accurately the inner products must be preserved in order to maintain linear separability was unknown. In this paper, we show that linear separability is maintained as long as the distortion of the inner products is smaller than the squared margin of the original data-generating distribution. The proof is mainly based on the geometry of hard support vector machines (hard-SVM), extended from the finite set of training examples to the (possibly) infinite domain of the data-generating distribution. As applications, we derive bounds on (i) the compression length of random sub-Gaussian matrices and (ii) the generalization error of compressive learning with hard-SVM.
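To make the main claim concrete, the following is a hedged formalization; the notation, quantifiers, and the exact distortion measure are assumptions for illustration, and the paper's precise statement may differ. Suppose the data-generating distribution $\mathcal{D}$ over $\mathcal{X} \times \{-1,+1\}$ is linearly separable with margin $\gamma > 0$ by a unit-norm vector $w^{*}$, i.e., $y\,\langle w^{*}, x\rangle \ge \gamma$ almost surely, and suppose the linear compression $A$ distorts the relevant inner products by at most $\delta$:
\[
\bigl|\langle A x,\, A w^{*}\rangle - \langle x,\, w^{*}\rangle\bigr| \le \delta \quad \text{for all } x \text{ in the support of } \mathcal{D}.
\]
Then, per the claim above, the compressed distribution (the distribution of $(Ax, y)$) remains linearly separable whenever $\delta < \gamma^{2}$.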