Dot product kernels, such as polynomial and exponential (softmax) kernels, are among the most widely used kernels in machine learning, as they enable modeling the interactions between input features, which is crucial in applications like computer vision, natural language processing, and recommender systems. We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels, to make these kernels more useful in large-scale learning. First, we present a generalization of existing random feature approximations for polynomial kernels, such as Rademacher and Gaussian sketches and TensorSRHT, using complex-valued random features. We show empirically that the use of complex features can significantly reduce the variances of these approximations. Second, we provide a theoretical analysis of the factors affecting the efficiency of various random feature approximations, by deriving closed-form expressions for their variances. These variance formulas elucidate conditions under which certain approximations (e.g., TensorSRHT) achieve lower variances than others (e.g., Rademacher sketches), and conditions under which the use of complex features leads to lower variances than real features. Third, using these variance formulas, which can be evaluated in practice, we develop a data-driven optimization approach to improve random feature approximations for general dot product kernels, which is also applicable to the Gaussian kernel. We demonstrate the improvements brought by these contributions through extensive experiments on a variety of tasks and datasets.
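To make the first contribution concrete, the following is a minimal sketch (not the authors' implementation) of a complex-valued Rademacher sketch for the polynomial kernel k(x, y) = (x^T y)^p: each feature is a product of p independent projections whose weights are drawn uniformly from the complex Rademacher set {1, -1, i, -i}, and the real part of the (conjugate) inner product between two feature vectors is an unbiased estimate of the kernel. The function name and parameters here are illustrative.

```python
import numpy as np

def complex_rademacher_features(X, degree, D, rng):
    """Complex Rademacher sketch for the polynomial kernel (x^T y)^degree.

    Each of the D features is a product of `degree` independent linear
    projections with i.i.d. weights drawn uniformly from {1, -1, i, -i}.
    Since E[w_j conj(w_k)] = delta_jk for such weights, the real part of
    Phi(x) Phi(y)^H is an unbiased estimate of (x^T y)^degree.
    """
    n, d = X.shape
    Phi = np.ones((n, D), dtype=np.complex128)
    for _ in range(degree):
        # d x D weight matrix with i.i.d. entries from {1, -1, i, -i}
        W = rng.choice([1, -1, 1j, -1j], size=(d, D))
        Phi *= X @ W
    return Phi / np.sqrt(D)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))
Phi = complex_rademacher_features(X, degree=3, D=4096, rng=rng)
K_approx = np.real(Phi @ Phi.conj().T)  # unbiased estimate of (X X^T)**3
K_exact = (X @ X.T) ** 3
print(np.max(np.abs(K_approx - K_exact)))  # error shrinks as D grows
```

The real-valued Rademacher sketch is recovered by drawing the weights from {1, -1} instead; the paper's variance formulas quantify when the complex variant above yields a lower-variance estimate.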