The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs), which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs that approximate the Gaussian and softmax kernels. In contrast to traditional RF approximations, the parameters of these new methods can be optimized to reduce the variance of the approximation, and the optimum can be expressed in closed form. We show that our methods lead to variance reduction in practice ($e^{10}$-times smaller variance and beyond) and outperform previous methods in a kernel regression task. Using our proposed mechanism, we also present FAVOR#, a method for self-attention approximation in Transformers. We show that FAVOR# outperforms other random feature methods in speech modelling and natural language processing.
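For concreteness, the following is a minimal sketch of the standard positive, non-trigonometric random-feature estimator of the softmax kernel (the FAVOR+ construction used in Performers), which the parameterized features described above generalize; the specific parameterization and its closed-form optimum are not reproduced here, and the function name and test values are illustrative only.

```python
import numpy as np

def positive_softmax_features(x, W):
    """Positive random features phi(x) = exp(x W^T - ||x||^2 / 2) / sqrt(m).

    For W with rows drawn i.i.d. from N(0, I_d), phi(x) @ phi(y) is an
    unbiased estimate of the softmax kernel exp(x^T y).
    """
    m = W.shape[0]
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ W.T - sq_norm) / np.sqrt(m)

# Quick Monte-Carlo check of unbiasedness on a random pair of vectors.
rng = np.random.default_rng(0)
d, m = 8, 20000
x = 0.3 * rng.standard_normal(d)
y = 0.3 * rng.standard_normal(d)
W = rng.standard_normal((m, d))
estimate = positive_softmax_features(x[None, :], W) @ positive_softmax_features(y[None, :], W).T
print(estimate.item(), np.exp(x @ y))  # the two numbers should agree closely
```

Because all feature values are positive, the estimator avoids the sign cancellations of trigonometric RFs; the methods proposed here additionally introduce tunable parameters whose variance-minimizing values admit a closed form.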