We propose a new class of random feature methods for linearizing softmax and Gaussian kernels, called hybrid random features (HRFs), that automatically adapt the quality of kernel estimation to provide the most accurate approximation in defined regions of interest. Special instantiations of HRFs recover well-known methods such as trigonometric random features (Rahimi and Recht, 2007) and the positive random features recently introduced in the context of linear-attention Transformers (Choromanski et al., 2021). By generalizing Bochner's Theorem for softmax/Gaussian kernels and leveraging random features for compositional kernels, the HRF mechanism provides strong theoretical guarantees: unbiased approximation and strictly smaller worst-case relative errors than its counterparts. We conduct an exhaustive empirical evaluation of HRFs, ranging from pointwise kernel estimation experiments, through tests on data admitting clustering structure, to benchmarking implicit-attention Transformers (including downstream robotics applications), demonstrating their quality across a wide spectrum of machine learning problems.
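For orientation, the two special instantiations named above can be sketched as follows. This is a minimal illustration of the standard trigonometric and positive random feature estimators of the softmax kernel exp(x^T y), not the HRF mechanism itself; the function names, dimensions, and sample sizes are assumptions chosen for the example.

```python
import numpy as np


def trig_features(x, omega):
    """Trigonometric random features for the softmax kernel exp(x^T y).

    Uses exp(x^T y) = exp(|x|^2/2) * exp(|y|^2/2) * E[cos(w^T (x - y))]
    with w ~ N(0, I). Function name is illustrative.
    """
    m = omega.shape[0]
    proj = omega @ x
    scale = np.exp(0.5 * (x @ x)) / np.sqrt(m)
    return scale * np.concatenate([np.cos(proj), np.sin(proj)])


def positive_features(x, omega):
    """Positive random features for the softmax kernel exp(x^T y).

    Uses exp(x^T y) = E[exp(w^T x - |x|^2/2) * exp(w^T y - |y|^2/2)]
    with w ~ N(0, I). Function name is illustrative.
    """
    m = omega.shape[0]
    return np.exp(omega @ x - 0.5 * (x @ x)) / np.sqrt(m)


# Illustrative setup: small-norm inputs and many random projections.
rng = np.random.default_rng(0)
d, m = 8, 20000
x = 0.3 * rng.normal(size=d)
y = 0.3 * rng.normal(size=d)
omega = rng.normal(size=(m, d))  # rows are i.i.d. N(0, I_d) samples

exact = np.exp(x @ y)
est_trig = trig_features(x, omega) @ trig_features(y, omega)
est_pos = positive_features(x, omega) @ positive_features(y, omega)

print(f"exact={exact:.4f}  trig={est_trig:.4f}  positive={est_pos:.4f}")
```

Both estimators are unbiased, but their variance depends on the inputs in different ways, which is the kind of behaviour a method that adapts estimation quality to regions of interest has to account for.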