
Paper link: https://arxiv.org/pdf/2009.14794.pdf

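Performer uses an efficient (linear) generalized attention framework that admits a broad class of attention mechanisms based on different similarity measures (kernels). The framework is implemented with Google's new algorithm FAVOR+ (Fast Attention Via Positive Orthogonal Random Features), which provides scalable, low-variance, unbiased estimation of any attention mechanism that can be expressed through a random feature map decomposition, regular softmax attention in particular. The method keeps linear space and time complexity while retaining strong accuracy guarantees, and it can also be applied to standalone softmax operations. It is also interoperable with other techniques such as reversible layers.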
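To make the random feature map idea concrete, here is a minimal NumPy sketch of softmax attention approximated with positive orthogonal random features. It is an illustration under stated assumptions, not the code from the official repository: the function names, the feature count `m`, and the toy input sizes are all hypothetical, and the numerical stabilization used in practical implementations is left out.

```python
import numpy as np


def orthogonal_gaussian_matrix(m, d, rng):
    """Return an (m, d) projection matrix (hypothetical helper).

    Rows are orthogonal within each block of d rows, and each row is
    marginally distributed as a standard Gaussian vector in R^d.
    """
    blocks = []
    for _ in range(-(-m // d)):  # ceil(m / d) blocks
        q, r = np.linalg.qr(rng.standard_normal((d, d)))
        q = q * np.sign(np.diag(r))  # sign fix so q is uniformly distributed
        blocks.append(q.T)
    w = np.concatenate(blocks, axis=0)[:m]
    # Rescale unit-norm rows so their lengths match those of Gaussian vectors.
    norms = np.linalg.norm(rng.standard_normal((m, d)), axis=1, keepdims=True)
    return w * norms


def positive_features(x, w):
    """phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m), so E[phi(q).phi(k)] = exp(q.k)."""
    m = w.shape[0]
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - sq_norm) / np.sqrt(m)


def favor_attention(q, k, v, w):
    """Approximate softmax attention without forming the L x L matrix."""
    scale = q.shape[-1] ** -0.25  # splits the usual 1/sqrt(d) between q and k
    q_prime = positive_features(q * scale, w)   # (L, m)
    k_prime = positive_features(k * scale, w)   # (L, m)
    kv = k_prime.T @ v                          # (m, d_v), linear in L
    normalizer = q_prime @ k_prime.sum(axis=0)  # (L,), approximate row sums
    return (q_prime @ kv) / normalizer[:, None]


def softmax_attention(q, k, v):
    """Exact quadratic-cost softmax attention, used only as a reference."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


# Toy comparison on small random inputs (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
L, d, m = 16, 8, 2048
q = 0.3 * rng.standard_normal((L, d))
k = 0.3 * rng.standard_normal((L, d))
v = rng.standard_normal((L, d))
w = orthogonal_gaussian_matrix(m, d, rng)

approx = favor_attention(q, k, v, w)
exact = softmax_attention(q, k, v)
print("max abs error:", np.max(np.abs(approx - exact)))
```

The linear cost comes from the order of the matrix products: multiplying `k_prime.T @ v` first yields an `(m, d_v)` matrix, so no `L x L` attention matrix is ever built. Because every feature is a positive exponential, the estimated row sums in `normalizer` stay strictly positive, which is what keeps the division well behaved.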

The researchers say they believe this work offers a new way of thinking about attention, Transformer architectures, and kernel methods.

Code: https://github.com/google-research/google-research/tree/master/performer

After the paper was released, Yannic Kilcher, who runs a well-known deep learning channel on YouTube, published a walkthrough of it.



