Softmax Papers - 专知

Softmax

When LRP Diverges from Leave-One-Out in Transformers

Arxiv

0+ reads · October 21

Limitations of Normalization in Attention Mechanism

Arxiv

0+ reads · October 20

SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference

Arxiv

0+ reads · October 20

Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel

Arxiv

0+ reads · October 14

Untangling Component Imbalance in Hybrid Linear Attention Conversion Methods

Arxiv

0+ reads · October 10

Task-Level Insights from Eigenvalues across Sequence Models

Arxiv

0+ reads · October 10

Paying Attention to Hybrid Attention: Untangling the Issues with Conversion Methods

Arxiv

0+ reads · October 7

Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy

Arxiv

0+ reads · October 5

The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers under Fully Homomorphic Encryption on the Torus

Arxiv

0+ reads · October 1

An empirical study on the limitation of Transformers in program trace generation

Arxiv

0+ reads · September 29

Scaling LLM Test-Time Compute with Mobile NPU on Smartphones

Arxiv

0+ reads · September 27

Beyond Softmax: A Natural Parameterization for Categorical Random Variables

Arxiv

0+ reads · September 29

Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity

Arxiv

0+ reads · June 18

FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning

Arxiv

0+ reads · April 22

FLASH-D: FlashAttention with Hidden Softmax Division

Arxiv

0+ reads · May 20
