Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible next words or phrases, given a sequence of input words. This distribution is then used to select the most likely next word or phrase, based on the probabilities assigned by the model. The softmax unit plays a crucial role in training LLMs, as it allows the model to learn from the data by adjusting the weights and biases of the neural network. In the area of convex optimization, for example when using the central path method to solve linear programming, the softmax function has been a crucial tool for controlling the progress and stability of the potential function [Cohen, Lee and Song, STOC 2019; Brand, SODA 2020]. In this work, inspired by the softmax unit, we define a softmax regression problem. Formally speaking, given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, the goal is to use a greedy-type algorithm to solve \begin{align*} \min_{x} \| \langle \exp(Ax), {\bf 1}_n \rangle^{-1} \exp(Ax) - b \|_2^2. \end{align*} In a certain sense, our provable convergence result provides theoretical support for why greedy algorithms can be used to train the softmax function in practice.
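To make the objective concrete, here is a minimal numerical sketch in Python/NumPy. It evaluates the loss $\| \langle \exp(Ax), {\bf 1}_n \rangle^{-1} \exp(Ax) - b \|_2^2$ and runs plain gradient descent as a stand-in for the greedy-type algorithm; the abstract does not spell out the algorithm, so the step size `eta`, the iteration count, and the random toy instance are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax_regression_loss(A, b, x):
    """Evaluate || <exp(Ax), 1_n>^{-1} exp(Ax) - b ||_2^2."""
    u = np.exp(A @ x)        # exp(Ax), entrywise
    p = u / u.sum()          # softmax normalization of exp(Ax)
    r = p - b                # residual against the target vector b
    return r @ r             # squared 2-norm of the residual

def gradient(A, b, x):
    """Gradient of the loss, using the softmax Jacobian diag(p) - p p^T."""
    u = np.exp(A @ x)
    p = u / u.sum()
    r = p - b
    J = np.diag(p) - np.outer(p, p)   # Jacobian of p with respect to Ax
    return 2.0 * A.T @ (J @ r)

# Toy instance with n = 8, d = 3 (illustrative, not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.random(8)
b /= b.sum()                          # put b on the simplex, matching the setup
x = np.zeros(3)
eta = 0.5                             # illustrative step size
for _ in range(200):
    x -= eta * gradient(A, b, x)
print(softmax_regression_loss(A, b, x))
```

The gradient uses the standard softmax Jacobian $\mathrm{diag}(p) - pp^\top$ for $p = \langle \exp(Ax), {\bf 1}_n \rangle^{-1} \exp(Ax)$; the objective is not convex in $x$ in general, which is why a provable convergence guarantee for this kind of iterative method is nontrivial.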