Sparse Autoencoders (SAEs) have demonstrated significant promise in interpreting the hidden states of language models by decomposing them into interpretable latent directions. However, training and interpreting SAEs at scale remain challenging, especially when large dictionary sizes are used. While decoders can leverage sparsity-aware kernels for efficiency, encoders still require computationally intensive linear operations with large output dimensions. To address this, we propose KronSAE, a novel architecture that factorizes the latent representation via Kronecker product decomposition, drastically reducing memory and computational overhead. Furthermore, we introduce mAND, a differentiable activation function approximating the binary AND operation, which improves interpretability and performance in our factorized framework.
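To make the factorization concrete, the following is a minimal sketch (not the authors' reference implementation) of a Kronecker-factorized SAE encoder. The class name `KronSAEEncoder`, the choice of `m1`/`m2`, and the exact form of the `m_and` gate are illustrative assumptions based only on the description above; it shows how a dictionary of size m = m1 * m2 can be produced from two small encoder heads without a dense d_model -> m projection.

```python
# Illustrative sketch only; shapes, names, and the exact mAND form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def m_and(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """AND-like differentiable gate: nonzero only when both inputs are
    positive. Assumption: product of ReLU-rectified parts; the paper's
    exact mAND may differ."""
    return F.relu(a) * F.relu(b)


class KronSAEEncoder(nn.Module):
    """Encoder whose m = m1 * m2 latents come from two small heads
    combined as an outer (Kronecker-style) product, avoiding a dense
    d_model -> m weight matrix."""

    def __init__(self, d_model: int, m1: int, m2: int):
        super().__init__()
        self.enc1 = nn.Linear(d_model, m1)  # cost O(d_model * m1)
        self.enc2 = nn.Linear(d_model, m2)  # cost O(d_model * m2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.enc1(x)  # (batch, m1)
        b = self.enc2(x)  # (batch, m2)
        # Outer product of the two factor activations yields the full
        # (batch, m1 * m2) latent code.
        z = m_and(a.unsqueeze(-1), b.unsqueeze(-2))  # (batch, m1, m2)
        return z.flatten(start_dim=-2)               # (batch, m1 * m2)


if __name__ == "__main__":
    enc = KronSAEEncoder(d_model=768, m1=128, m2=128)  # 16,384 latents
    z = enc(torch.randn(4, 768))
    print(z.shape)  # torch.Size([4, 16384])
```

Under these assumptions the encoder's parameter count scales as d_model * (m1 + m2) rather than d_model * m1 * m2, which is the source of the memory and compute savings claimed above.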