We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, yielding a new class of optimizers named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, and Group AdaHessian, respectively. We prove convergence guarantees in the stochastic convex setting based on primal-dual methods. We evaluate the regularization effect of the new optimizers on three large-scale real-world ad-click datasets with state-of-the-art deep learning models. The experimental results show that, compared with the original optimizers followed by a post-processing procedure based on magnitude pruning, our optimizers significantly improve model performance at the same sparsity level. Furthermore, compared with the cases without magnitude pruning, our methods achieve extremely high sparsity with significantly better or highly competitive performance.
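To make the idea concrete, the sketch below shows one simplified way to combine a sparse group lasso penalty with an Adam-style update: a plain Adam step followed by the proximal map of the penalty (element-wise soft-thresholding for the l1 term, block soft-thresholding for the group term). This is only an illustrative proximal-gradient variant, not the primal-dual construction analyzed in the paper; the group layout and the hyperparameters `lr`, `lam1`, and `lam2` are illustrative assumptions.

```python
# Minimal sketch (assumption: a simple proximal-gradient variant, not the paper's
# primal-dual framework) of a "Group Adam"-style step with sparse group lasso.
import numpy as np

def sparse_group_lasso_prox(w, groups, step, lam1, lam2):
    """Prox of step * (lam1 * ||w||_1 + lam2 * sum_g sqrt(d_g) * ||w_g||_2)."""
    # Element-wise soft-thresholding for the l1 part.
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam1, 0.0)
    # Block soft-thresholding for the group lasso part.
    for g in groups:                              # g: index array of one group
        norm = np.linalg.norm(w[g])
        thr = step * lam2 * np.sqrt(len(g))
        w[g] = 0.0 if norm <= thr else (1.0 - thr / norm) * w[g]
    return w

def group_adam_step(w, grad, m, v, t, groups,
                    lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
                    lam1=1e-4, lam2=1e-4):
    """One Adam update followed by the sparse-group-lasso proximal map."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                     # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                     # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # standard Adam step
    w = sparse_group_lasso_prox(w, groups, lr, lam1, lam2)
    return w, m, v
```

Because the group term zeroes out entire blocks of weights (e.g., all embedding weights of one feature), this kind of update yields structured sparsity directly during training rather than via post-hoc magnitude pruning.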