Learning Rate Papers - 专知


Learning Rate

ADP-VRSGP: Decentralized Learning with Adaptive Differential Privacy via Variance-Reduced Stochastic Gradient Push

Arxiv

0+ reads · October 23

MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning

Arxiv

0+ reads · October 20

Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling

Arxiv

0+ reads · October 16

Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games

Arxiv

0+ reads · October 13

Tight Regret Upper and Lower Bounds for Optimistic Hedge in Two-Player Zero-Sum Games

Arxiv

0+ reads · October 13

AutoGD: Automatic Learning Rate Selection for Gradient Descent

Arxiv

0+ reads · October 10

Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size

Arxiv

0+ reads · October 13

Convergence of two-timescale gradient descent ascent dynamics: finite-dimensional and mean-field perspectives

Arxiv

0+ reads · October 10

Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach

Arxiv

0+ reads · October 8

Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable

Arxiv

0+ reads · October 5

Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games

Arxiv

0+ reads · October 6

Arithmetic-Mean $μ$P for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets

Arxiv

0+ reads · October 5

Why Do We Need Warm-up? A Theoretical Perspective

Arxiv

0+ reads · October 3

Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size

Arxiv

0+ reads · September 27

Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws

Arxiv

0+ reads · September 24

