ICLR 2022 Outstanding Papers Announced: 7 Papers Honored, Including One from Jun Zhu's Group at Tsinghua

We are delighted to announce the recipients of the ICLR 2022 Outstanding Paper Awards!

Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models

By Fan Bao, Chongxuan Li, Jun Zhu, Bo Zhang

Diffusion probabilistic models (DPMs), a class of powerful generative models, are a rapidly growing topic in machine learning. This paper tackles an inherent limitation of DPMs: the slow and expensive computation of the optimal reverse variance. The authors first present a surprising result: both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms with respect to its score function. They then propose Analytic-DPM, a novel and elegant training-free inference framework that estimates these analytic forms of the variance and KL divergence using the Monte Carlo method and a pretrained score-based model. The paper is significant both for its theoretical contribution (showing that the optimal reverse variance and KL divergence of a DPM have analytic forms) and for its practical benefit (a training-free inference method applicable to various DPMs), and it will likely influence future research on DPMs.

This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).

Hyperparameter Tuning with Renyi Differential Privacy

By Nicolas Papernot, Thomas Steinke

This paper provides new insights into an important blind spot in most prior analyses of the differential privacy of learning algorithms: the learning algorithm is run multiple times over the data in order to tune its hyperparameters. The authors show that there are situations in which part of the data can skew the optimal hyperparameters, thereby leaking private information. Furthermore, the authors provide privacy guarantees for hyperparameter search procedures within the framework of Renyi Differential Privacy. This is an excellent paper that considers the everyday use of learning algorithms and its privacy implications for society, and proposes ways to address the issue. This work will provide the foundation for many follow-up works on differentially private machine learning algorithms.

This paper will be presented in the Oral Session 1 on Learning in the Wild & RL on Apr 26 12am GMT (Apr 25 5pm PST).

Learning Strides in Convolutional Neural Networks

By Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour

This paper addresses an important problem that anyone using convolutional networks has faced: setting the strides in a principled way rather than by trial and error. The authors propose a novel and very clever mathematical formulation for learning strides and demonstrate a practically useful method that achieves state-of-the-art experimental results on comprehensive benchmarks. The main idea is DiffStride, the first downsampling layer with learnable strides. It learns the size of a cropping mask in the Fourier domain, effectively performing resizing in a way that is amenable to differentiable programming. This is an excellent paper proposing a method that will likely become part of commonly used toolboxes as well as courses on deep learning.
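
The idea of resizing by cropping in the Fourier domain can be illustrated with a small sketch. The code below is not DiffStride itself: the crop size is a fixed argument rather than a learned, differentiable parameter, the signal is one-dimensional, and the function names (`dft`, `inverse_dft`, `fourier_crop_downsample`) are hypothetical helpers chosen for this illustration. It only shows that keeping the lowest frequencies of a signal and inverting the transform yields a shorter signal.

```python
import cmath


def dft(values):
    """Naive O(n^2) discrete Fourier transform (dependency-free on purpose)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse DFT, including the 1/n normalization."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def fourier_crop_downsample(signal, out_len):
    """Shorten `signal` to `out_len` samples by cropping its spectrum.

    Assumption for simplicity: `out_len` is odd, so the kept band is
    symmetric around the zero frequency and no Nyquist bin needs care.
    """
    n = len(signal)
    if out_len % 2 == 0 or not 0 < out_len <= n:
        raise ValueError("out_len must be odd and between 1 and len(signal)")
    spectrum = dft(signal)
    half = out_len // 2
    # Keep the zero frequency, the lowest positive frequencies (front of the
    # list) and the lowest negative frequencies (back of the list).
    kept = spectrum[: half + 1] + spectrum[n - half:]
    # Rescale so that amplitudes match those of the original signal.
    scale = out_len / n
    return [(value * scale).real for value in inverse_dft(kept)]
```

For example, `fourier_crop_downsample(samples, 5)` applied to 8 samples of one cosine period returns 5 samples of the same period. In a learnable variant, the width of the kept band would be a parameter trained by gradient descent, which this fixed-crop sketch does not attempt.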

This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).

Expressiveness and Approximation Properties of Graph Neural Networks

By Floris Geerts, Juan L Reutter

This elegant theoretical paper shows how questions regarding the expressiveness and separability of different graph neural network (GNN) architectures can be reduced to (and sometimes substantially simplified by) examining their computations in tensor language, where these questions connect to well-known combinatorial notions such as treewidth. In particular, the paper provides an elegant way to obtain bounds on the separation power of GNNs in terms of the Weisfeiler-Leman (WL) tests, which have become the yardstick for measuring the separation power of GNNs. The proposed framework also has implications for studying the approximability of functions through GNNs. This paper has the potential to make a significant impact on future research by providing a general framework for describing, comparing and analyzing GNN architectures. In addition, it provides a toolbox with which GNN architecture designers can analyze the separation power of their GNNs without needing to know the intricacies of the WL tests.
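
For readers unfamiliar with the WL tests, the following is a minimal sketch of the simplest one, 1-WL color refinement. It is not the tensor-language framework of the paper; the function name `wl_color_histogram` and the adjacency-dictionary input format are assumptions made for this illustration. Two graphs whose color histograms differ are distinguished by the test, while equal histograms mean the test cannot tell them apart.

```python
from collections import Counter


def wl_color_histogram(adjacency, rounds):
    """Run 1-WL color refinement and return the histogram of final colors.

    `adjacency` maps each node to a list of its neighbors. Colors are kept
    as nested tuples so they are directly comparable across graphs; this
    grows quickly with `rounds` and is meant for small examples only.
    """
    colors = {node: 0 for node in adjacency}  # every node starts identical
    for _ in range(rounds):
        # A node's new color combines its own color with the sorted
        # multiset of its neighbors' colors.
        colors = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adjacency[node])))
            for node in adjacency
        }
    return Counter(colors.values())


# Classic limitation: two disjoint triangles and a 6-cycle are both
# 2-regular graphs on six nodes, so 1-WL assigns them identical colors.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
same = wl_color_histogram(two_triangles, 3) == wl_color_histogram(hexagon, 3)
```

Here `same` evaluates to `True`, which is the kind of separation-power limit that bounds in terms of WL tests make precise.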

This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).

Comparing Distributions by Measuring Differences that Affect Decision Making

By Shengjia Zhao, Abhishek Sinha, Yutong (Kelly) He, Aidan Perreault, Jiaming Song, Stefano Ermon

This paper proposes a new class of discrepancies that compare two probability distributions based on the optimal loss for a decision task. By suitably choosing the decision task, the proposed method generalizes the Jensen-Shannon divergence and the maximum mean discrepancy family. The authors demonstrate that the proposed approach achieves superior test power compared to competitive baselines on various benchmarks, with compelling use cases for understanding the effects of climate change on different social and economic activities, evaluating sample quality, and selecting features targeting different decision tasks. Beyond the method's intellectual elegance, the committee finds the paper exceptional for its empirical significance: because the method lets a user specify their preferences directly through the decision loss when comparing distributions, it offers practitioners an increased level of interpretability.

This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

By X.Y. Han, Vardan Papyan, David L. Donoho

This paper presents new theoretical insights on the “neural collapse” phenomenon that occurs pervasively in today’s deep net training paradigm. During neural collapse, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Rather than the cross-entropy loss, which is mathematically harder to analyze, the paper introduces a new decomposition of the mean squared error (MSE) loss that allows each component of the loss to be analyzed under neural collapse. This in turn leads to a new theoretical construct, the “central path”, on which the linear classifier stays MSE-optimal for the feature activations throughout the dynamics. Finally, by studying renormalized gradient flow along the central path, the authors derive exact dynamics that predict neural collapse. In sum, this paper provides novel and highly inspiring theoretical insights for understanding the empirical training dynamics of deep networks.
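
Two of the objects named above, the Simplex Equiangular Tight Frame and the nearest-class-mean decision rule, can be made concrete with a short sketch. It uses the standard construction of a simplex ETF with C vectors in C dimensions, sqrt(C/(C-1)) times (I - (1/C) 1 1^T); the function names `simplex_etf`, `dot` and `nearest_class_mean` are hypothetical helpers for this illustration, and nothing here reproduces the paper's analysis of the central path.

```python
import math


def simplex_etf(num_classes):
    """Rows of sqrt(C / (C - 1)) * (I - (1/C) * ones), one row per class.

    Every row has unit norm and every pair of distinct rows has inner
    product -1 / (C - 1), i.e. the rows are maximally and equally separated.
    """
    if num_classes < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = math.sqrt(num_classes / (num_classes - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / num_classes)
         for j in range(num_classes)]
        for i in range(num_classes)
    ]


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nearest_class_mean(feature, class_means):
    """Predict the index of the class mean closest to `feature`."""
    def squared_distance(mean):
        return sum((f - m) ** 2 for f, m in zip(feature, mean))
    return min(range(len(class_means)), key=lambda c: squared_distance(class_means[c]))


# Illustration with four classes whose means sit exactly on the ETF.
means = simplex_etf(4)
predicted = nearest_class_mean(means[2], means)
```

In this idealized setting a feature that has collapsed onto its class mean is assigned to that class, so `predicted` is `2`.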

This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).

Bootstrapped Meta-Learning

By Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

Meta-learning, or learning to learn, has the potential to empower artificial intelligence, yet meta-optimization has been a considerable challenge to unlocking this potential. This paper opens a new direction in meta-learning, beautifully inspired by temporal-difference (TD) learning, that bootstraps the meta-learner from itself or another update rule. The theoretical analysis is thorough and the empirical results are compelling: the method sets a new state of the art for model-free agents on the Atari ALE benchmark and yields both performance and efficiency gains in multi-task meta-learning. The committee believes that this paper will inspire a lot of people.

This paper will be presented in the Oral Session 3 on Meta Learning and Adaptation on Apr 27 4pm GMT (9am PST).
