We are pleased to announce the winners of the ICLR 2022 Outstanding Paper Awards!
By Fan Bao, Chongxuan Li, Jun Zhu, Bo Zhang
Diffusion probabilistic models (DPMs), a class of powerful generative models, are a rapidly growing topic in machine learning. This paper tackles an inherent limitation of DPMs: computing the optimal reverse variance is slow and expensive. The authors first present a surprising result: both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms with respect to its score function. They then propose Analytic-DPM, a novel and elegant training-free inference framework that estimates these analytic forms of the variance and KL divergence using the Monte Carlo method and a pretrained score-based model. The paper is significant both for its theoretical contribution (showing that the optimal reverse variance and KL divergence of a DPM have analytic forms) and for its practical benefit (a training-free inference method applicable to various DPMs), and it will likely influence future research on DPMs.
*This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).
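As a rough illustration of the Analytic-DPM idea, the 1-D Python sketch below plugs a Monte Carlo estimate Gamma_n of the mean squared noise prediction into a closed-form reverse variance. The formula is our reading of the paper's result in DDPM notation (alpha_n = 1 - beta_n, alpha_bar_n = product of the alpha_i, beta_bar_n = 1 - alpha_bar_n) and should be treated as an assumption; `eps_model` is a hypothetical stand-in for a pretrained noise-prediction network, and all function names are ours. The KL-divergence estimate is not shown.

```python
import math
import random

def noise_schedule(betas):
    """alpha_bar[n] = prod_{i<=n} (1 - beta_i) and beta_bar[n] = 1 - alpha_bar[n] (0-based)."""
    alpha_bar, beta_bar, prod = [], [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
        beta_bar.append(1.0 - prod)
    return alpha_bar, beta_bar

def estimate_gamma(eps_model, n, alpha_bar, beta_bar, sample_x0, num_samples, rng):
    """Monte Carlo estimate of Gamma_n = E[eps_model(x_n, n)^2] / d with d = 1,
    where x_n = sqrt(alpha_bar_n) * x0 + sqrt(beta_bar_n) * standard Gaussian noise."""
    total = 0.0
    for _ in range(num_samples):
        x0 = sample_x0(rng)
        x_n = math.sqrt(alpha_bar[n]) * x0 + math.sqrt(beta_bar[n]) * rng.gauss(0.0, 1.0)
        total += eps_model(x_n, n) ** 2
    return total / num_samples

def analytic_variance(n, betas, alpha_bar, beta_bar, lam_sq, gamma):
    """Assumed form: lam^2 + (sqrt(beta_bar_n / alpha_n) - sqrt(beta_bar_{n-1} - lam^2))^2 * (1 - Gamma_n).
    Requires n >= 1 because it looks at step n - 1."""
    alpha_n = 1.0 - betas[n]
    coef = math.sqrt(beta_bar[n] / alpha_n) - math.sqrt(beta_bar[n - 1] - lam_sq)
    return lam_sq + coef ** 2 * (1.0 - gamma)

def ddpm_lambda_sq(n, betas, beta_bar):
    """DDPM-style choice lam_n^2 = beta_bar_{n-1} * beta_n / beta_bar_n."""
    return beta_bar[n - 1] * betas[n] / beta_bar[n]

if __name__ == "__main__":
    betas = [0.01 * (i + 1) for i in range(10)]
    alpha_bar, beta_bar = noise_schedule(betas)
    n = 5
    # Hypothetical predictor: for x0 ~ N(0, 1), E[noise | x_n] = sqrt(beta_bar_n) * x_n exactly.
    eps_model = lambda x, m: math.sqrt(beta_bar[m]) * x
    gamma = estimate_gamma(eps_model, n, alpha_bar, beta_bar,
                           lambda r: r.gauss(0.0, 1.0), 20000, random.Random(0))
    lam_sq = ddpm_lambda_sq(n, betas, beta_bar)
    print("Gamma_n:", gamma, "variance:", analytic_variance(n, betas, alpha_bar, beta_bar, lam_sq, gamma))
```

A useful sanity check on the assumed formula: with standard Gaussian data, Gamma_n equals beta_bar_n and the variance reduces to beta_n, while with a single deterministic data point Gamma_n equals 1 and the variance reduces to lam_n^2. These are the two classic DDPM variance choices, so the estimate interpolates between them according to the data.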
By Nicolas Papernot, Thomas Steinke
This paper provides new insights into an important blind spot of most prior analyses of the differential privacy of learning algorithms: the learning algorithm is run multiple times over the data in order to tune the hyperparameters. The authors show that there are situations in which part of the data can skew the optimal hyperparameters, thereby leaking private information. Furthermore, the authors provide privacy guarantees for hyperparameter search procedures within the framework of Rényi Differential Privacy. Given how routinely learning algorithms are tuned in everyday use, this is an excellent paper: it exposes the resulting privacy implications for society and proposes ways to address the issue. This work will provide the foundation for many follow-up works on differentially private machine learning algorithms.
*This paper will be presented in the Oral Session 1 on Learning in the Wild & RL on Apr 26 12am GMT (Apr 25 5pm PST).
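To our understanding, one family of tuning procedures the paper analyzes repeats a private training run a random number of times and releases only the best result. The Python sketch below shows that shape with the number of runs K drawn from a Poisson distribution; `train_and_score` is a hypothetical callback that is assumed to be differentially private itself (training and scoring), and the sketch computes no privacy guarantee. The distribution of K matters for the privacy bound, so the Poisson choice here is illustrative only.

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw K ~ Poisson(mu) with Knuth's multiplication method (fine for small mu)."""
    threshold = math.exp(-mu)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def tune_with_random_repetitions(candidates, train_and_score, mu, rng):
    """Run the (assumed DP) routine K ~ Poisson(mu) times, each with a randomly chosen
    hyperparameter setting, and release only the best (hyperparameters, score) pair.
    Returns None when K == 0, so the number of runs itself stays hidden."""
    best = None
    for _ in range(sample_poisson(mu, rng)):
        hp = rng.choice(candidates)
        score = train_and_score(hp, rng)
        if best is None or score > best[1]:
            best = (hp, score)
    return best

if __name__ == "__main__":
    # Hypothetical noisy scorer standing in for private training plus private evaluation.
    def noisy_score(lr, r):
        return -abs(lr - 0.1) + r.gauss(0.0, 0.01)
    print(tune_with_random_repetitions([0.01, 0.1, 1.0], noisy_score, mu=5.0, rng=random.Random(0)))
```

Randomizing K, rather than fixing it, is what makes the published "best of K runs" amenable to a privacy analysis in this style of procedure; a fixed, data-independent K would be the naive baseline the paper's blind-spot argument is about.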
By Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour
This paper addresses an important problem that anyone using convolutional networks has faced: setting the strides in a principled way rather than by trial and error. The authors propose a novel and very clever mathematical formulation for learning strides and demonstrate a practically useful method that achieves state-of-the-art results on comprehensive benchmarks. The main idea is DiffStride, the first downsampling layer with learnable strides, which learns the size of a cropping mask in the Fourier domain and thereby performs resizing in a way that is amenable to differentiable programming. This is an excellent paper proposing a method that will likely become part of commonly used toolboxes as well as courses on deep learning.
*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).
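The core operation, downsampling by cropping the spectrum, can be sketched for a 1-D signal in plain Python. This is spectral pooling with a soft cropping mask in the spirit of DiffStride, not the paper's implementation: the linear ramp shape, its width `ramp`, and the function names are our assumptions, and no gradients are computed. As we understand it, the smooth edge of the mask is what allows gradients to reach the stride when it is a trainable parameter.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def signed_freq(k, n):
    """Map DFT index k to a signed frequency in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n

def soft_mask(freq, cutoff, ramp):
    """1 for |freq| <= cutoff, then a linear ramp down to 0 at cutoff + ramp (assumed shape)."""
    return min(max((cutoff + ramp - abs(freq)) / ramp, 0.0), 1.0)

def spectral_downsample(x, stride, ramp=2.0):
    """Downsample a real 1-D signal by keeping frequencies up to about n / (2 * stride)."""
    n = len(x)
    spectrum = dft(x)
    cutoff = n / (2.0 * stride)
    keep = min(int(math.floor(cutoff + ramp)), (n - 1) // 2)  # highest |frequency| retained
    m = 2 * keep + 1  # output length shrinks as the stride grows
    out = []
    for t in range(m):
        acc = 0j
        for k in range(n):
            f = signed_freq(k, n)
            if abs(f) <= keep:
                acc += soft_mask(f, cutoff, ramp) * spectrum[k] * cmath.exp(2j * math.pi * f * t / m)
        out.append((acc / n).real)  # kept set is symmetric, so the result is real
    return out

if __name__ == "__main__":
    signal = [math.cos(2 * math.pi * t / 32) for t in range(32)]
    print(len(spectral_downsample(signal, 2.0)), len(spectral_downsample(signal, 4.0)))
```

Because the stride enters only through `cutoff`, a non-integer stride is as valid as an integer one, which is the property that makes treating it as a continuous, learnable quantity plausible.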
By Floris Geerts, Juan L. Reutter
This elegant theoretical paper shows how questions about the expressiveness and separation power of different graph neural network (GNN) architectures can be reduced to (and sometimes substantially simplified by) examining their computations in a tensor language, where these questions connect to well-known combinatorial notions such as treewidth. In particular, the paper provides a simple way to obtain bounds on the separation power of GNNs in terms of the Weisfeiler-Leman (WL) tests, which have become the yardstick for measuring the separation power of GNNs. The proposed framework also has implications for studying which functions GNNs can approximate. By providing a general framework for describing, comparing, and analyzing GNN architectures, the paper has the potential to make a significant impact on future research. In addition, it provides a toolbox with which GNN architecture designers can analyze the separation power of their GNNs without needing to know the intricacies of the WL tests.
*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).
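The WL yardstick mentioned in this citation can be made concrete with the classic 1-WL color refinement, which is known to upper-bound the separation power of standard message-passing GNNs. The Python sketch below is a textbook version, not the paper's tensor-language framework; the function names are ours.

```python
from collections import Counter

def wl_refine(graphs, rounds):
    """Run 1-WL color refinement jointly on several graphs (adjacency dicts), sharing one
    color palette so colors are comparable across graphs. Yields per-graph color
    histograms after each round."""
    colors = [{v: 0 for v in g} for g in graphs]
    for _ in range(rounds):
        # New signature = (own color, sorted multiset of neighbor colors).
        sigs = [
            {v: (col[v], tuple(sorted(col[u] for u in g[v]))) for v in g}
            for g, col in zip(graphs, colors)
        ]
        all_sigs = sorted({s for per_graph in sigs for s in per_graph.values()})
        palette = {s: i for i, s in enumerate(all_sigs)}
        colors = [{v: palette[per_graph[v]] for v in per_graph} for per_graph in sigs]
        yield [Counter(col.values()) for col in colors]

def wl_distinguishes(g1, g2):
    """True if 1-WL assigns the two graphs different color histograms at some round."""
    rounds = len(g1) + len(g2)  # refinement stabilizes within this many rounds
    for hist1, hist2 in wl_refine([g1, g2], rounds):
        if hist1 != hist2:
            return True
    return False

if __name__ == "__main__":
    two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(wl_distinguishes(two_triangles, hexagon))
```

Two disjoint triangles versus a 6-cycle is the standard pair that 1-WL cannot separate (every node has degree 2 in both), so a message-passing GNN bounded by 1-WL cannot separate them either; bounds of this kind are what the paper's framework derives for a wide range of architectures.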
By Shengjia Zhao, Abhishek Sinha, Yutong (Kelly) He, Aidan Perreault, Jiaming Song, Stefano Ermon
This paper proposes a new class of discrepancies that compare two probability distributions based on the optimal loss for a decision task. By choosing the decision task suitably, the proposed method generalizes the Jensen-Shannon divergence and the maximum mean discrepancy family. The authors demonstrate that the approach achieves higher test power than competitive baselines on various benchmarks, with compelling use cases: understanding the effects of climate change on different social and economic activities, evaluating sample quality, and selecting features targeting different decision tasks. Not only is the proposed method intellectually elegant, but the committee also finds the paper exceptional for its empirical significance: because the method lets users directly specify their preferences through the decision loss when comparing distributions, it offers practitioners an increased level of interpretability.
*This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).
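To our reading, these discrepancies are built from the optimal expected loss of a decision task (the lowest average loss any single action achieves on a distribution, sometimes called its H-entropy), and one instance compares that quantity on the 50/50 mixture with its average over the two distributions. The Python sketch below illustrates this with a finite action grid; the grid, the names `h_entropy` and `h_jensen_shannon`, and the sample data are our assumptions.

```python
import statistics

def h_entropy(samples, loss, actions):
    """Optimal expected loss of the decision task: the best average loss over candidate actions."""
    return min(statistics.fmean(loss(x, a) for x in samples) for a in actions)

def h_jensen_shannon(p_samples, q_samples, loss, actions):
    """H((p + q) / 2) - (H(p) + H(q)) / 2, forming the mixture by pooling equal-size samples."""
    assert len(p_samples) == len(q_samples), "equal sizes make the pooled sample a 50/50 mixture"
    mixture = list(p_samples) + list(q_samples)
    return h_entropy(mixture, loss, actions) - 0.5 * (
        h_entropy(p_samples, loss, actions) + h_entropy(q_samples, loss, actions)
    )

if __name__ == "__main__":
    squared = lambda x, a: (x - a) ** 2    # decision task: predict a single number
    grid = [i / 100 for i in range(-100, 701)]  # candidate actions from -1.0 to 7.0
    p, q = [0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]
    print(h_jensen_shannon(p, q, squared, grid))
```

With squared loss the optimal action is the mean and the optimal loss is the variance, so the discrepancy reduces to a quarter of the squared difference of the means; for the samples above that is (1.5 - 3.5)² / 4 = 1, a squared linear-kernel MMD up to the factor 1/4. With log loss over probabilistic forecasts, the optimal loss is the Shannon entropy and the same construction gives the Jensen-Shannon divergence. Swapping the loss is how a user states which differences matter.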
By X.Y. Han, Vardan Papyan, David L. Donoho
This paper presents new theoretical insights into the “neural collapse” phenomenon that occurs pervasively in today’s deep-net training paradigm. During neural collapse, last-layer features collapse to their class means, both the classifiers and the class means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Rather than the cross-entropy loss, which is mathematically harder to analyze, the paper studies the mean squared error (MSE) loss and introduces a new decomposition of it that makes it possible to analyze each component of the loss under neural collapse. This in turn leads to a new theoretical construct, the “central path”, along which the linear classifier stays MSE-optimal for the feature activations throughout the dynamics. Finally, by studying renormalized gradient flow along the central path, the authors derive exact dynamics that predict neural collapse. In sum, this paper provides novel and highly inspiring theoretical insights for understanding the empirical training dynamics of deep networks.
*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).
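The Simplex Equiangular Tight Frame in this citation has a simple closed form (unique up to rotation and scaling): the columns of sqrt(C / (C - 1)) * (I - 11ᵀ / C) for C classes. The Python sketch below builds it, so the defining properties can be checked, and pairs it with the nearest-class-mean rule. It illustrates only the limiting geometry, not the paper's MSE decomposition or central-path analysis; the function names are ours.

```python
import math

def simplex_etf(num_classes):
    """Return the C columns of sqrt(C/(C-1)) * (I - 11^T / C) as lists.
    Each has unit norm, every pair has cosine -1/(C-1), and they sum to zero."""
    c = num_classes
    scale = math.sqrt(c / (c - 1))
    return [[scale * ((1.0 if i == k else 0.0) - 1.0 / c) for i in range(c)] for k in range(c)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nearest_class_mean(feature, class_means):
    """Nearest-class-mean rule: predict the class whose mean is closest in Euclidean distance."""
    return min(range(len(class_means)),
               key=lambda k: sum((f - m) ** 2 for f, m in zip(feature, class_means[k])))

if __name__ == "__main__":
    etf = simplex_etf(4)
    print([round(dot(etf[0], v), 6) for v in etf])  # self-similarity, then pairwise cosines
```

Under full neural collapse the class means (after centering and rescaling) sit at these vertices and the classifier rows align with them, which is why the linear classifier's decisions coincide with `nearest_class_mean` on such features.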
By Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh
Meta-learning, or learning to learn, has the potential to empower artificial intelligence, yet meta-optimization has been a considerable obstacle to unlocking this potential. This paper opens a new direction in meta-learning, beautifully inspired by temporal-difference (TD) learning, that bootstraps the meta-learner from itself or from another update rule. The theoretical analysis is thorough, and the empirical results are compelling: the method sets a new state of the art for model-free agents on the Atari ALE benchmark and yields both performance and efficiency gains in multi-task meta-learning. The committee believes that this paper will inspire many researchers.
*This paper will be presented in the Oral Session 3 on Meta Learning and Adaptation on Apr 27 4pm GMT (9am PST).
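As we understand the bootstrapping idea, the meta-learner is updated so that the learner's parameters move toward a target obtained by running further updates from where the learner currently is, with that target held fixed, much as TD learning regresses toward a bootstrapped estimate. The toy Python sketch below applies this to a single meta-learned step size on a 1-D quadratic. The quadratic loss, the squared-distance matching function, and all names are our assumptions and are far from the paper's Atari-scale setting.

```python
def bootstrapped_meta_step(eta, x0, a, inner_steps, bootstrap_steps, meta_lr):
    """One meta-update of the step size eta for the loss f(x) = a * x**2 / 2.

    Learner: x <- x - eta * a * x, so after K steps x_K = (1 - eta * a) ** K * x0.
    Target: bootstrap_steps further updates from x_K, treated as a constant.
    Meta-loss: 0.5 * (x_K - target) ** 2, differentiated analytically w.r.t. eta.
    """
    r = 1.0 - eta * a
    x_k = r ** inner_steps * x0
    target = r ** bootstrap_steps * x_k  # held fixed: no gradient flows through it
    dxk_deta = -a * inner_steps * r ** (inner_steps - 1) * x0
    meta_grad = (x_k - target) * dxk_deta
    return eta - meta_lr * meta_grad

if __name__ == "__main__":
    eta = 0.1
    for _ in range(3):
        eta = bootstrapped_meta_step(eta, x0=1.0, a=1.0, inner_steps=3, bootstrap_steps=2, meta_lr=0.1)
        print(eta)
```

In this toy, while 0 < eta * a < 1 the bootstrapped target lies closer to the minimum than x_K, so the meta-gradient increases eta: the learner is pushed to reach in K steps where it would otherwise be after K + L steps.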