In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally but they are limited in the types of structure they can promote. On the other hand, nonconvex regularizers are more flexible in the forms of structure they can promote and they have showcased strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from the distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or equivalently, minimizes cross-entropy loss) over all regularizer-induced Gibbs densities. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution is akin to a ``computational sufficient statistic'' as it is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.
翻译:暂无翻译