In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment the objective with a regularizer to address challenges associated with ill-posedness. The choice of a suitable regularizer is typically driven by prior domain information and computational considerations. Convex regularizers are attractive because they come with certificates of optimality and the toolkit of convex analysis, but their computational cost scales in a way that makes them ill-suited beyond moderate-sized problem instances. Nonconvex regularizers, on the other hand, can often be deployed at scale, but do not enjoy the certification properties associated with convex regularizers. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what are the optimal regularizers, both convex and nonconvex, for data drawn from the distribution? What properties of a data source govern whether it is amenable to convex regularization? We address these questions for the class of continuous and positively homogeneous regularizers, for which convex and nonconvex regularizers correspond, respectively, to convex bodies and star bodies. By leveraging dual Brunn-Minkowski theory, we show that a radial function derived from a data distribution is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization. Using tools such as $\Gamma$-convergence, we show that our results are robust in the sense that the optimal regularizers for a sample drawn from a distribution converge to their population counterparts as the sample size grows large. Finally, we give generalization guarantees that recover previous results for polyhedral regularizers (i.e., dictionary learning) and lead to new ones for semidefinite regularizers.
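To make the correspondence between positively homogeneous regularizers and convex or star bodies concrete, the following is a minimal sketch and not the paper's construction. It assumes the standard definitions: the gauge of a body $K$ is $\|x\|_K = \inf\{t > 0 : x \in tK\}$, and its radial function is $\rho_K(u) = 1/\|u\|_K$. The two bodies (the $\ell_1$ ball and the $\ell_{1/2}$ unit "ball") and all function names are illustrative choices; the radial function here is that of a fixed body, not the distribution-derived one mentioned above.

```python
import numpy as np

def gauge_l1(x):
    """Gauge of the l1 unit ball (a convex body); this is the l1 norm."""
    return np.sum(np.abs(x))

def gauge_half(x):
    """Gauge of {x : sum_i sqrt(|x_i|) <= 1}, a star body that is not convex."""
    return np.sum(np.sqrt(np.abs(x))) ** 2

def radial(gauge, u):
    """Radial function rho_K(u) = 1 / gauge_K(u), defined for u != 0."""
    return 1.0 / gauge(u)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
mid = 0.5 * (x + y)

# Both regularizers are positively homogeneous: g(t * v) = t * g(v) for t > 0.
homog_ok = all(
    np.isclose(g(3.0 * mid), 3.0 * g(mid)) for g in (gauge_l1, gauge_half)
)

# Midpoint convexity holds for the convex body and fails for the star body.
l1_convex_here = gauge_l1(mid) <= 0.5 * (gauge_l1(x) + gauge_l1(y)) + 1e-12
half_convex_here = gauge_half(mid) <= 0.5 * (gauge_half(x) + gauge_half(y)) + 1e-12

print(homog_ok, l1_convex_here, half_convex_here)
```

Both functions are continuous and positively homogeneous, so both fall in the class of regularizers considered here; only the first is convex, which is the distinction between convex bodies and general star bodies.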