In this chapter, we discuss recent work on learning sparse approximations to high-dimensional functions from data, where the target functions may be scalar-, vector- or even Hilbert space-valued. Our main objective is to study how the sampling strategy affects the sample complexity -- that is, the number of samples that suffice for accurate and stable recovery -- and to use this insight to obtain optimal or near-optimal sampling procedures. We consider two settings. First, when a target sparse representation is known, we present a near-complete answer based on drawing independent random samples from carefully designed probability measures. Second, we consider the more challenging scenario in which such a representation is unknown. In this case, while not giving a full answer, we describe a general construction of sampling measures that improves over standard Monte Carlo sampling. We present examples using algebraic and trigonometric polynomials, and for the former, we also introduce a new procedure for function approximation on irregular (i.e., nontensorial) domains. The effectiveness of this procedure is shown through numerical examples. Finally, we discuss a number of structured sparsity models and how they may lead to better approximations.
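To make the idea of a "carefully designed" sampling measure concrete, the following sketch uses a classical example that is not specific to this chapter: least-squares polynomial approximation on [-1, 1] with sample points drawn from the Chebyshev (arcsine) measure rather than uniformly. The target function, polynomial degree, and sample size are illustrative choices, not quantities taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def chebyshev_samples(m, rng):
    # Inverse-transform sampling: if u ~ Uniform(0, 1), then x = cos(pi * u)
    # has density 1 / (pi * sqrt(1 - x^2)), the Chebyshev (arcsine) measure.
    return np.cos(np.pi * rng.uniform(size=m))

def ls_coeffs(f, x, degree):
    # Fit coefficients in the Chebyshev polynomial basis by least squares.
    V = np.polynomial.chebyshev.chebvander(x, degree)  # m x (degree + 1) design matrix
    coeffs, *_ = np.linalg.lstsq(V, f(x), rcond=None)
    return coeffs

# Runge's function: a standard test case where uniform (equispaced)
# sampling behaves poorly but Chebyshev-distributed sampling is stable.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)

x = chebyshev_samples(200, rng)
c = ls_coeffs(f, x, degree=20)

# Measure the maximum error of the fit on a fine uniform grid.
grid = np.linspace(-1.0, 1.0, 1001)
err = np.max(np.abs(np.polynomial.chebyshev.chebval(grid, c) - f(grid)))
print(f"max error of degree-20 least-squares fit: {err:.2e}")
```

Sampling from the arcsine density concentrates points near the endpoints, which keeps the least-squares problem well conditioned; this is the simplest instance of the general principle the chapter develops for sparse and structured approximation.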