Clustering is a ubiquitous problem in data science and signal processing. In many applications where we observe noisy signals, it is common practice to first denoise the data, for example with wavelet denoising, and then to apply a clustering algorithm. In this paper, we develop a sparse convex wavelet clustering approach that simultaneously denoises the signals and discovers groups. Our approach uses convex fusion penalties to achieve agglomeration and group-sparse penalties to denoise through sparsity in the wavelet domain. In contrast to the common practice of denoising and then clustering, our method is a unified, convex approach that performs both tasks at once. It yields denoised (wavelet-sparse) cluster centroids that improve both interpretability and data compression. We demonstrate our method on synthetic examples and in an application to NMR spectroscopy.
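To make the combination of the two penalties concrete, one plausible form of such an objective is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $x_i \in \mathbb{R}^p$ are the $n$ observed signals, $W$ is an orthogonal discrete wavelet transform matrix, $U \in \mathbb{R}^{n \times p}$ holds the centroid wavelet coefficients with rows $u_i$ and columns $U_{\cdot k}$, $w_{ij} \ge 0$ are fusion weights, and $\lambda, \gamma \ge 0$ are tuning parameters.

```latex
% Illustrative sketch only; all symbols are assumed, not the paper's notation.
\begin{equation*}
  \min_{U \in \mathbb{R}^{n \times p}}
    \; \frac{1}{2} \sum_{i=1}^{n} \lVert W x_i - u_i \rVert_2^2
    \; + \; \lambda \sum_{i < j} w_{ij} \, \lVert u_i - u_j \rVert_2
    \; + \; \gamma \sum_{k=1}^{p} \lVert U_{\cdot k} \rVert_2
\end{equation*}
```

In this sketch, the fusion term pulls pairs of centroids together, so that rows with $u_i = u_j$ fall in the same cluster, while the group-sparse term zeroes a wavelet coefficient across all centroids at once. Each term is convex in $U$, so the sum is convex as well.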