This paper describes several improvements to a signal decomposition method that we recently formulated under the name Differentiable Dictionary Search (DDS). The fundamental idea of DDS is to use normalizing flows, a class of powerful deep invertible density estimators, to model the dictionary in a linear decomposition method such as non-negative matrix factorization (NMF). This creates a bijection between the space of dictionary elements and the associated probability space, which allows a differentiable search through the dictionary space guided by the estimated densities. Because the initial formulation was a proof of concept with some practical limitations, we present several steps towards making it scalable, aiming to improve both the computational complexity of the method and its signal decomposition capabilities. As a testbed for experimental evaluation, we choose frame-level piano transcription, where the signal is decomposed into sources whose activity is attributed to individual piano notes. To highlight the impact of improved non-linear modelling of sources, we compare variants of our method to a linear overcomplete NMF baseline. Experimental results show that, even in the absence of additional constraints, our models produce increasingly sparse and precise decompositions according to two pertinent evaluation measures.
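For readers unfamiliar with the linear baseline mentioned above, the following is a minimal sketch of activation estimation with a fixed overcomplete dictionary. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the abstract does not specify the cost function or update rule, so the standard multiplicative update for the Euclidean cost is assumed here, and the function name `nmf_activations`, the toy dimensions, and the random data are all hypothetical.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate non-negative activations H with V ~= W @ H, keeping W fixed.

    Assumption: Euclidean cost with the classic multiplicative update;
    the paper's actual baseline may use a different cost or update rule.
    """
    rng = np.random.default_rng(seed)
    # Strictly positive random initialisation, shape (atoms, frames).
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update: keeps H non-negative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Toy data (hypothetical): 64 frequency bins, 128 atoms (overcomplete), 20 frames.
rng = np.random.default_rng(1)
W = rng.random((64, 128))
H_true = rng.random((128, 20)) * (rng.random((128, 20)) < 0.05)  # sparse activations
V = W @ H_true

H = nmf_activations(V, W)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch each dictionary atom is a fixed column of `W`. DDS, as described above, instead models the atoms with a normalizing flow so that the dictionary itself can be searched by gradient-based optimisation.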