The distributed and continuous representations used by neural networks are at odds with the representations employed in linguistics, which are typically symbolic. Vector quantization has been proposed as a way to induce discrete neural representations that are closer in nature to their linguistic counterparts. However, it is not clear which metrics are best suited to analyze such discrete representations. We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language, examining the results they yield when applied to two different models while systematically varying the placement and size of the discretization layer. We find that different evaluation regimes can give inconsistent results. While most of these inconsistencies can be attributed to the properties of the individual metrics, one point of concern remains: the use of minimal pairs of phoneme triples as stimuli disadvantages larger discrete unit inventories, unlike metrics applied to complete utterances. Furthermore, while vector quantization in general induces representations that correlate with units posited in linguistics, the strength of this correlation is only moderate.
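The abstract refers to inserting a discretization (vector quantization) layer into neural models of spoken language. As a minimal sketch of how such a layer typically operates, assuming a VQ-VAE-style nearest-neighbour codebook lookup with a straight-through gradient (the paper's actual architectures and codebook sizes are not specified in the abstract):

```python
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""

    def __init__(self, num_codes: int = 256, dim: int = 64):
        super().__init__()
        # Codebook of discrete units; num_codes is the inventory size
        # whose effect the abstract says is varied systematically.
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, dim) continuous encoder activations.
        flat = x.reshape(-1, x.size(-1))                 # (batch*time, dim)
        dists = torch.cdist(flat, self.codebook.weight)  # distances to all codes
        codes = dists.argmin(dim=-1)                     # discrete unit ids
        quantized = self.codebook(codes).view_as(x)      # back to (batch, time, dim)
        # Straight-through estimator: use quantized values in the forward
        # pass, but let gradients flow to x as if quantization were identity.
        quantized = x + (quantized - x).detach()
        return quantized, codes.view(x.shape[:-1])


# Usage: quantize two utterances of 50 frames each.
vq = VectorQuantizer(num_codes=256, dim=64)
z, units = vq(torch.randn(2, 50, 64))
```

Moving this module between layers of an encoder corresponds to varying the placement of the discretization layer, and `num_codes` to varying its size.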
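The closing claim, that discrete units correlate only moderately with units posited in linguistics, can be quantified at the utterance level by comparing frame-level codes against time-aligned phoneme labels with a clustering metric such as normalized mutual information. A hypothetical sketch (the toy alignment below is invented for illustration; the abstract does not name the four metrics the paper compares):

```python
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical frame-level alignment: one discrete unit id and one
# phoneme label per frame of an utterance.
codes    = [17, 17, 17, 42, 42, 5, 5, 5, 5, 42]
phonemes = ["s", "s", "s", "ah", "ah", "n", "n", "n", "n", "ah"]

# NMI ranges from 0 (no association) to 1 (codes perfectly predict
# phonemes); it is one way to measure code/phoneme correspondence
# over complete utterances.
print(normalized_mutual_info_score(phonemes, codes))
```

Metrics of this kind, computed over complete utterances, behave differently from minimal-pair discrimination tests on phoneme triples, which is the inconsistency the abstract flags.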