The distributed, continuous representations used by neural networks are at odds with the typically symbolic representations employed in linguistics. Vector quantization has been proposed as a way to induce discrete neural representations that are closer in nature to their linguistic counterparts. However, it is not clear which metrics are best suited to analyzing such discrete representations. We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language. We perform a systematic analysis of the impact of (i) architectural choices, (ii) the learning objective and training dataset, and (iii) the evaluation metric. We find that the different evaluation metrics can give inconsistent results. In particular, we find that using minimal pairs of phoneme triples as stimuli during evaluation disadvantages larger embeddings, unlike metrics applied to complete utterances.