Phoneme boundary detection has been studied because of its central role in various speech applications. In this work, we argue that the task must be addressed not only at the level of the algorithm but also at the level of the evaluation metric. To this end, we first propose SuperSeg, a state-of-the-art phoneme boundary detector that operates in an autoregressive manner. Experiments on the TIMIT and Buckeye corpora demonstrate that SuperSeg identifies phoneme boundaries more accurately than existing models by a significant margin. Furthermore, we note a limitation of the popular evaluation metric, R-value, and propose new evaluation metrics that prevent each boundary from contributing to the evaluation multiple times. The proposed metrics reveal the weaknesses of non-autoregressive baselines and establish a reliable criterion suitable for evaluating phoneme boundary detection.
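To illustrate the evaluation issue, the following is a minimal sketch of how tolerance-based counting can let one boundary contribute more than once, and how a one-to-one matching avoids it. The R-value formula follows its standard definition in terms of precision and recall. Everything else is an assumption made for illustration: the function names, the greedy matching rule, the 20 ms tolerance, and the toy boundary positions are hypothetical and are not taken from the proposed metrics themselves.

```python
import math


def r_value(precision, recall):
    """Standard R-value from precision and recall (assumes precision > 0)."""
    over_segmentation = recall / precision - 1.0
    r1 = math.sqrt((1.0 - recall) ** 2 + over_segmentation ** 2)
    r2 = (-over_segmentation + recall - 1.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 2.0


def loose_counts(ref, pred, tol):
    """Count hits when a boundary may be reused by several counterparts.

    Returns (predicted boundaries near any reference,
             reference boundaries near any prediction).
    """
    pred_hits = sum(any(abs(p - r) <= tol for r in ref) for p in pred)
    ref_hits = sum(any(abs(r - p) <= tol for p in pred) for r in ref)
    return pred_hits, ref_hits


def one_to_one_hits(ref, pred, tol):
    """Greedy two-pointer matching: each boundary is used at most once.

    This is an illustrative stand-in, not the matching rule of any
    particular published metric.
    """
    ref, pred = sorted(ref), sorted(pred)
    i = j = hits = 0
    while i < len(ref) and j < len(pred):
        if abs(ref[i] - pred[j]) <= tol:
            hits += 1
            i += 1
            j += 1
        elif pred[j] < ref[i]:
            j += 1
        else:
            i += 1
    return hits


# Toy example in milliseconds (hypothetical values).
ref = [100, 200, 300]
pred = [95, 105, 200]  # two predictions crowd around the reference at 100
tol = 20

pred_hits, ref_hits = loose_counts(ref, pred, tol)
loose_r = r_value(pred_hits / len(pred), ref_hits / len(ref))

hits = one_to_one_hits(ref, pred, tol)
strict_r = r_value(hits / len(pred), hits / len(ref))

print(f"loose R-value:      {loose_r:.3f}")
print(f"one-to-one R-value: {strict_r:.3f}")
```

In this toy case the reference boundary at 100 ms validates two predictions under loose counting, so precision is reported as perfect and the loose R-value comes out higher than the one-to-one score, even though one prediction is redundant.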