Parsing spoken dialogue poses unique difficulties, including disfluencies and unmarked boundaries between sentence-like units. Previous work has shown that prosody can help with parsing disfluent speech (Tran et al. 2018), but has assumed that the input to the parser is already segmented into sentence-like units (SUs), which is not the case in existing speech applications. We investigate how prosody affects a parser that receives an entire dialogue turn as input (a turn-based model), instead of gold standard pre-segmented SUs (an SU-based model). In experiments on the English Switchboard corpus, we find that when using transcripts alone, the turn-based model has trouble segmenting SUs, leading to worse parse performance than the SU-based model. However, prosody can effectively replace gold standard SU boundaries: with prosody, the turn-based model performs as well as the SU-based model (90.79 vs. 90.65 F1 score, respectively), despite performing two tasks (SU segmentation and parsing) rather than one (parsing alone). Analysis shows that pitch and intensity features are the most important for this corpus, since they allow the model to correctly distinguish an SU boundary from a speech disfluency, a distinction that the model otherwise struggles to make.