BERT-based re-ranking and dense retrieval (DR) systems have been shown to improve search effectiveness for spoken content retrieval (SCR). However, both methods can still lose effectiveness when they operate on automatic speech recognition (ASR) transcripts rather than accurate manual transcripts. On a known-item search task over the How2 dataset of spoken instruction videos, we find that using ASR transcripts instead of manual transcripts reduces mean reciprocal rank (MRR) by 10-14%. To narrow this disparity, we investigate semi-supervised ASR transcripts and N-best ASR transcripts as ways to mitigate the effect of ASR errors on spoken search with BERT-based ranking. Semi-supervised ASR transcripts improved MRR by 2-5.5% over standard ASR transcripts, and our N-best early fusion methods for BERT DR systems improved MRR by 3-4%. Combining semi-supervised transcripts with N-best early fusion for BERT DR reduced the MRR gap between manual and ASR transcripts by more than 50%, from 14.32% to 6.58%.
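As a rough illustration of the evaluation arithmetic, the sketch below computes MRR for a known-item search run, where each query has exactly one relevant item, and the share of the manual-vs-ASR gap that is closed. The function names and toy rankings are hypothetical. Treating the gap as a relative MRR drop is an assumption, since the abstract does not define it. The gap-reduction figure itself only requires that both reported gaps (14.32% and 6.58%) use the same definition.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], target_id: str) -> float:
    """Return 1/rank of the single known item, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Average reciprocal rank over queries (one target item per query)."""
    if len(runs) != len(targets):
        raise ValueError("need exactly one target per ranked list")
    if not runs:
        raise ValueError("no queries given")
    return sum(reciprocal_rank(r, t) for r, t in zip(runs, targets)) / len(runs)


def relative_drop(mrr_manual: float, mrr_asr: float) -> float:
    """Assumed gap definition: relative MRR loss (%) of ASR vs manual transcripts."""
    return 100.0 * (mrr_manual - mrr_asr) / mrr_manual


def gap_reduction(gap_before: float, gap_after: float) -> float:
    """Percentage of the original gap that has been closed."""
    return 100.0 * (gap_before - gap_after) / gap_before


if __name__ == "__main__":
    # Hypothetical toy rankings: the known item appears at ranks 1, 2 and 4.
    runs = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i", "j"]]
    targets = ["a", "e", "j"]
    print(round(mean_reciprocal_rank(runs, targets), 4))  # (1 + 0.5 + 0.25) / 3 -> 0.5833
    # Gap figures reported in the abstract: 14.32% before, 6.58% after.
    print(round(gap_reduction(14.32, 6.58), 2))  # -> 54.05
```

With the reported figures, (14.32 - 6.58) / 14.32 ≈ 0.54, so roughly 54% of the gap is closed, consistent with the "more than 50%" claim.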
翻译:事实证明,基于BERT的重新排序和密集检索(DR)系统提高了对口语内容检索(SCR)的搜索效力。但是,与准确的人工记录誊本相比,这两种方法在使用ASR记录誊本时仍然可以显示效力下降。我们发现,关于口语教学录像的 " How2 " 数据集的已知项目搜索任务显示,平均对等排名分数减少了10-4%。作为缩小这一差距的潜在方法,我们调查使用半监督的ASR记录誊本和最佳ASR记录誊本的情况,以通过BERT的排名减少口语搜索的ASR错误。 半监督的ASR记录誊本比标准的ASR记录誊本和我们的BERT DR系统N最佳早期融合方法提高了2.4%至4%。将半监督记录誊本与BERT DR的N最佳早期融合相结合,将人工和ASR记录誊本的搜索效率差距缩小50%以上,从14.32%降至6.58%。