Conversational systems rely heavily on speech recognition to interpret and respond to user commands and queries. Nevertheless, recognition errors may occur, which can significantly affect the performance of such systems. While visual feedback can help detect errors, it may not always be practical, especially for people who are blind or low-vision. In this study, we investigate ways to improve error detection by manipulating the audio output of the transcribed text based on the recognizer's confidence level in its result. Our findings show that selectively slowing down the audio when the recognizer exhibited uncertainty led to a relative increase of 12% in participants' error detection ability compared to uniformly slowing down the audio.
翻译:暂无翻译