With the evolution of LSTM-based speaker diarization, it has become much easier to determine the speaker identity of each segment of an input audio stream than to tag the data manually. This makes it attractive to use the identified speaker identities to help recognize Speaker States in a conversation. In this study, Markov chains are used to identify Speaker States and to update them for subsequent conversations among the same set of speakers, so that their states can be identified in natural, long conversations. The model is built from several audio samples of natural conversations with three or more speakers, drawn from two datasets, and the overall error rate for recognized states is at most 12%. The findings suggest that the proposed extension to speaker diarization is effective for predicting Speaker States in a conversation.
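The abstract does not say how the Markov chain is parameterized. As a rough illustration only, the sketch below assumes that each diarized segment has already been mapped to a discrete Speaker State label (the labels "A", "B", "C" are hypothetical placeholders) and estimates first-order transition probabilities by counting. A chain learned from one conversation can then be updated with later conversations among the same speakers. The class and method names are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


class SpeakerStateChain:
    """Hypothetical first-order Markov chain over discrete Speaker State labels."""

    def __init__(self):
        # counts[a][b] = number of observed transitions from state a to state b
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, states):
        """Add the transitions from one conversation's ordered list of state labels."""
        for prev, nxt in zip(states, states[1:]):
            self.counts[prev][nxt] += 1

    def transition_probs(self, state):
        """Maximum-likelihood estimate of P(next | state); empty dict if state is unseen."""
        row = self.counts.get(state, {})  # .get avoids creating an empty row
        total = sum(row.values())
        return {nxt: c / total for nxt, c in row.items()} if total else {}

    def predict_next(self, state):
        """Most likely next state, or None if the state has never been observed."""
        probs = self.transition_probs(state)
        return max(probs, key=probs.get) if probs else None


# Usage with made-up label sequences (assumed diarization output, not real data).
chain = SpeakerStateChain()
chain.update(["A", "B", "A", "C", "A", "B"])  # first conversation
chain.update(["A", "B", "C", "A"])            # later conversation, same speakers
print(chain.transition_probs("A"))  # {'B': 0.75, 'C': 0.25}
print(chain.predict_next("A"))      # B
```

Keeping raw transition counts rather than normalized probabilities is one simple way to support updating the chain across conversations: each new conversation only adds to the counts, and probabilities are renormalized when queried.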