Bipolar disorder is a mental health disorder that causes mood swings ranging from depression to mania. It is usually diagnosed from patient interviews and from reports provided by the patients' caregivers. Consequently, the diagnosis depends on the experience of the expert, and the disorder can be confused with other mental disorders. Automating parts of the diagnostic process can provide quantitative indicators and make it easier to observe patients over longer periods. Furthermore, the need for remote treatment and diagnosis became especially important during the COVID-19 pandemic. In this thesis, we create a multimodal decision system based on recordings of the patient in the acoustic, linguistic, and visual modalities. The system is trained on the Bipolar Disorder corpus. We perform a comprehensive analysis of unimodal and multimodal systems, as well as of various fusion techniques. Besides processing entire patient sessions using unimodal features, we also investigate the clips at the task level. Using acoustic, linguistic, and visual features in a multimodal fusion system, we achieved an unweighted average recall of 64.8%, which improves on the state-of-the-art performance reported on this dataset.
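For readers unfamiliar with the reported metric, unweighted average recall (UAR) is the mean of the per-class recalls, so every class contributes equally regardless of how many samples it has. The following is a minimal sketch of that standard definition, not the evaluation code used in the thesis; the function name and the toy integer labels are assumptions made purely for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels (hypothetical, not taken from the Bipolar Disorder corpus).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 0]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the majority class is predicted perfectly while the two smaller classes are each recalled half the time, so the UAR (about 0.667) is lower than the plain accuracy (0.750). This is why UAR is often preferred when classes are imbalanced.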