We present the first openly available corpus for detecting depression in Thai. The corpus is compiled from expert-verified cases of depression found on several online blogs. We experiment with two LSTM-based models and two BERT-based models. The best result, an accuracy of 77.53\% in detecting depression, is achieved with a Thai BERT model. This establishes a good baseline for future research on the same corpus. Furthermore, we identify a need for Thai embeddings trained on a more varied corpus than Wikipedia. Our corpus, code, and trained models have been released openly on Zenodo.