Deep Temporal Modelling of Clinical Depression through Social Media Text

We describe the development of a model to detect user-level clinical depression based on a user's temporal social media posts. Our model uses a Depression Symptoms Detection (DSD) classifier, which is trained on the largest existing sample of clinician-annotated tweets for clinical depression symptoms. We subsequently use our DSD model to extract clinically relevant features, e.g., depression scores and their consequent temporal patterns, as well as user posting activity patterns, e.g., quantifying a user's "no activity" or "silence." Furthermore, to evaluate the efficacy of these extracted features, we create three kinds of datasets, including a test dataset, from two existing well-known benchmark datasets for user-level depression detection. We then report accuracy measures for single features, baseline features and feature ablation tests at several different levels of temporal granularity. Analysing the relevant data distributions and clinical depression detection settings gives a complete picture of the impact of different features across our created datasets. Finally, we show that, in general, only semantic-oriented representation models perform well. However, clinical features may enhance overall performance provided that the training and testing distributions are similar and there is more data in a user's timeline. As a consequence, the predictive capability of depression scores increases significantly when they are used in a more sensitive clinical depression detection setting.
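The abstract does not say exactly how per-post depression scores become temporal features. A minimal sketch, assuming each post already has a depression score in [0, 1] from a DSD-style classifier and that the timeline is cut into fixed-width time bins, shows one way to derive score aggregates and "silence" (no-activity) features at a chosen granularity. The function `temporal_features`, its feature names and the binning scheme are hypothetical illustrations, not the authors' implementation.

```python
import math
from datetime import datetime, timedelta
from statistics import mean


def temporal_features(posts, start, end, bin_days=1):
    """Hypothetical feature extractor: bins per-post scores over [start, end).

    posts: list of (datetime, score) pairs; score is ASSUMED to be a
    per-post depression score in [0, 1] from a symptom classifier.
    Bins that contain no posts are counted as "silence".
    """
    width = timedelta(days=bin_days)
    # Number of bins at this granularity (at least one).
    n_bins = max(1, math.ceil((end - start) / width))
    bins = [[] for _ in range(n_bins)]
    for ts, score in posts:
        if start <= ts < end:
            # timedelta // timedelta yields an int bin index.
            bins[(ts - start) // width].append(score)

    # Mean depression score per active (non-empty) bin.
    bin_scores = [mean(b) for b in bins if b]
    silent = [not b for b in bins]

    # Longest run of consecutive silent bins.
    longest, run = 0, 0
    for is_silent in silent:
        run = run + 1 if is_silent else 0
        longest = max(longest, run)

    return {
        "mean_score": mean(bin_scores) if bin_scores else 0.0,
        "max_score": max(bin_scores, default=0.0),
        "silence_ratio": sum(silent) / n_bins,
        "longest_silence": longest,
    }


# Usage: three posts over a five-day window, at daily and weekly granularity.
start = datetime(2020, 1, 1)
posts = [
    (datetime(2020, 1, 1, 9), 0.2),
    (datetime(2020, 1, 1, 21), 0.4),
    (datetime(2020, 1, 4, 12), 0.9),
]
daily = temporal_features(posts, start, start + timedelta(days=5))
weekly = temporal_features(posts, start, start + timedelta(days=5), bin_days=7)
print(daily)
print(weekly)
```

Changing `bin_days` is how this sketch moves between temporal granularities: coarser bins smooth the score signal and hide short silences, while finer bins expose gaps in posting activity.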


