This work proposes a transformer architecture, trainable end-to-end, for user-level classification of gambling addiction and depression. Unlike methods that operate at the post level, we process a set of social media posts from a particular individual, which lets the model use interactions between posts and eliminates label noise at the post level. We exploit the fact that multi-head attention without positional encodings is permutation invariant, and we process randomly sampled sets of a user's texts after encoding them with a modern pretrained sentence encoder (RoBERTa / MiniLM). Moreover, our architecture is interpretable with modern feature attribution methods and allows for automatic dataset creation by identifying discriminating posts in a user's text-set. We perform ablation studies on hyperparameters and evaluate our method on the eRisk 2022 Lab tasks on early detection of signs of pathological gambling and early risk detection of depression. The method proposed by our team, BLUE, obtained the best ERDE5 score of 0.015 and the second-best ERDE50 score of 0.009 for pathological gambling detection. For the early detection of depression, we obtained the second-best ERDE50 of 0.027.
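The order-independence property mentioned above can be illustrated with a minimal sketch. It is not the architecture from this work: it assumes a single attention head instead of multi-head attention, random vectors standing in for RoBERTa / MiniLM sentence embeddings, and mean pooling as a hypothetical way to obtain one user-level vector. Without positional encodings, self-attention outputs are simply reordered when the input posts are reordered, so a symmetric pooling step yields the same user vector for any ordering of the set.

```python
import math
import random


def linear(x, w):
    """Multiply a vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(posts, wq, wk, wv):
    """Single-head scaled dot-product self-attention, no positional encodings."""
    q = [linear(p, wq) for p in posts]
    k = [linear(p, wk) for p in posts]
    v = [linear(p, wv) for p in posts]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def user_vector(posts, wq, wk, wv):
    """Mean-pool attended post vectors (assumed pooling, not from the source)."""
    attended = self_attention(posts, wq, wk, wv)
    n = len(attended)
    return [sum(row[c] for row in attended) / n for c in range(len(attended[0]))]


rng = random.Random(0)
dim = 4  # toy embedding size; real sentence encoders use hundreds of dimensions


def rand_matrix():
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


wq, wk, wv = rand_matrix(), rand_matrix(), rand_matrix()

# Random stand-ins for the sentence embeddings of five posts from one user.
posts = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
shuffled = posts[:]
rng.shuffle(shuffled)

original = user_vector(posts, wq, wk, wv)
permuted = user_vector(shuffled, wq, wk, wv)

# The two vectors agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(original, permuted))
print(max_diff < 1e-9)
```

Adding a position-dependent term to each post vector before the attention step would break this equality, which is why sampling posts as an unordered set only makes sense when positional encodings are left out.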