Clinical NLP tasks, such as mental health assessment from text, must take social constraints into account: performance maximization must be balanced against the paramount importance of guaranteeing the privacy of user data. Consumer protection regulations such as the GDPR generally handle privacy by restricting data availability, for instance by requiring that user data be limited to 'what is necessary' for a given purpose. In this work, we argue that providing stricter formal privacy guarantees while increasing the volume of user data in the model in most cases benefits all parties involved, especially the user. We demonstrate our arguments on two existing suicide risk assessment datasets of Twitter and Reddit posts. We present the first analysis juxtaposing user history length and differential privacy budgets, and elaborate how modeling additional user context enables utility preservation under acceptable user privacy guarantees.
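To make the notion of a differential privacy budget concrete: a smaller budget ε gives a stricter formal guarantee but requires more noise, directly trading off the utility the abstract discusses. The sketch below is a generic illustration of this trade-off via the standard Laplace mechanism, not the specific mechanism used in this work; the function name and parameters are illustrative assumptions.

```python
import numpy as np


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a noisy statistic satisfying epsilon-differential privacy.

    Illustrative sketch (not the paper's method): noise scale grows as
    sensitivity / epsilon, so a smaller privacy budget epsilon (a stricter
    guarantee) injects more noise and lowers downstream utility.
    """
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
```

For example, releasing a count with ε = 0.1 adds roughly ten times as much noise on average as releasing it with ε = 1.0, which is why adding more user context (more signal per user) can offset the utility lost to a tighter budget.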