Privacy policies inform individuals about their rights and about how their personal information is handled. Natural language understanding (NLU) technologies can help individuals and practitioners better understand the privacy practices described in these lengthy and complex documents. However, existing NLU efforts are limited: each processes the language for a single task that focuses on particular privacy practices. To address this, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding across a range of tasks. We also collect a large corpus of privacy policies to enable domain-specific language model pre-training on privacy policies. We demonstrate that domain-specific pre-training improves performance across all tasks. We release the benchmark to encourage future research in this domain.