The Saudi Privacy Policy Dataset

This paper introduces the Saudi Privacy Policy Dataset, a diverse compilation of Arabic privacy policies from various sectors in Saudi Arabia, annotated according to the 10 principles of the Personal Data Protection Law (PDPL). The PDPL was designed to be compatible with the General Data Protection Regulation (GDPR), one of the most comprehensive data regulations worldwide. Data were collected from multiple sources, including the Saudi Central Bank, the Saudi Arabia National United Platform, the Council of Health Insurance, and general websites found through Google and Wikipedia. The final dataset covers 1,000 websites belonging to 7 sectors and comprises 4,638 lines of text and 775,370 tokens, for a corpus size of 8,353 KB. The annotated dataset offers significant reuse potential for assessing privacy policy compliance, benchmarking privacy practices across industries, and developing automated tools for monitoring adherence to data protection regulations. By providing a comprehensive, annotated dataset of privacy policies, this paper aims to facilitate further research and development in privacy policy analysis, natural language processing, and machine learning applications related to privacy and data protection. It is also intended as a resource for researchers, policymakers, and industry professionals interested in understanding and promoting compliance with privacy regulations in Saudi Arabia.


