While global attention to data privacy is increasing, most of the current theoretical understanding of it is based on research conducted in a few countries. Prior work argues that people's cultural backgrounds might shape their privacy concerns; thus, we could expect people from different world regions to conceptualize these concerns in diverse ways. To start exploring this hypothesis, we collected and analyzed a large-scale dataset of tweets in Spanish and English about the #CambridgeAnalytica scandal. We employed word embeddings and qualitative analysis to identify which information privacy concerns are present and to characterize language and regional differences in emphasis on these concerns. Our results suggest that related concepts, such as regulations, can be added to current information privacy frameworks. We also observe a greater emphasis on data collection in English than in Spanish. Additionally, data from North America exhibits a narrower focus on awareness compared to other regions under study. Our results call for more diverse sources of data and nuanced analysis of data privacy concerns around the globe.