In this paper we take both social and linguistic aspects into account to perform demographic analysis by processing a large number of tweets in the Basque language. The study of demographic characteristics and social relationships is approached by applying machine learning and modern deep-learning Natural Language Processing (NLP) techniques, combining social sciences with automatic text processing. More specifically, our main objective is to combine demographic inference and social analysis in order to detect young Basque Twitter users and to identify the communities that arise from their relationships or shared content. This social and demographic analysis will be based entirely on automatically collected tweets, using NLP to convert unstructured textual information into interpretable knowledge.