Online social networks offer vast opportunities for computational social science, but effective user embedding is crucial for downstream tasks. Traditionally, researchers have used pre-defined network-based user features, such as degree, and centrality measures, and/or content-based features, such as posts and reposts. However, these measures may not capture the complex characteristics of social media users. In this study, we propose a user embedding method based on the URL domain co-occurrence network, which is simple but effective for representing social media users in competing events. We assessed the performance of this method in binary classification tasks using benchmark datasets that included Twitter users related to COVID-19 infodemic topics (QAnon, Biden, Ivermectin). Our results revealed that user embeddings generated directly from the retweet network, and those based on language, performed below expectations. In contrast, our domain-based embeddings outperformed these methods while reducing computation time. These findings suggest that the domain-based user embedding can serve as an effective tool to characterize social media users participating in competing events, such as political campaigns and public health crises.
翻译:暂无翻译