Data mining is a good way to find the relationship between raw data and predict the target we want which is also widely used in different field nowadays. In this project, we implement a lots of technology and method in data mining to predict the sale of an item based on its previous sale. We create a strong model to predict the sales. After evaluating this model, we Online social networks offer vast opportunities for computational social science, but effective user embedding is crucial for downstream tasks. Traditionally, researchers have used pre-defined network-based user features, such as degree, and centrality measures, and/or content-based features, such as posts and reposts. However, these measures may not capture the complex characteristics of social media users. In this study, we propose a user embedding method based on the URL domain co-occurrence network, which is simple but effective for representing social media users in competing events. We assessed the performance of this method in binary classification tasks using benchmark datasets that included Twitter users related to COVID-19 infodemic topics (QAnon, Biden, Ivermectin). Our results revealed that user embeddings generated directly from the retweet network, and those based on language, performed below expectations. In contrast, our domain-based embeddings outperformed these methods while reducing computation time. These findings suggest that the domain-based user embedding can serve as an effective tool to characterize social media users participating in competing events, such as political campaigns and public health crises.
翻译:暂无翻译