Social media classification tasks (e.g., tweet sentiment analysis and tweet stance detection) are challenging because social media posts are typically short, informal, and ambiguous. Training on tweets therefore demands large-scale human-annotated labels, which are time-consuming and costly to obtain. In this paper, we find that adding hashtags to tweets helps alleviate this issue, because hashtags can enrich short and ambiguous tweets with information such as topic, sentiment, and stance. This motivates us to propose a novel Hashtag-guided Tweet Classification model (HashTation), which automatically generates meaningful hashtags for the input tweet to provide useful auxiliary signals for tweet classification. To generate high-quality and insightful hashtags, our hashtag generation model retrieves and encodes post-level and entity-level information across the whole corpus. Experiments show that HashTation achieves significant improvements on seven low-resource tweet classification tasks, in which only a limited amount of training data is provided, showing that automatically enriching tweets with model-generated hashtags can significantly reduce the demand for large-scale human-labeled data. Further analysis demonstrates that HashTation generates high-quality hashtags that are consistent with the tweets and their labels. The code is available at https://github.com/shizhediao/HashTation.
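To make the idea of enriching a tweet with hashtags before classification concrete, the following is a minimal sketch and not the HashTation model itself. It replaces the learned hashtag generator with a simple retrieval stand-in that borrows hashtags from the most similar posts in a toy corpus (token-overlap similarity), then appends them to the input tweet. The function names (`retrieve_hashtags`, `enrich`), the Jaccard similarity, and the example corpus are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Return the set of lowercase word tokens in a post."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Token-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve_hashtags(tweet, corpus, top_k=2):
    """Hypothetical stand-in for a hashtag generator: collect the hashtags
    of the top_k corpus posts most similar to the tweet, without repeats."""
    query = tokenize(tweet)
    ranked = sorted(
        corpus,
        key=lambda post: jaccard(query, tokenize(post["text"])),
        reverse=True,
    )
    tags = []
    for post in ranked[:top_k]:
        for tag in post["hashtags"]:
            if tag not in tags:
                tags.append(tag)
    return tags

def enrich(tweet, hashtags):
    """Append hashtags to the tweet so a classifier sees the extra signal."""
    if not hashtags:
        return tweet
    return tweet + " " + " ".join("#" + tag for tag in hashtags)

# Toy corpus of previously seen posts with their hashtags (invented data).
CORPUS = [
    {"text": "the new climate bill is a disaster for jobs",
     "hashtags": ["climate", "policy"]},
    {"text": "loving the sunny weather today",
     "hashtags": ["weather"]},
    {"text": "this climate bill will finally cut emissions",
     "hashtags": ["climate", "emissions"]},
]

TWEET = "not sure about this climate bill"
ENRICHED = enrich(TWEET, retrieve_hashtags(TWEET, CORPUS))
```

In this sketch, `ENRICHED` is the text that would be passed to a downstream tweet classifier in place of the raw tweet. A learned generator, as described in the abstract, can produce hashtags that do not appear verbatim in any retrieved post, which this retrieval-only stand-in cannot do.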