There has been growing interest in applying NLP techniques in the financial domain; however, resources remain extremely limited. This paper introduces StockEmotions, a new dataset for detecting emotions in the stock market that consists of 10,000 English comments collected from StockTwits, a financial social media platform. Inspired by behavioral finance, it proposes 12 fine-grained emotion classes that span the roller coaster of investor emotion. Unlike existing financial sentiment datasets, StockEmotions provides granular features such as investor sentiment classes, fine-grained emotions, emojis, and time series data. To demonstrate the usability of the dataset, we perform a dataset analysis and conduct experimental downstream tasks. For financial sentiment/emotion classification tasks, DistilBERT outperforms other baselines, and for multivariate time series forecasting, a Temporal Attention LSTM model combining price index, text, and emotion features achieves better performance than models using a single feature.