This study looks for signals of economic awareness on online social media and tests their significance for economic prediction. Over a period of two years, it analyses the relationship between the daily West Texas Intermediate (WTI) crude oil price and multiple predictors extracted from Twitter, Google Trends, Wikipedia, and the Global Database of Events, Language, and Tone (GDELT). Semantic analysis is applied to measure the sentiment, emotionality, and complexity of the language used. Autoregressive Integrated Moving Average models with Explanatory Variables (ARIMAX) are used to make predictions and to confirm the predictive value of the study variables. Results show that the combined analysis of the four media platforms carries valuable information for financial forecasting. Twitter language complexity, the number of GDELT articles, and Wikipedia page reads have the highest predictive power. The study also compares the foresight of each platform, that is, how many days in advance each platform can anticipate a price movement. Compared with previous work, it combines more media sources, and more dimensions of both the interaction and the language used, in a joint analysis.
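The abstract does not state the ARIMAX orders, the lag range, or the estimation method, so the following is only a minimal sketch of the "how many days in advance" comparison under stated assumptions. It assumes an ARIMAX(1,1,0)-style specification in which the differenced price is regressed on its own first lag and on a single media predictor lagged k days, fitted by conditional least squares, and picks the lead k with the lowest in-sample mean squared residual. The synthetic series stands in for the WTI price and a media signal; the names `fit_arimax_110`, `best_lead`, and `make_synthetic` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_arimax_110(price, signal, lag, start):
    """Hypothetical ARIMAX(1,1,0)-style fit by conditional least squares:
    dy_t = c + phi * dy_{t-1} + beta * signal_{t-lag} + e_t, using rows t >= start.
    Returns ([c, phi, beta], mean squared residual)."""
    dy = [price[t] - price[t - 1] for t in range(1, len(price))]  # dy[t-1] is the change at time t
    rows, targets = [], []
    for t in range(start, len(price)):
        rows.append([1.0, dy[t - 2], signal[t - lag]])
        targets.append(dy[t - 1])
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, targets)]
    return coef, sum(e * e for e in resid) / len(resid)


def best_lead(price, signal, max_lag):
    """Fit every lead 1..max_lag on the same sample and return the lead with the lowest error."""
    start = max(2, max_lag)  # common sample so the errors are comparable across leads
    scores = {lag: fit_arimax_110(price, signal, lag, start)[1] for lag in range(1, max_lag + 1)}
    return min(scores, key=scores.get), scores


def make_synthetic(n=500, lead=3, seed=0):
    """Synthetic stand-in data: price changes follow the signal `lead` days later."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n)]
    price, prev_dy = [50.0, 50.0], 0.0
    for t in range(2, n):
        effect = 0.8 * signal[t - lead] if t >= lead else 0.0
        dy = 0.2 * prev_dy + effect + rng.gauss(0, 0.3)
        price.append(price[-1] + dy)
        prev_dy = dy
    return price, signal


if __name__ == "__main__":
    price, signal = make_synthetic()
    lag, scores = best_lead(price, signal, max_lag=5)
    print("best lead (days):", lag)
    for k, mse in sorted(scores.items()):
        print(k, round(mse, 3))
```

On this synthetic series the search should recover the planted 3-day lead. Comparing leads on a common sample (`start`) keeps the residual errors comparable; a real analysis would more likely use out-of-sample error and an established ARIMAX implementation rather than this hand-rolled fit.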