Bitcoin, with its ever-growing popularity, has demonstrated extreme price volatility since its origin. This volatility, together with its decentralised nature, makes Bitcoin highly subject to speculative trading compared to more traditional assets. In this paper, we propose a multimodal model for predicting extreme price fluctuations. This model takes as input a variety of correlated assets, technical indicators, and Twitter content. In an in-depth study, we explore whether social media discussions from the general public on Bitcoin have predictive power for extreme price movements. A dataset of 5,000 tweets per day containing the keyword "Bitcoin" was collected from 2015 to 2021. This dataset, called PreBit, is made available online. In our hybrid model, we use sentence-level FinBERT embeddings, pretrained on financial lexicons, to capture the full content of the tweets and feed it to the model in an understandable way. By combining these embeddings with a Convolutional Neural Network, we built a predictive model for significant market movements. The final multimodal ensemble model includes this NLP model together with a model based on candlestick data, technical indicators and correlated asset prices. In an ablation study, we explore the contribution of the individual modalities. Finally, we propose and backtest a trading strategy based on the predictions of our models with varying prediction thresholds, and show that it can be used to build a profitable trading strategy with reduced risk compared to a "hold" or moving average strategy.
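The idea of backtesting a strategy against a prediction threshold can be illustrated with a minimal Python sketch. It is not the paper's implementation: the rule (stay invested unless the predicted probability of an extreme downward move reaches the threshold, otherwise sit in cash for the next day), the function names `backtest_threshold` and `max_drawdown`, and the toy prices and probabilities are all assumptions made here for illustration.

```python
from typing import List, Sequence


def backtest_threshold(prices: Sequence[float],
                       p_extreme: Sequence[float],
                       threshold: float) -> List[float]:
    """Return the equity curve (starting at 1.0) of a simple threshold rule.

    Assumption: p_extreme[t] is a hypothetical model probability, known at
    the close of day t, that day t+1 brings an extreme downward move.
    The position for day t+1 is "invested" if p_extreme[t] < threshold,
    otherwise "cash" (return of exactly 1.0 for that day).
    """
    if len(prices) != len(p_extreme):
        raise ValueError("prices and p_extreme must have the same length")
    equity = [1.0]
    for t in range(len(prices) - 1):
        daily_return = prices[t + 1] / prices[t]
        in_market = p_extreme[t] < threshold
        equity.append(equity[-1] * (daily_return if in_market else 1.0))
    return equity


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough loss of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Toy data (invented for illustration, not taken from PreBit).
toy_prices = [100.0, 110.0, 88.0, 99.0]
toy_probs = [0.1, 0.9, 0.2, 0.0]

# A threshold above 1.0 never triggers, which reproduces a plain "hold".
hold_curve = backtest_threshold(toy_prices, toy_probs, threshold=1.01)
rule_curve = backtest_threshold(toy_prices, toy_probs, threshold=0.5)

for name, curve in (("hold", hold_curve), ("threshold=0.5", rule_curve)):
    print(name, "final equity:", round(curve[-1], 4),
          "max drawdown:", round(max_drawdown(curve), 4))
```

Sweeping `threshold` over a grid and comparing final equity and drawdown against the hold curve is one simple way to study the trade-off between return and risk that a varying prediction threshold introduces.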