Denoised Labels for Financial Time-Series Data via Self-Supervised Learning

The introduction of electronic trading platforms effectively changed the organisation of traditional systemic trading from quote-driven markets into order-driven markets. Its convenience led to an exponentially increasing amount of financial data, which is nevertheless hard to use for predicting future prices because of the low signal-to-noise ratio and the non-stationarity of financial time series. Simpler classification tasks, where the goal is to predict the direction of future price movements via supervised learning algorithms, need sufficiently reliable labels to generalise well. Labelling financial data is, however, less well defined than in other domains: did the price go up because of noise or because of signal? Existing labelling methods offer limited countermeasures against noise and do little to improve learning algorithms. This work takes inspiration from image classification in trading and from the success of self-supervised learning. We investigate the idea of applying computer vision techniques to financial time series to reduce the noise exposure and hence generate correct labels. We treat label generation as the pretext task of a self-supervised learning approach and compare the naive (and noisy) labels commonly used in the literature with the labels generated by a denoising autoencoder for the same downstream classification task. Our results show that our denoised labels improve the performance of the downstream learning algorithm, for both small and large datasets. We further show that the signals we obtain can be used to trade effectively with binary strategies. We suggest that, with the proposed techniques, self-supervised learning constitutes a powerful framework for generating "better" financial labels that are useful for studying the underlying patterns of the market.
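To make the contrast between naive and denoised labels concrete, the following is a minimal sketch and not the authors' implementation. It assumes a naive label is simply the sign of the price change over a window, and it swaps in a PCA reconstruction (equivalent to a linear autoencoder) for the denoising autoencoder described above. The function names, the window length, and the rule of comparing the last and first reconstructed points are all hypothetical choices made for illustration.

```python
import numpy as np


def naive_labels(prices, horizon):
    """Naive label: 1 if the price is higher `horizon` steps ahead, else 0."""
    prices = np.asarray(prices, dtype=float)
    return (prices[horizon:] > prices[:-horizon]).astype(int)


def pca_reconstruct(windows, n_components):
    """Stand-in for a denoising autoencoder (an assumption of this sketch):
    project each window onto the top principal components and map it back."""
    mean = windows.mean(axis=0)
    centered = windows - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]      # shape (n_components, window)
    codes = centered @ basis.T     # "encoder"
    return codes @ basis + mean    # "decoder"


def denoised_labels(prices, window, n_components=2):
    """Label each window from its smoothed reconstruction, not the raw prices."""
    prices = np.asarray(prices, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(prices, window)
    # Subtract each window's first price so only the shape is reconstructed.
    shapes = windows - windows[:, :1]
    recon = pca_reconstruct(shapes, n_components)
    return (recon[:, -1] > recon[:, 0]).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic random-walk prices, for illustration only.
    prices = 100 + np.cumsum(rng.normal(size=300))
    window = 10
    naive = naive_labels(prices, window - 1)   # one label per window
    clean = denoised_labels(prices, window)
    print("fraction of labels that differ:", np.mean(naive != clean))
```

Both label sets can then be fed to the same downstream classifier, which mirrors the comparison described above. A nonlinear denoising autoencoder would replace `pca_reconstruct` in a faithful reproduction.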
