DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis

Modern machine learning models (such as deep neural networks and boosting decision tree models) have become increasingly popular in financial market prediction, due to their superior capacity to extract complex non-linear patterns. However, since financial datasets have a very low signal-to-noise ratio and are non-stationary, complex models are often prone to overfitting and suffer from instability issues. Moreover, as machine learning and data mining tools become more widely used in quantitative trading, many trading firms have been producing an increasing number of features (also known as factors). How to automatically select effective features has therefore become a pressing problem. To address these issues, we propose DoubleEnsemble, an ensemble framework that combines learning-trajectory-based sample reweighting with shuffling-based feature selection. Specifically, we identify the key samples based on the training dynamics of each sample, and we identify the key features based on the ablation impact of each feature, measured via shuffling. Our model is applicable to a wide range of base models and is capable of extracting complex patterns while mitigating the overfitting and instability issues in financial market prediction. We conduct extensive experiments, including price prediction for cryptocurrencies and stock trading, using both DNNs and gradient boosting decision trees as base models. Our experimental results demonstrate that DoubleEnsemble achieves superior performance compared with several baseline methods.
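
The idea of measuring a feature's ablation impact via shuffling can be illustrated with a small sketch. This is not the authors' implementation: the weighted least-squares base model, the function names (`fit_linear`, `shuffle_importance`, `select_features`), the `keep_ratio` parameter and the synthetic data are all assumptions made for illustration, and the sample-reweighting half of the method is not shown.

```python
import numpy as np


def fit_linear(X, y, sample_weight=None):
    """Weighted least-squares fit. A hypothetical stand-in for any base model."""
    if sample_weight is None:
        sample_weight = np.ones(len(y))
    sw = np.sqrt(sample_weight)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return coef


def predict_linear(coef, X):
    return X @ coef[:-1] + coef[-1]


def shuffle_importance(coef, X, y, rng):
    """Increase in mean squared error when one feature column is shuffled."""
    base_loss = np.mean((predict_linear(coef, X) - y) ** 2)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        # Shuffling breaks the link between feature j and the target
        # while keeping the feature's marginal distribution unchanged.
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        shuffled_loss = np.mean((predict_linear(coef, X_shuffled) - y) ** 2)
        scores[j] = shuffled_loss - base_loss
    return scores


def select_features(scores, keep_ratio=0.5):
    """Keep the fraction of features with the largest loss increase (assumed rule)."""
    n_keep = max(1, int(round(len(scores) * keep_ratio)))
    return np.sort(np.argsort(scores)[::-1][:n_keep])


# Synthetic data (an assumption): only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=500)

coef = fit_linear(X, y)
scores = shuffle_importance(coef, X, y, rng)
selected = select_features(scores, keep_ratio=0.5)
print("importance scores:", scores)
print("selected feature indices:", selected)
```

A feature whose shuffling barely changes the loss contributes little to the fitted model and is a candidate for removal. In an ensemble setting, the selected indices would be used to train the next sub-model, here via another call to `fit_linear(X[:, selected], y)`.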
