Anomaly detection is the process of identifying abnormal instances or events in data sets that deviate significantly from the norm. In this study, we propose a signature-based machine learning algorithm to detect rare or unexpected items in a given time series data set. We present applications of signatures and randomized signatures as feature extractors for anomaly detection algorithms; additionally, we provide a simple, representation-theoretic justification for the construction of randomized signatures. Our first application is based on synthetic data and aims at distinguishing between real and fake trajectories of stock prices, which are indistinguishable by visual inspection. We also show a real-life application using transaction data from the cryptocurrency market. In this case, our unsupervised learning algorithm identifies pump-and-dump attempts organized on social networks with F1 scores of up to 88%, achieving results close to the state of the art in the field, which is based on supervised learning.
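To make the idea of signatures as feature extractors concrete, the following is a minimal sketch and not the algorithm of the study. It assumes a signature truncated at depth 2, computed for a piecewise-linear path via Chen's identity, and it swaps in a deliberately simple unsupervised score (distance of the standardized features from the corpus mean) in place of whatever detector the authors use. The function names `signature_depth2` and `anomaly_scores` are hypothetical, and randomized signatures are not covered.

```python
import math


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    `path` is a list of points, each a list of d numbers.
    Returns the d level-1 terms followed by the d*d level-2 terms.
    """
    d = len(path[0])
    level1 = [0.0] * d                      # running sum of increments
    level2 = [[0.0] * d for _ in range(d)]  # iterated integrals S^{ij}
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        for i in range(d):
            for j in range(d):
                # Chen's identity for appending one linear segment
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1 + [level2[i][j] for i in range(d) for j in range(d)]


def anomaly_scores(paths):
    """Illustrative score (an assumption of this sketch): Euclidean norm of
    each path's standardized signature features, so paths far from the
    corpus mean receive high scores."""
    feats = [signature_depth2(p) for p in paths]
    n, m = len(feats), len(feats[0])
    means = [sum(f[k] for f in feats) / n for k in range(m)]
    # Constant features have zero spread; use 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in feats) / n) or 1.0
        for k in range(m)
    ]
    return [
        math.sqrt(sum(((f[k] - means[k]) / stds[k]) ** 2 for k in range(m)))
        for f in feats
    ]


# Toy usage: time-augmented paths (t, x); the last one jumps at the end.
normal = [[[t, c * (t % 2)] for t in range(5)] for c in (1.0, 1.1, 0.9, 1.05, 0.95)]
outlier = [[0, 0], [1, 1], [2, 0], [3, 1], [4, 10]]
scores = anomaly_scores(normal + [outlier])
```

In this toy corpus the path with the final jump receives the largest score. In practice one would likely use a higher truncation depth and a dedicated detector; both choices are left open here.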