Anomaly detection has long been a fundamental approach to identifying potentially fraudulent activities. Tasked with detecting illegal timber trade, which threatens ecosystems and economies and is associated with other illegal activities, we formulate our problem as one of anomaly detection. Among other challenges, annotations that could assist in building automated systems to detect fraudulent transactions are unavailable for our large-scale trade data, which has heterogeneous (categorical and continuous) features. Modelling the task as unsupervised anomaly detection, we propose a novel model, Contrastive Learning based Heterogeneous Anomaly Detector, to address shortcomings of prior models. Our model uses an asymmetric autoencoder that can effectively handle categorical variables of large arity, while avoiding assumptions about the structure of the data in the low-dimensional latent space and remaining robust to changes in hyper-parameters. The likelihood of the data is approximated through an estimator network, which is trained jointly with the autoencoder using negative sampling. We further outline the details and intuition behind an effective negative sample generation approach for heterogeneous data. We provide a qualitative study to showcase the effectiveness of our model in detecting anomalies in timber trade.
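The abstract does not spell out the negative sample generation scheme, so the following is only a generic sketch of the idea of perturbing a heterogeneous record, not the authors' procedure. It assumes a record is a plain dictionary, that each categorical feature has a known list of allowed values, and that each continuous feature has a known range; the function name `make_negative`, its parameters, and the toy feature names and values are all hypothetical.

```python
import random


def make_negative(record, cat_domains, cont_ranges, n_cat_swap=1, rng=None):
    """Return a perturbed copy of `record` to serve as a negative sample.

    record:      dict mapping feature name -> value
    cat_domains: dict mapping categorical feature -> list of allowed values
    cont_ranges: dict mapping continuous feature -> (low, high)

    Assumption (not from the paper): a negative is built by swapping a few
    categorical values and resampling one continuous value.
    """
    rng = rng or random.Random()
    negative = dict(record)  # copy, so the original record is left untouched

    # Swap `n_cat_swap` categorical fields for a different value
    # drawn from the same domain.
    swappable = [f for f, dom in cat_domains.items() if len(dom) > 1]
    for feature in rng.sample(swappable, min(n_cat_swap, len(swappable))):
        alternatives = [v for v in cat_domains[feature] if v != record[feature]]
        negative[feature] = rng.choice(alternatives)

    # Resample one continuous field uniformly from its range.
    if cont_ranges:
        feature = rng.choice(sorted(cont_ranges))
        low, high = cont_ranges[feature]
        negative[feature] = rng.uniform(low, high)

    return negative


# Toy usage with made-up feature names and values.
record = {"origin": "A", "species": "x", "weight": 10.0}
cat_domains = {"origin": ["A", "B", "C"], "species": ["x", "y"]}
cont_ranges = {"weight": (0.0, 100.0)}
print(make_negative(record, cat_domains, cont_ranges, rng=random.Random(0)))
```

In a contrastive setup of the kind the abstract describes, such perturbed records would be scored by the estimator network alongside the observed records, with training pushing the observed records towards higher estimated likelihood than their perturbed counterparts.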