Scrutinizing Shipment Records To Thwart Illegal Timber Trade

From arXiv. Accepted in the Proceedings of the 6th Outlier Detection and Description Workshop, ACM SIGKDD 2021: https://oddworkshop.github.io/assets/papers/7.pdf. arXiv admin note: substantial text overlap with arXiv:2104.01156

Timber and forest products made from wood, like furniture, are valuable commodities and, like many highly valued natural resources traded globally, face challenges of corruption, fraud, and illegal harvesting. These grey and black market activities in the wood and forest products sector are not limited to the countries where the wood was harvested, but extend throughout the global supply chain and have been tied to illicit financial flows, like trade-based money laundering, document fraud, species mislabeling, and other illegal activities. The task of finding such fraudulent activities using trade data, in the absence of ground truth, can be modelled as an unsupervised anomaly detection problem. However, existing approaches have shortcomings that limit their applicability to large-scale trade data. Trade data is heterogeneous, with both categorical and numerical attributes in a tabular format. The overall challenge lies in the complexity, volume, and velocity of the data, with a large number of entities and a lack of ground truth labels. To mitigate these, we propose a novel unsupervised anomaly detection method, Contrastive Learning based Heterogeneous Anomaly Detection (CHAD), that is generally applicable to large-scale heterogeneous tabular data. We demonstrate that our model CHAD performs favorably against multiple comparable baselines on public benchmark datasets, and outperforms them in the case of trade data. More importantly, we demonstrate that our approach reduces the assumptions and effort required for hyperparameter tuning, which is a key challenge in an unsupervised training paradigm. Specifically, our overarching objective pertains to detecting suspicious timber shipments and patterns using Bill of Lading trade record data. Detecting anomalous transactions in shipment records can enable further investigation by government agencies and supply chain constituents.

Anomaly detection

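In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Anomalous items typically translate into some kind of problem, such as bank fraud, a structural defect, a medical problem, or an error in a text. Anomalies are also called outliers, novelties, noise, deviations, and exceptions. In the detection of abuse and network intrusion in particular, the objects of interest are often not rare objects but unexpected bursts of activity. This pattern does not fit the usual statistical definition of an outlier as a rare object, so many anomaly detection methods (unsupervised ones in particular) fail on such data unless it has been aggregated appropriately. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters formed by these patterns. There are three broad categories of anomaly detection methods.[1] Unsupervised methods assume that most instances in the dataset are normal and detect anomalies in unlabeled test data by looking for the instances that fit the rest of the data least. Supervised methods require a dataset whose instances have been labeled "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherent class imbalance of anomaly detection). Semi-supervised methods build a model of normal behavior from a given training set of normal data, and then test how likely it is that a test instance was generated by the learned model.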
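The unsupervised category can be illustrated with a minimal sketch. It is not the CHAD method from the abstract above; it is a generic nearest-neighbour distance score, assuming purely numerical features and a small dataset. The function name `knn_anomaly_scores`, the parameter `k`, and the toy data are all hypothetical, chosen only for illustration.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest neighbours.

    A larger score means the point fits the rest of the data less well.
    No labels are used, so this is an unsupervised score.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point (the point itself is skipped).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical toy data: five points in a tight group and one far away.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (8.0, 8.0)]
scores = knn_anomaly_scores(data, k=2)

# Index of the point that matches the rest of the data least.
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous)
```

The score relies on the same assumption stated above, that most instances are normal. A burst of many similar anomalous records would sit close to one another and could score low, which is the micro-cluster failure case the paragraph describes. Categorical attributes, such as those in trade records, would need an encoding step before a distance of this kind makes sense.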
