Timber and forest products made from wood, such as furniture, are valuable commodities and, like many highly valued natural resources in global trade, face challenges of corruption, fraud, and illegal harvesting. These grey and black market activities in the wood and forest products sector are not limited to the countries where the wood was harvested; they extend throughout the global supply chain and have been tied to illicit financial flows such as trade-based money laundering, and to document fraud, species mislabeling, and other illegal activities. The task of finding such fraudulent activities using trade data, in the absence of ground truth, can be modelled as an unsupervised anomaly detection problem. However, existing approaches have shortcomings that limit their applicability to large-scale trade data. Trade data is heterogeneous, with both categorical and numerical attributes in a tabular format. The overall challenge lies in the complexity, volume, and velocity of the data, the large number of entities, and the lack of ground truth labels. To mitigate these challenges, we propose a novel unsupervised anomaly detection model, Contrastive Learning based Heterogeneous Anomaly Detection (CHAD), that is generally applicable to large-scale heterogeneous tabular data. We demonstrate that CHAD performs favorably against multiple comparable baselines on public benchmark datasets and outperforms them on trade data. More importantly, we demonstrate that our approach reduces the assumptions and effort required for hyperparameter tuning, which is a key challenge in an unsupervised training paradigm. Specifically, our overarching objective is to detect suspicious timber shipments and patterns using Bill of Lading trade record data. Detecting anomalous transactions in shipment records can enable further investigation by government agencies and supply chain constituents.