Social media platforms such as Instagram and Twitter have become important channels for marketing and selling illicit drugs, making the detection of online drug trafficking critical to combating this trade. However, the legal status of a drug often varies across place and time; even for the same drug, federal and state legislation can regulate its legality differently. Meanwhile, a growing number of drug trafficking events are disguised in a novel form, advertising through comments, which leads to information heterogeneity. Accurate detection of illicit drug trafficking events (IDTEs) on social media has therefore become even more challenging. In this work, we conduct the first systematic study of fine-grained IDTE detection on Instagram. We propose a deep multimodal multilabel learning (DMML) approach to detecting IDTEs and demonstrate its effectiveness on a newly constructed dataset called multimodal IDTE (MM-IDTE). Specifically, our model takes text and image data as input and combines the multimodal information to predict multiple illicit drug labels. Inspired by the success of BERT, we develop a self-supervised multimodal bidirectional transformer by jointly fine-tuning pretrained text and image encoders. MM-IDTE is a large-scale dataset with manually annotated multiple drug labels, constructed to support fine-grained detection of illicit drugs. Extensive experimental results on MM-IDTE show that the proposed DMML methodology can accurately detect IDTEs even in the presence of special characters and style changes intended to evade detection.
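To make the multimodal multilabel setting concrete, the following is a minimal sketch of how text and image features might be combined to score several drug labels independently. It is not the authors' model: the abstract describes a multimodal bidirectional transformer, whereas this sketch substitutes plain concatenation fusion with a linear layer. The label names, embeddings, weights, and the `predict_labels` helper are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical label set; the actual MM-IDTE labels are not listed in the abstract.
LABELS = ["drug_a", "drug_b", "drug_c"]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(text_vec, image_vec, weights, biases, threshold=0.5):
    """Fuse two modality embeddings and score each label independently."""
    # Concatenation fusion is an assumption made for this sketch.
    fused = list(text_vec) + list(image_vec)
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        probs.append(sigmoid(logit))
    # Multilabel: each label is thresholded on its own,
    # so zero, one, or several labels may be returned.
    return {label: p for label, p in zip(LABELS, probs) if p >= threshold}


# Toy embeddings and weights, made up for illustration only.
text_vec = [1.0, 0.0]
image_vec = [0.0, 1.0]
weights = [
    [2.0, 0.0, 0.0, 2.0],   # drug_a: supported by both text and image
    [-3.0, 0.0, 0.0, 0.0],  # drug_b: contradicted by the text feature
    [0.0, 0.0, 0.0, 1.5],   # drug_c: supported by the image only
]
biases = [0.0, 0.0, 0.0]

predicted = predict_labels(text_vec, image_vec, weights, biases)
print(sorted(predicted))
```

The per-label sigmoid, rather than a softmax over all labels, is what allows a single post to be tagged with more than one drug at once.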