Decentralized Finance (DeFi) is a system of financial products and services built and delivered through smart contracts on various blockchains. In the past year, DeFi has grown in both popularity and market capitalization. However, it has also been linked to crime, in particular to various types of securities violations. The lack of Know Your Customer requirements in DeFi poses challenges for governments trying to mitigate potential offending in this space. This study asks whether the problem is suited to a machine learning approach, namely, whether DeFi projects potentially engaging in securities violations can be identified from their tokens' smart contract code. We adapt prior work on detecting specific types of securities violations across Ethereum, building classifiers on features extracted from the smart contract code of DeFi projects' tokens. The final logistic regression model achieves a 98.9% F1-score; the final random forest classifier achieves a 98.6% F1-score. Further feature-level analysis shows that a single feature makes this a highly detectable problem. This heavy reliance on a single feature means that, at this stage, a complex machine learning model may be neither necessary nor desirable for this problem. However, this may change as DeFi securities violations become more sophisticated. Another contribution of our study is a new dataset comprising (a) a verified ground-truth set of tokens involved in securities violations and (b) a set of legitimate tokens from a reputable DeFi aggregator. The paper further discusses how prosecutors might use a model like ours in enforcement efforts and connects it to the wider legal context.
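To make the single-feature observation concrete, the following is a minimal sketch, not the method used in the study, of how a one-feature threshold rule can be scored with the F1 metric and compared against more complex classifiers. The feature names (`feat_a`, `feat_b`), the toy values, and the helper functions are hypothetical and chosen purely for illustration; the abstract does not state which features are extracted from the contract code, and the numbers below are not taken from the dataset.

```python
# Illustrative sketch only: feature names and values are made up,
# not drawn from the paper's dataset or feature set.

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_single_feature_rule(X, y):
    """Exhaustively search (feature, threshold, direction) rules and
    return the rule with the highest F1 on the given data, plus its score."""
    best_rule, best_score = None, -1.0
    for name in sorted(X[0]):
        for thr in sorted({row[name] for row in X}):
            for direction in (">=", "<"):
                if direction == ">=":
                    preds = [1 if row[name] >= thr else 0 for row in X]
                else:
                    preds = [1 if row[name] < thr else 0 for row in X]
                score = f1_score(y, preds)
                if score > best_score:  # keep the first rule reaching the top score
                    best_rule, best_score = (name, thr, direction), score
    return best_rule, best_score


# Hypothetical per-token feature vectors; label 1 = flagged, 0 = legitimate.
X = [
    {"feat_a": 0.9, "feat_b": 0.2},
    {"feat_a": 0.8, "feat_b": 0.7},
    {"feat_a": 0.1, "feat_b": 0.3},
    {"feat_a": 0.2, "feat_b": 0.6},
]
y = [1, 1, 0, 0]

rule, score = best_single_feature_rule(X, y)
print(rule, score)
```

If a rule of this kind reaches an F1-score close to that of a full logistic regression or random forest on held-out data, that is one sign the additional model complexity adds little, which is the situation the abstract describes.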