Blockchain technology, lauded for its transparency and immutability, introduces a novel trust model. However, its decentralized structure raises concerns that malicious or illegal content may be embedded in the chain. This study focuses on Ethereum and presents a data identification and restoration algorithm, with which we recovered 175 common files, 296 images, and 91,206 texts. We applied the FastText algorithm to the recovered texts for sentiment analysis, reaching an accuracy of 0.9 after parameter tuning. The classification yielded 70,189 neutral, 5,208 positive, and 15,810 negative texts, which helps in identifying sensitive or illicit information. Using the NSFWJS library, we detected seven indecent images with 100% accuracy. Our findings show that benign and harmful content coexist on the Ethereum blockchain; the harmful content includes personal data, explicit images, divisive language, and racial discrimination. Notably, some of the sensitive information targeted Chinese government officials. We propose preventative measures, and the study offers insights to support public understanding of blockchain technology and to guide regulatory agencies. The algorithms employed offer new approaches to blockchain data privacy and security concerns.
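To make the idea of data identification concrete, the following is a minimal sketch of one plausible approach, not the algorithm used in this study: it assumes the payload is carried in a transaction's hex-encoded input data and classifies it by well-known file signatures (magic numbers), falling back to a UTF-8 text check. The function name `identify_payload` and the signature table are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Well-known file signatures (magic numbers). This table is illustrative
# and is an assumption, not the list used in the study.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def identify_payload(input_hex: str) -> Optional[str]:
    """Guess the content type of hex-encoded transaction input data.

    Returns a short type label, "text" for readable UTF-8, or None when
    nothing is recognized. Hypothetical helper for illustration only.
    """
    # Strip the conventional "0x" prefix before decoding the hex string.
    hex_body = input_hex[2:] if input_hex.startswith("0x") else input_hex
    data = bytes.fromhex(hex_body)

    # First, look for a known file signature at the start of the payload.
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind

    # Otherwise, treat the payload as text if it decodes as UTF-8 and
    # contains only printable characters or whitespace.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    if text and all(ch.isprintable() or ch.isspace() for ch in text):
        return "text"
    return None
```

A payload that begins with the PNG signature would be labeled `"png"`, while the hex encoding of a plain sentence would be labeled `"text"`. A practical pipeline would also need to handle files split across several transactions and signatures that appear at an offset, which this sketch does not attempt.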