We present a framework for web-scale archiving of the dark web. Although commonly associated with illicit activity, the dark web also provides a way to access web information privately. This is a valuable and socially beneficial tool for global citizens, such as those seeking information under oppressive political regimes that work to limit its availability. However, little institutional archiving is performed on the dark web; existing efforts are limited to the Archive.is dark web presence, a page-at-a-time archiver. We take surface web tools, techniques, and procedures (TTPs) and adapt them for archiving the dark web. We demonstrate the viability of our framework with a narrowly scoped proof-of-concept prototype, implemented with the following lightly adapted open source tools: the Brozzler crawler for capture, WARC files for storage, and pywb for replay. The prototype shows that modified surface web archiving TTPs are viable for archiving the dark web.