The Internet has become a critical component of modern civilization, requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic is essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither of these locations actively emits Internet traffic, and each provides a distinct view of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hypersparse matrices and D4M associative array technologies enable the efficient analysis of these data at significant scale. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period, 70\% of the brightest (highest frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (decrease in frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source in the GreyNoise honeyfarm is proportional to the logarithm of its brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high-frequency beam of sources that drifts on a time scale of a month.
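As a rough illustration of the two model families named above, the following minimal sketch evaluates a Zipf-Mandelbrot probability mass function and a modified Cauchy curve. The functional forms used here, p(d) proportional to 1/(d + delta)^alpha and beta/(beta + |dt|^alpha), are assumptions about the parameterization, and every parameter value is hypothetical rather than a fitted result from this work.

```python
def zipf_mandelbrot_pmf(d_max, alpha, delta):
    """Normalized p(d) proportional to 1/(d + delta)**alpha for d = 1..d_max.

    Assumed form: alpha sets the tail slope, delta flattens the head.
    """
    weights = [1.0 / (d + delta) ** alpha for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def modified_cauchy(dt, alpha, beta):
    """Assumed form beta / (beta + |dt|**alpha), which peaks at 1 when dt == 0.

    dt is the time difference between two observations; with alpha == 2
    this reduces to the shape of a standard Cauchy (Lorentzian) curve.
    """
    return beta / (beta + abs(dt) ** alpha)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of each curve.
    pmf = zipf_mandelbrot_pmf(d_max=1000, alpha=1.5, delta=2.0)
    print("p(d=1)    =", pmf[0])
    print("p(d=1000) =", pmf[-1])
    for months in (0, 1, 3, 6):
        print("dt =", months, "->", modified_cauchy(months, alpha=1.0, beta=1.0))
```

In this sketch, a larger delta flattens the distribution among the brightest sources, while beta sets the time difference at which the correlation falls to half of its peak when alpha is 1.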