While the increasing number of Vantage Points (VPs) in RIPE RIS and RouteViews improves our understanding of the Internet, the quadratically increasing volume of collected data poses a challenge to the scientific and operational use of the data. The design and implementation of BGP and BGP data collection systems lead to data archives with enormous redundancy, as there is substantial overlap in announced routes across many different VPs. Researchers thus often resort to arbitrary sampling of the data, which we demonstrate comes at a cost to the accuracy and coverage of previous works. The continued growth of the Internet, and of these collection systems, exacerbates this cost. The community needs a better approach to managing and using these data archives. We propose MVP, a system that scores VPs according to their level of redundancy with other VPs, allowing more informed sampling of these data archives. Our challenge is that the degree of redundancy between two updates depends on how we define redundancy, which in turn depends on the analysis objective. Our key contribution is a general framework and associated algorithms to assess redundancy between VP observations. We quantify the benefit of our approach for four canonical BGP routing analyses: AS relationship inference, AS rank computation, hijack detection, and routing detour detection. MVP improves the coverage or accuracy (or both) of all these analyses while processing the same volume of data.
翻译:暂无翻译