In the modern Web, service providers often rely heavily on third parties to run their services. For example, they use ad networks to finance their services, externally hosted libraries to develop features quickly, and analytics providers to gain insights into visitor behavior. For security and privacy, website owners need to be aware of the content they deliver to their users. In practice, however, they often do not know which third parties are embedded, for example, when these third parties request additional content, as is common in real-time ad auctions. In this paper, we present a large-scale measurement study that analyzes the magnitude of these new challenges. To better reflect the connectedness of third parties, we model their relations as third-party trees, which approximate the loading dependencies of all third parties embedded into a given website. Using this concept, we show that including a single third party can lead to subsequent requests from up to eight additional services. Furthermore, our findings indicate that the set of third parties embedded on a page load is not always deterministic, as 50% of the branches in the third-party trees change between repeated visits. In addition, we found that 93% of the analyzed websites embedded third parties located in regions that might not be in line with the current legal framework. Our study also replicates previous work that mostly focused on the landing pages of websites. We show that this method can only measure a lower bound, as subsites show a significant increase in privacy-invasive techniques. For example, our results show an increase of about 36% in the number of cookies used when websites are crawled more deeply.
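To make the notion of a third-party tree more concrete, the following is a minimal sketch, not the study's actual implementation. It assumes that each page visit yields a list of (initiator URL, requested URL) pairs, that a party is identified simply by its hostname, and that the change between two visits is measured as the share of root-to-leaf branches not common to both. The function names, the example hostnames, and this particular change metric are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Return the lowercase hostname of a URL (assumed party identifier)."""
    return (urlsplit(url).hostname or "").lower()


def build_tree(requests: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each party to the parties it caused to load during one visit."""
    children: dict[str, set[str]] = defaultdict(set)
    for initiator, requested in requests:
        parent, child = host(initiator), host(requested)
        if parent != child:  # ignore requests a party makes to itself
            children[parent].add(child)
    return children


def branches(tree: dict[str, set[str]], root: str) -> set[tuple[str, ...]]:
    """Return every root-to-leaf path as a tuple of hostnames."""
    paths: set[tuple[str, ...]] = set()
    stack = [(root,)]
    while stack:
        path = stack.pop()
        # Skip parties already on the path so that cycles terminate.
        kids = [k for k in tree.get(path[-1], ()) if k not in path]
        if not kids:
            paths.add(path)
        for kid in kids:
            stack.append(path + (kid,))
    return paths


def changed_share(a: set[tuple[str, ...]], b: set[tuple[str, ...]]) -> float:
    """Share of branches not present in both visits (assumed metric)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


# Two hypothetical visits to the same page; only the ad bidder differs.
common = [
    ("https://news.example/", "https://cdn.example/lib.js"),
    ("https://news.example/", "https://ads.example/tag.js"),
]
visit_a = common + [("https://ads.example/tag.js", "https://bidder-one.example/bid")]
visit_b = common + [("https://ads.example/tag.js", "https://bidder-two.example/bid")]

branches_a = branches(build_tree(visit_a), "news.example")
branches_b = branches(build_tree(visit_b), "news.example")
depth_a = max(len(path) - 1 for path in branches_a)  # parties below the root
print(depth_a, round(changed_share(branches_a, branches_b), 2))
```

In this toy input, the ad network pulls in a different bidder on each visit, so the branch through the ad network differs while the branch to the library host stays the same. Real crawls would need a more careful notion of a party than the bare hostname, for instance grouping hosts by registrable domain.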