Wide-AdGraph: Detecting Ad Trackers with a Wide Dependency Chain Graph

Paper Title

Wide-AdGraph: Detecting Ad Trackers with a Wide Dependency Chain Graph

Authors

Kargaran, Amir Hossein; Akhondzadeh, Mohammad Sadegh; Heidarpour, Mohammad Reza; Manshaei, Mohammad Hossein; Salamatian, Kave; Sattary, Masoud Nejad

Abstract

Websites use third-party advertising and tracking services to deliver targeted ads and to collect information about the users who visit them. These services put users' privacy at risk, which is why user demand for blocking them is growing. Most blocking solutions rely on crowd-sourced filter lists that are manually maintained by a large community of users. In this work, we seek to simplify the updating of these filter lists by combining different websites into a large-scale graph that connects all resource requests made over a large set of sites. Features of this graph are extracted and used to train a machine learning algorithm to detect advertising and tracking resources. Because our approach combines different information sources, it is more robust against evasion techniques that rely on obfuscation or on changing usage patterns. We evaluate our work on the Alexa top-10K websites and find its accuracy to be 96.1% biased and 90.9% unbiased, with high precision and recall. It can also block new advertising and tracking services that would otherwise need to be blocked through further crowd-sourced updates to existing filter lists. Moreover, the approach followed in this paper sheds light on the ecosystem of third-party tracking and advertising.
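The graph-building and feature-extraction steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `crawl` data, the function names `build_wide_graph` and `node_features`, and the three features (in-degree, out-degree, number of sites a resource appears on) are hypothetical and are not taken from the paper, which does not list its features in the abstract.

```python
from collections import defaultdict

# Hypothetical crawl output: for each visited site, a list of
# (initiator, requested resource) pairs forming its dependency chains.
crawl = {
    "news.example": [
        ("news.example", "cdn.example/app.js"),
        ("cdn.example/app.js", "tracker.example/pixel.gif"),
    ],
    "shop.example": [
        ("shop.example", "tracker.example/pixel.gif"),
        ("shop.example", "shop.example/logo.png"),
    ],
}


def build_wide_graph(crawl):
    """Merge the per-site request chains into one directed graph."""
    successors = defaultdict(set)    # node -> resources it requested
    predecessors = defaultdict(set)  # node -> nodes that requested it
    seen_on = defaultdict(set)       # node -> sites where it appeared
    for site, edges in crawl.items():
        for parent, child in edges:
            successors[parent].add(child)
            predecessors[child].add(parent)
            seen_on[parent].add(site)
            seen_on[child].add(site)
    return successors, predecessors, seen_on


def node_features(node, successors, predecessors, seen_on):
    """Illustrative structural features for one node of the merged graph."""
    return {
        "in_degree": len(predecessors.get(node, ())),
        "out_degree": len(successors.get(node, ())),
        "site_count": len(seen_on.get(node, ())),
    }


successors, predecessors, seen_on = build_wide_graph(crawl)
for node in sorted(seen_on):
    print(node, node_features(node, successors, predecessors, seen_on))
```

In a full pipeline, feature vectors like these would be labeled (for example, using existing filter lists) and passed to a classifier. The point of merging sites into one graph is that a resource requested from many unrelated sites stands out through cross-site features such as `site_count`, which a single-site graph cannot provide.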
