Paper title
Dizzy: Large-Scale Crawling and Analysis of Onion Services
Paper authors
Paper abstract
With nearly 2.5 million users, onion services have become a prominent part of the darkweb. Over the last five years alone, the number of onion domains has increased 20x, reaching more than 700k unique domains in January 2022. As onion services host various types of illicit content, they have become a valuable resource for darkweb research and an integral part of e-crime investigation and threat intelligence. However, this content is largely unindexed by today's search engines, and researchers have to rely on outdated or manually collected datasets that are limited in scale, scope, or both. To tackle this problem, we built Dizzy: an open-source crawling and analysis system for onion services. Dizzy implements novel techniques to explore, update, check, and classify onion services at scale, without overwhelming the Tor network. We deployed Dizzy in April 2021 and used it to analyze more than 63.3 million crawled onion webpages, focusing on domain operations, web content, cryptocurrency usage, and the web graph. Our main findings show that onion services are unreliable due to their high churn rate, have a relatively small number of reachable domains that are often similar and illicit, enjoy a growing underground cryptocurrency economy, and have a graph that is relatively tightly knit to, but topologically different from, the regular web's graph.