Hercules Against Data Series Similarity Search

Paper Title

Hercules Against Data Series Similarity Search

Authors

Karima Echihabi, Panagiota Fatourou, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

Abstract

We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using many synthetic and real datasets, and query workloads of varying difficulty. The results show that Hercules performs up to one order of magnitude faster than the best competitor (which is not always the same technique). Moreover, Hercules is the only index that outperforms the optimized scan in all scenarios, including the hard query workloads on disk-based datasets. This paper was published in the Proceedings of the VLDB Endowment, Volume 15, Number 10, June 2022.
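To make the idea of exact search driven by summarization concrete, here is a minimal sketch of exact nearest-neighbor search with lower-bound pruning. It is not the Hercules algorithm: it assumes Piecewise Aggregate Approximation (PAA) as the summary (the abstract does not name the techniques Hercules uses), runs in memory on a single thread, and uses hypothetical function names (`paa`, `paa_lower_bound`, `exact_nn`). It also assumes every series has the same length and that the length is divisible by the number of segments.

```python
import math
import random


def paa(series, segments):
    """Summarize a series by the mean of each equal-length segment."""
    seg_len = len(series) // segments
    return [sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(segments)]


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(paa_a, paa_b, seg_len):
    """PAA distance; never exceeds the Euclidean distance of the raw series."""
    return math.sqrt(seg_len * sum((x - y) ** 2 for x, y in zip(paa_a, paa_b)))


def exact_nn(query, collection, segments=8):
    """Return (index, distance, number of full distance computations)."""
    seg_len = len(query) // segments
    q_paa = paa(query, segments)
    # A real index would precompute and store these summaries.
    bounds = [paa_lower_bound(q_paa, paa(s, segments), seg_len)
              for s in collection]
    # Visit candidates in increasing order of their lower bound.
    order = sorted(range(len(collection)), key=lambda i: bounds[i])

    best_idx, best_dist, full_computations = -1, math.inf, 0
    for i in order:
        # Every remaining candidate has a bound at least this large,
        # so none of them can beat the best distance found so far.
        if bounds[i] >= best_dist:
            break
        dist = euclidean(query, collection[i])
        full_computations += 1
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist, full_computations


if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0, 1) for _ in range(64)] for _ in range(200)]
    q = [random.gauss(0, 1) for _ in range(64)]
    idx, dist, computed = exact_nn(q, data)
    print(idx, dist, computed)
```

Because the PAA distance is a lower bound on the true distance, skipping a candidate whose bound is no smaller than the best distance found so far cannot change the answer, which is what keeps the search exact. How many full distance computations this saves depends on the data and on the query's difficulty.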

