论文标题

竞争性数据结构动态

Competitive Data-Structure Dynamization

论文作者

Mathieu, Claire, Rajaraman, Rajmohan, Young, Neal E., Yousefi, Arman

论文摘要

数据结构动力学是使静态数据结构动态的通用方法。它用于几何设置和以Google BigTable和LevelDB(我们的重点)等大数据库中的所谓合并(或压实)策略为幌子。以前的理论工作基于对统一输入的最坏情况分析 - 一次插入一个项目和持续的读取率。实际上,合并策略不仅必须处理批处理插入和变化的读/写比例,而且还可以利用这种不均匀性,以每次输入的基础降低成本。 为了对此进行建模,我们通过两个新的在线设定问题来启动数据结构动态的研究。对于每个,输入是一系列不相交的加权项目集。这些套件一次被揭示。该算法必须用设置盖响应每个算法,以涵盖到目前为止揭示的所有项目。它通过添加一个或多个集合并(可选地删除现有集合)来从上一个封面中逐渐获得盖子。对于每个新组,算法产生的构建成本等于集合中项目的重量。在第一个问题中,目的是最大程度地降低总构建成本加总查询成本,该算法在每次$ t $等于当前的封面尺寸时都会产生查询成本。在第二个问题中,目的是将构建成本最小化,同时随时保持查询成本超过$ k $(一个给定参数)。我们给出了两个变体的确定性在线算法,竞争比分别为$θ(\ log^* n)$和$ k $。后一个比率对于第二个变体是最佳的。

Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs -- insertions of one item at a time and constant read rate. In practice, merge policies must not only handle batch insertions and varying read/write ratios, they can take advantage of such non-uniformity to reduce cost on a per-input basis. To model this, we initiate the study of data-structure dynamization through the lens of competitive analysis, via two new online set-cover problems. For each, the input is a sequence of disjoint sets of weighted items. The sets are revealed one at a time. The algorithm must respond to each with a set cover that covers all items revealed so far. It obtains the cover incrementally from the previous cover by adding one or more sets and optionally removing existing sets. For each new set the algorithm incurs build cost equal to the weight of the items in the set. In the first problem the objective is to minimize total build cost plus total query cost, where the algorithm incurs a query cost at each time $t$ equal to the current cover size. In the second problem, the objective is to minimize the build cost while keeping the query cost from exceeding $k$ (a given parameter) at any time. We give deterministic online algorithms for both variants, with competitive ratios of $Θ(\log^* n)$ and $k$, respectively. The latter ratio is optimal for the second variant.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源