Paper Title
An Elastic Ephemeral Datastore using Cheap, Transient Cloud Resources
Authors
Abstract
Spot instances are virtual machines offered at 60-90% lower cost that can be reclaimed at any time, with only a short warning period. Spot instances have already been used to significantly reduce the cost of processing workloads in the cloud. However, leveraging spot instances to reduce the cost of stateful cloud applications is much more challenging, because sudden preemptions lead to data loss. In this work, we propose leveraging spot instances to decrease the cost of ephemeral data management in distributed data analytics applications. We specifically target ephemeral data because this large class of data in modern analytics workloads has low durability requirements: if lost, the data can be regenerated by re-executing compute tasks. We design an elastic, distributed ephemeral datastore that handles node preemptions transparently to user applications and minimizes data loss by redistributing data during node preemption warning periods. We implement our elastic datastore on top of the Apache Crail datastore and evaluate the system with various workloads and VM types. By leveraging spot instances, we show that we can run TPC-DS queries at 60% lower cost than when using on-demand VMs for the datastore, while increasing end-to-end execution time by only 2.1%.
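The redistribution idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Apache Crail API. The `Node` class, the `drain` function, and the `budget` parameter (the number of blocks assumed to be transferable before the warning period expires) are all hypothetical names introduced here. Blocks that cannot be moved in time are reported as lost, which in the paper's setting would mean regenerating them by re-executing compute tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical storage node holding a set of block IDs."""
    name: str
    capacity: int                      # maximum number of blocks the node can hold
    blocks: set = field(default_factory=set)

    def free(self) -> int:
        return self.capacity - len(self.blocks)


def drain(victim, survivors, budget):
    """Move up to `budget` blocks off a preempted node.

    `budget` is an assumed stand-in for how many blocks can be copied
    within the preemption warning period. Returns (moved, lost), where
    `moved` is a list of (block, target_name) pairs and `lost` is a list
    of blocks that could not be relocated.
    """
    moved, lost = [], []
    for block in sorted(victim.blocks):
        # Greedy placement: pick the surviving node with the most free space.
        target = max(survivors, key=Node.free, default=None)
        if len(moved) < budget and target is not None and target.free() > 0:
            target.blocks.add(block)
            moved.append((block, target.name))
        else:
            lost.append(block)
    victim.blocks.clear()              # the node is reclaimed either way
    return moved, lost
```

Calling `drain(victim, survivors, budget)` when a preemption warning arrives would return which blocks were relocated and which must be recomputed. The greedy "most free space" placement is only one possible policy and was chosen here for brevity.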