Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing

Paper Title

Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing

Authors

Chenghao Lyu, Qi Fan, Fei Song, Arnab Sinha, Yanlei Diao, Wei Chen, Li Ma, Yihui Feng, Yaliang Li, Kai Zeng, Jingren Zhou

Abstract

Big data processing at production scale presents a highly complex environment for resource optimization (RO), a problem crucial for meeting the performance goals and budgetary constraints of analytical users. The RO problem is challenging for three reasons: it involves a set of decisions (the partition count, the placement of parallel instances on machines, and the resource allocation to each instance); it requires multi-objective optimization (MOO); and it is compounded by the scale and complexity of big data systems, which must also meet stringent time constraints for scheduling. This paper presents a MaxCompute-based integrated system that supports multi-objective resource optimization via fine-grained instance-level modeling and optimization. We propose a new architecture that breaks RO into a series of simpler problems, new fine-grained predictive models, and novel optimization methods that exploit these models to make effective instance-level recommendations in a hierarchical MOO framework. Evaluation using production workloads shows that our new RO system can reduce latency by 37-72% and cost by 43-78% at the same time, compared to the current optimizer and scheduler, while running in 0.02-0.23 s.
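To make the multi-objective aspect concrete, the following minimal Python sketch filters a set of candidate resource configurations down to those that are Pareto-optimal in latency and cost. It is an illustration only and not the paper's method: the `Config` tuple, the configuration labels, and every latency and cost value are hypothetical, and the paper's hierarchical MOO framework and predictive models are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical candidate: (label, predicted latency in seconds, cost in arbitrary units).
Config = Tuple[str, float, float]


def dominates(a: Config, b: Config) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(other, c) for other in configs)]


# Made-up numbers for illustration; they are not taken from the paper.
candidates: List[Config] = [
    ("2 cores, 4 GB", 120.0, 10.0),
    ("4 cores, 8 GB", 70.0, 14.0),
    ("8 cores, 16 GB", 45.0, 26.0),
    ("8 cores, 4 GB", 90.0, 20.0),    # dominated by "4 cores, 8 GB"
    ("2 cores, 16 GB", 125.0, 18.0),  # dominated by "2 cores, 4 GB"
]

front = pareto_front(candidates)
for label, latency, cost in front:
    print(f"{label}: latency={latency}s, cost={cost}")
```

The pairwise filter is quadratic in the number of candidates, which is acceptable for a small illustration; a production optimizer working under tight scheduling time limits would need something faster.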
