Automatic Parameter Tuning for Big Data Analytics Frameworks
In the current era of digital transformation, characterized by exponential data growth and diverse data sources, large-scale data processing is essential for deriving valuable insights from the data deluge. Big data processing frameworks such as Hadoop, Spark, and Flink have been developed as essential tools for efficiently handling massive volumes of heterogeneous data. The performance of these frameworks depends on a vast array of configuration parameters that govern memory allocation, resource management, and other critical aspects of execution. However, optimizing these parameters is a complex task because of the high-dimensional parameter space, the interdependencies among parameters, and the diversity of systems and workloads.
Our project aims to explore and develop an automated, intelligent system that streamlines and expedites parameter optimization in major big data processing frameworks, specifically Hadoop, Spark, and Flink. The project has three key components: evaluation of parameter performance within these big data analytics frameworks; investigation of state-of-the-art autotuning methods, which encompass both research-based techniques and learning-based methods; and development of an intelligent autotuning system for parameter optimization in these frameworks.
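To make the tuning problem concrete, the following is a minimal sketch of random search, a simple baseline that is swapped in here purely for illustration and is not the project's method. The configuration keys are real Spark settings, but the value ranges, the `synthetic_runtime` cost function, and the `random_search` helper are hypothetical assumptions; a real tuner would launch the job and measure its runtime instead.

```python
import random

# Real Spark configuration keys; the candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "spark.executor.memory": ["2g", "4g", "8g"],
    "spark.executor.cores": [1, 2, 4],
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
}


def synthetic_runtime(config):
    """Hypothetical stand-in for running a job; returns a made-up runtime in seconds."""
    memory_gb = int(config["spark.executor.memory"].rstrip("g"))
    cores = config["spark.executor.cores"]
    partitions = config["spark.sql.shuffle.partitions"]
    # Invented interdependency: extra cores only pay off with enough memory per core.
    memory_penalty = 30.0 if memory_gb / cores < 2 else 0.0
    # Invented sweet spot at 200 shuffle partitions.
    partition_penalty = abs(partitions - 200) / 10.0
    return 120.0 / cores + memory_penalty + partition_penalty


def random_search(space, evaluate, trials=20, seed=0):
    """Sample configurations uniformly at random and keep the cheapest one seen."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        cost = evaluate(candidate)
        if cost < best_cost:
            best_config, best_cost = candidate, cost
    return best_config, best_cost


if __name__ == "__main__":
    config, cost = random_search(SEARCH_SPACE, synthetic_runtime)
    print("best configuration found:", config)
    print("synthetic runtime (s):", cost)
```

Even this toy space has 36 combinations and one interaction between two parameters; real frameworks expose hundreds of settings, which is why exhaustive evaluation is impractical and smarter strategies are needed.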
Project Members
- Limeng Zhang
- Shagun Dhingra
- Bo Wu