Agnihotri, Pratyush (2024)
Accurate Performance Modeling for Distributed Stream Processing: Methods for Performance Benchmarking and Zero-shot Parallelism Tuning in Distributed and Heterogeneous Environments.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00028144
Ph.D. Thesis, Primary publication, Publisher's Version
File: 2024-10-10_Agnihotri_Pratyush.pdf (7 MB)
License: CC BY 4.0 International (Creative Commons, Attribution)
Item Type: Ph.D. Thesis
Type of entry: Primary publication
Title: Accurate Performance Modeling for Distributed Stream Processing: Methods for Performance Benchmarking and Zero-shot Parallelism Tuning in Distributed and Heterogeneous Environments
Language: English
Referees: Steinmetz, Prof. Dr. Ralf; Koldehofe, Prof. Dr. Boris
Date: 28 October 2024
Place of Publication: Darmstadt
Collation: xxi, 189 pages
Date of oral examination: 13 September 2024
DOI: 10.26083/tuprints-00028144
Abstract:
Distributed Stream Processing (DSP) systems have emerged as a pivotal paradigm, enabling real-time data analysis using distributed cloud resources. Major Internet companies such as Amazon and Google build on DSP systems for their real-time data workloads; for instance, Amazon provides Apache Flink as a service for implementing DSP workloads. Parallelism is often a desired property of DSP workloads to meet the timeliness and scaling requirements of current applications, necessitating the use of distributed and multi-core cloud resources. However, cloud resources are heterogeneous in nature, which makes understanding the performance of DSP workloads very difficult, as it depends on highly varying resources, i.e., compute, storage, and network. Therefore, (i) understanding the performance of distinct DSP workloads on such heterogeneous cloud environments and (ii) predicting it are both very challenging problems. This thesis addresses these two fundamental research challenges by contributing methods for accurate performance modeling of DSP workloads in heterogeneous cloud environments. First, this thesis contributes methods for performance understanding by proposing PDSP-BENCH, a novel benchmarking system. It tackles three primary shortcomings of existing work: the lack of expressiveness in benchmarking parallel DSP workloads, the need for heterogeneous hardware support, and the need for integration of learned DSP models. Unlike existing systems, PDSP-BENCH enables the evaluation of parallel DSP applications and workloads using both synthetic and real-world applications, offering an expressive and scalable solution. Further, it facilitates the systematic training and evaluation of learned DSP models on diverse streaming workloads, which is crucial for optimizing performance. The extensive evaluation of PDSP-BENCH demonstrates its benchmarking capabilities and highlights the impact of varying query complexities, hardware configurations, and workload parameters on system performance. The key observations of our experiments show the non-linear and paradoxical effects of parallelism on performance. Second, this thesis contributes methods for performance prediction and optimization by proposing ZEROTUNE, a novel learned cost model for DSP workloads and an optimizer for parallelism tuning. It provides highly accurate cost predictions while generalizing to (unseen) heterogeneous hardware resources of the cloud. The generalizability of the model is based on transfer learning, the same technique used in Large Language Models such as ChatGPT. The main idea is to learn from so-called transferable features and a parallel graph representation, which together enable the model to generalize to unseen DSP workloads and hardware. Our extensive evaluation demonstrates ZEROTUNE's robustness and accuracy across workloads, parallelism degrees, and unseen operator parameters, as well as its training data efficiency. The evaluations show significant speed-ups from parallelism tuning compared to existing methods. Most notably, our approach has been adopted by Amazon Redshift for query execution time prediction. (An illustrative sketch of this cost-model and parallelism-tuning workflow appears after the record below.)
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-281444
Additional Information: This work has been co-funded by the German Research Foundation (DFG) as part of project C2 within the Collaborative Research Center (CRC) 1053 – MAKI.
Classification DDC: 000 Generalities, computers, information > 004 Computer science
Divisions: 18 Department of Electrical Engineering and Information Technology > Institute of Computer Engineering > Multimedia Communications
Date Deposited: 28 Oct 2024 13:13
Last Modified: 30 Oct 2024 06:39
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/28144
PPN: 522518834
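The abstract describes ZEROTUNE's core idea: a learned cost model over transferable, per-operator features and a parallel graph representation, which an optimizer then queries to pick parallelism degrees for unseen workloads and hardware. The sketch below is a minimal, assumption-laden illustration of that workflow, not the thesis' implementation: the concrete feature set (operator type, selectivity, tuple width, parallelism, hardware descriptors), the pooled encoding, the use of scikit-learn's GradientBoostingRegressor in place of a graph-based learned model, and the synthetic training data are all illustrative choices.

```python
# Minimal sketch of zero-shot parallelism tuning with a learned cost model.
# NOTE: illustrative only; ZEROTUNE itself uses a parallel query-graph
# representation and transferable features. Here a gradient-boosted regressor
# over pooled per-operator features stands in, and all data is synthetic.
from dataclasses import dataclass
from typing import List
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

@dataclass
class Operator:
    op_type: int        # hypothetical encoding, e.g. 0=source, 1=filter, 2=aggregate, 3=sink
    selectivity: float  # fraction of tuples passed downstream
    tuple_width: int    # bytes per tuple
    parallelism: int    # degree of parallelism assigned to this operator

@dataclass
class Hardware:
    cpu_cores: int
    ram_gb: int
    net_gbps: float

def encode(query: List[Operator], hw: Hardware) -> np.ndarray:
    """Pool per-operator transferable features into one fixed-size vector."""
    op_feats = np.array([[o.op_type, o.selectivity, o.tuple_width, o.parallelism]
                         for o in query], dtype=float)
    pooled = np.concatenate([op_feats.mean(axis=0), op_feats.sum(axis=0)])
    return np.concatenate([pooled, [hw.cpu_cores, hw.ram_gb, hw.net_gbps]])

def synthetic_latency(query: List[Operator], hw: Hardware) -> float:
    """Stand-in for measured end-to-end latency (ms); real data would come from benchmarks."""
    work = sum(o.tuple_width / o.selectivity for o in query)
    par = np.mean([o.parallelism for o in query])
    # Latency falls with parallelism while coordination overhead grows: a non-linear effect.
    return work / (par * hw.cpu_cores) + 0.5 * par ** 1.5

# Train the cost model on randomly generated (query, hardware) -> latency samples.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(2000):
    q = [Operator(rng.integers(0, 4), rng.uniform(0.1, 1.0),
                  rng.integers(8, 256), rng.integers(1, 17))
         for _ in range(rng.integers(2, 6))]
    hw = Hardware(rng.integers(2, 33), rng.integers(4, 65), rng.choice([1.0, 10.0, 25.0]))
    X.append(encode(q, hw))
    y.append(synthetic_latency(q, hw))
model = GradientBoostingRegressor().fit(np.array(X), np.array(y))

def tune_parallelism(query: List[Operator], hw: Hardware, max_par: int = 16) -> int:
    """Pick the uniform parallelism degree with the lowest predicted latency."""
    best_p, best_cost = 1, float("inf")
    for p in range(1, max_par + 1):
        candidate = [Operator(o.op_type, o.selectivity, o.tuple_width, p) for o in query]
        cost = model.predict(encode(candidate, hw).reshape(1, -1))[0]
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p

# Tuning an unseen query on unseen hardware: zero new measurements are needed.
unseen_query = [Operator(0, 1.0, 64, 1), Operator(1, 0.3, 64, 1), Operator(2, 0.1, 32, 1)]
unseen_hw = Hardware(cpu_cores=24, ram_gb=48, net_gbps=25.0)
print("suggested parallelism:", tune_parallelism(unseen_query, unseen_hw))
```

In a real deployment, the synthetic_latency stand-in would be replaced by measurements from a benchmarking harness such as PDSP-BENCH, and the per-operator features would be taken from the deployed parallel query plan rather than generated at random.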