Sommer, Lukas (2021)
Programming Heterogeneous Systems with General and Domain-Specific Frameworks.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00019772
Ph.D. Thesis, Primary publication, Publisher's Version
Text: Dissertation-LS-20211025.pdf (5MB)
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Programming Heterogeneous Systems with General and Domain-Specific Frameworks | ||||
Language: | English | ||||
Referees: | Koch, Prof. Dr. Andreas ; Plessl, Prof. Dr. Christian | ||||
Date: | 2021 | ||||
Place of Publication: | Darmstadt | ||||
Collation: | xxviii, 229 pages | ||||
Date of oral examination: | 18 October 2021 | ||||
DOI: | 10.26083/tuprints-00019772 | ||||
Abstract: | As chip manufacturing processes are getting ever closer to what is physically possible, the projections made by Moore's Law and Dennard Scaling no longer hold true, and CPU performance has been stagnating over the last decade. At the same time, the performance requirements of many important application areas, ranging from machine learning to scientific computing, are increasing at exponential rates, creating a demand that CPUs cannot satisfy anymore. In order to cater to the performance hunger of these applications, computer architects have turned their attention towards heterogeneous systems. By combining CPUs with one or multiple accelerators, architects are seeking to provide the necessary performance through specialization and more efficient forms of parallelism. And while the accelerators have successfully delivered on the promised performance in many cases, programming these heterogeneous systems is becoming increasingly difficult, as developers need to take multiple devices, execution models, and data transfers into account. Over the course of this cumulative dissertation, we investigate two potential solutions to the enormous challenges of heterogeneous systems programming. General programming frameworks such as OpenMP define language constructs that reflect important fundamental computing patterns and allow developers to expose an application's parallelism to the compiler for efficient mapping to the target hardware. Domain-specific programming frameworks, on the other hand, are tailored to a single domain and provide mechanisms to capture the high-level semantics and structure of an application, which is then again mapped to the computational units of the underlying hardware in an efficient fashion. In this thesis, we discuss the merits of both approaches in detail and show implementation examples for both. 
For general programming frameworks, the selection of the most suitable framework for a class of applications and target platform is a crucial step. Using automotive software development as an example, we perform an implementation study to extensively compare three different frameworks. Based on the findings from this implementation study, we identify a number of key factors to assess the suitability of general programming frameworks for applications and target platforms. One popular general programming framework is OpenMP, and the target offloading capabilities added in recent versions also make it an interesting candidate for targeting FPGAs. To enable the use of OpenMP for FPGA programming, we develop the first-ever prototype for OpenMP target offloading constructs on FPGAs via High-Level Synthesis. Furthermore, we design and implement an execution model and hardware extensions for multi-threaded execution in FPGA accelerators generated through High-Level Synthesis. By combining multi-threaded execution in the generated FPGA accelerators with OpenMP target offloading as the programming interface, we not only significantly reduce idle cycles and improve performance, but also provide an easy-to-use programming interface with intuitive mechanisms for data management. In order to showcase the implementation of a domain-specific programming framework, we develop a compiler for Sum-Product Networks, a class of machine learning models. By implementing compilation flows for CPUs, GPUs and FPGAs, we are able to cover a wide range of heterogeneous system setups and achieve improvements in inference throughput of multiple orders of magnitude compared to the existing Python-based libraries. The implementation of these toolflows, which for CPU and GPU is based on the modern MLIR framework, also illustrates the role compilers play for the future of heterogeneous computing. |
Status: | Publisher's Version | ||||
URN: | urn:nbn:de:tuda-tuprints-197720 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science | ||||
Divisions: | 20 Department of Computer Science > Embedded Systems and Applications | ||||
Date Deposited: | 29 Oct 2021 12:16 | ||||
Last Modified: | 29 Oct 2021 12:16 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/19772 | ||||
PPN: | 488104955 | ||||