Liebig, Björn (2018)
Domain-Specific High Level Synthesis of Floating-Point Computations to Resource-Shared Microarchitectures.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication
Text: v1.1 Domain-Specific High Level Synthesis for FP Computations.pdf (Published Version, 12MB)
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Domain-Specific High Level Synthesis of Floating-Point Computations to Resource-Shared Microarchitectures | ||||
Language: | English | ||||
Referees: | Koch, Prof. Dr. Andreas ; Berekovic, Prof. Dr. Mladen | ||||
Date: | 2018 | ||||
Place of Publication: | Darmstadt | ||||
Date of oral examination: | 13 March 2018 | ||||
Abstract: | Many scenarios demand high processing power, often combined with a limited energy budget. One way to increase processing power without increasing power consumption is to use hardware accelerators. While implementing such an accelerator as an application-specific integrated circuit comes with very high development costs, reconfigurable logic devices such as FPGAs can lower development costs and reduce development time, thus shortening time to market. To reduce development costs even further, the development of the circuit itself can be partially automated by applying a technique called high-level synthesis. However, current high-level synthesis approaches have difficulty handling floating-point computations, especially large blocks of floating-point code. This thesis focuses on the efficient implementation of floating-point arithmetic in FPGAs. To improve performance, new FPGA-optimized computing units are developed. This work proposes two new architectures for floating-point fused multiply-adds, and also presents and compares two low-latency dividers based on the Goldschmidt algorithm. The proposed units significantly outperform the state of the art in terms of latency. Codes from domains such as control engineering and numerical simulation often contain large loop bodies with (tens of) thousands of double-precision floating-point operations. Both academic and industrial synthesis tools have great difficulty coping with such input programs. In this thesis, the academic compiler Nymble is extended to Nymble-RS, a branch with the necessary features to handle such large blocks of floating-point code. The proposed techniques are integrated into a tool chain that translates convex solvers defined in a domain-specific language to hardware. The generated accelerators reach clock frequencies of more than 200 MHz.
They exceed the performance of hardware generated by a state-of-the-art high-level synthesis tool by more than 5.7x and offer speed-ups of up to 5.2x over software executing on the 800 MHz Cortex-A9 CPUs used in typical reconfigurable systems-on-chip. Furthermore, the developed techniques are used to accelerate bioinformatics simulations defined in the CellML language, using C code as an intermediate representation. The generated hardware exceeds the performance of current-generation desktop CPUs in most cases while requiring only 20-30% of the area on a mid-sized FPGA. Meanwhile, energy savings of up to 96% are reached. |
||||
URN: | urn:nbn:de:tuda-tuprints-73387 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science | ||||
Divisions: | 20 Department of Computer Science > Embedded Systems and Applications | ||||
Date Deposited: | 23 May 2018 11:55 | ||||
Last Modified: | 09 Jul 2020 02:04 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/7338 | ||||
PPN: | 431895708 | ||||