Optimised Grid-Partitioning for Block Structured Grids in Parallel Computing

Junglas, Daniel (2007)
Optimised Grid-Partitioning for Block Structured Grids in Parallel Computing.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Preview

PDF
DanielJunglas2007.pdf
Copyright Information: In Copyright.
Download (2MB) | Preview

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Optimised Grid-Partitioning for Block Structured Grids in Parallel Computing

Language:

English

Referees:

Martin, Prof. Dr. Alexander ; Schäfer, Prof. Dr. Michael

Advisors:

Martin, Prof. Dr. Alexander

Date:

21 June 2007

Place of Publication:

Darmstadt

Date of oral examination:

12 December 2006

Abstract:

Simulation of turbulent flows in complex geometries is nowadays usually performed by application of grid-based algorithms on parallel computers. In this approach it is not only important to have a clever discretisation and an appropriate grid. One must also use (or develop) algorithms that are convergent and numerically stable. As a final ingredient to success the different work packages of the simulation must be distributed over the set of available processors. This distribution must be performed so as to optimally exploit the computational resources provided by processors, thereby minimising simulation time. Our work will focus on this last optimisation problem and develop models as well solution algorithms for it. To this end we assume that a block structured as well as a suitable simulation algorithm are fixed and look for an optimal mapping of blocks to processors. We restrict ourselves to block structured grids for two reasons: one the one hand, this class of grids is most widely used in simulation of turbulent flows in complex geometries. On the other hand block structured grids can be partitioned into a relatively small number of blocks and the mapping problem can be restricted to the set of blocks. This last aspect allows application of integer programming methods to find optimal mappings. As opposed to standard approaches in the literature, we not only aim at balancing computational load over the processors, but also consider communication overhead induced by data dependencies between blocks mapped to different processors. The communication model we apply is an exact representation of the restrictions our hardware imposes on inter-processor communication. Seeking to minimise simulation time leads to a highly complex combinatorial optimisation problem the solution of which is the aim of our work. To this end we formulate the problem as integer program. Since it is a new problem that has – to the best of our knowledge – not been investigated in the literature we do not stick with a single optimisation model. Instead, we propose different formulations and consider tradeoffs between them. Finally, we investigate the polyhedra defined by the various integer programs in order to implement the valid and facet-defining inequalities found in Branch-and-Cut algorithms. As we cannot expect to solve large problem instances by integer programming methods in an acceptable amount of time, we also develop several local-search heuristics that produce good solutions in a reasonable amount of time. Computational results show that our models and solution algorithms – both of which are dedicated to a certain hardware model – are highly superior to generic approaches described in the literature. We were thus able to improve the usage of CPU resources during simulation of turbulent flows in complex geometries. Let us finally remark that our approach is not limited to this concrete application of finite-element or finite-volume procedures. Instead it can be applied in any situations where small or block structured grids arise and the hardware model is at least similar to the one assumed in our work.

Alternative Abstract:

Alternative Abstract

Language

Die Simulation turbulenter Strömungen in komplexen Geometrien erfolgt heute meistens durch den Einsatz von gitterbasierten Lösungsalgorithmen auf Parallelrechnern. Dabei kommt es nicht alleine darauf an, eine möglichst geschickte Diskretisierung bzw. ein möglichst geeignetes Gitter auszuwählen oder numerisch stabile und konvergente Algorithmen zu entwickeln. Insbesondere müssen die einzelnen Arbeitspakete der Simulation so auf die gegebenen Prozessoren verteilt werden, dass die Rechnerressourcen optimal ausgenutzt werden, d.h. die Simulation in minimaler Zeit abläuft. Die Arbeit konzentriert sich auf dieses letzte Optimierungsproblem und entwickelt dafür Modelle sowie Lösungsmethoden. Dazu nehmen wir an, dass für die Simulation bereits ein blockstrukturiertes Gitter gewählt wurde und suchen nach einer optimalen Verteilung der einzelnen Gitterelemente auf die vorhandenen Prozessoren. Die Beschränkung auf blockstrukturierte Gitter erfolgt, da dies erstens der in der Simulation turbulenter Strömungen in komplexen Geometrien am häufigsten eingesetzte Gittertyp ist und zweitens Gitter dieses Typs in eine überschaubare Menge von Blöcken zerlegt werden können, sodass zur Berechnung einer optimalen Verteilung des Gitters Methoden der ganzzahligen Programmierung verwendet werden können. Im Gegensatz zu den üblichen Ansätzen aus der Literatur achten wir bei Zuordnung von Arbeitspaketen an Prozessoren nicht nur auf eine gleichmäßige Verteilung der Rechenlast, sondern betrachten auch exakt den Zusatzaufwand, der sich für ein bestimmtes Hardwaremodell durch Datenaustausch zwischen verschiedenen Prozessoren während der Simulation ergibt. Diese Fragestellung führt uns auf ein hochkomplexes kombinatorisches Optimierungsproblem, dessen Lösung das Ziel unserer Arbeit ist. Zu diesem Zweck formulieren wir das Problem als ganzzahliges lineares Programm. Da es sich um ein in der Praxis unseres Wissens bisher nicht untersuchtes Problem handelt, beschränken wir uns nicht auf eine Formulierung, sondern stellen verschiedene Formulierungen vor und wägen diese gegeneinander ab. Schließlich untersuchen wir die durch die ganzzahligen Programme definierten Polyeder, um die daraus gewonnenen Erkenntniss in einem Branch-and-Cut Algorithmus zu implementieren. Da für große Probleminstanzen eine Lösung mit Methoden der ganzzahligen Programmierung nicht zu erwarten ist, entwickeln wir außerdem effiziente Heuristiken, die gute, aber nicht notwendig optimale Zuordnungen in akzeptabler Zeit liefern. Unsere Rechenergebnisse am Schluß der Arbeit zeigen, dass sowohl unsere Modelle als auch die implementierten Verfahren, die speziell auf die vorliegende Hardware abgestimmt sind, den Methoden aus der Literatur weit überlegen sind. Wir konnten so einen Beitrag zur besseren Ausnutzung von Rechnerressourcen bei der Simulation turbulenter Strömungen in komplexen Geometrien leisten. Abschließend bleibt anzumerken, dass unsere Ergebnisse nicht auf diese konkrete Anwendung von Finite-Elemente- oder Finite-Volumen-Verfahren beschränkt sind. Vielmehr lassen sich unsere Algorithmen überall dort einsetzen, wo entweder kleine oder blockstrukturierte Gitter auf einer zumindest ähnlichen Hardware verwendet werden.

German

Uncontrolled Keywords:

blockstrukturierte Gitter

Alternative keywords: