Efficient Verification of Fault-Tolerant Message-Passing Protocols

Bokor, Péter (2011)
Efficient Verification of Fault-Tolerant Message-Passing Protocols.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Preview

PDF
thesis.pdf
Copyright Information: CC BY-NC-ND 2.5 Generic - Creative Commons, Attribution, NonCommercial, NoDerivs .
Download (1MB) | Preview

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Efficient Verification of Fault-Tolerant Message-Passing Protocols

Language:

English

Referees:

Suri, Prof. Neeraj ; Ölveczky, Prof. Csaba

Date:

1 December 2011

Place of Publication:

Darmstadt

Date of oral examination:

29 November 2011

Abstract:

This thesis deals with efficient formal verification of fault-tolerant distributed protocols. The main focus is on protocols that achieve fault-tolerance using replication in distributed systems. In addition, communication in the distributed system is abstracted using message-passing. However, most of the concepts and solutions discussed in the thesis apply beyond replication-based message-passing protocols.

The outcome of verification is two-fold: Either the verification proves that the system (protocol) satisfies its specification or it returns (or claims the existence) of a counterexample that witnesses that the system (protocol) violates the specification. Both the development and application of fault-tolerant message-passing protocols can benefit from verification. Firstly, these protocols can be complex conceptual designs and hard to implement due to (i) a rich variety of (even malicious) faults that should be tolerated by the protocol and (ii) the concurrency in the distributed system executing the protocol. Therefore, counterexamples returned by verification in the phase of development can help developers to get the conceptual protocol and its implementation right. Secondly, fault-tolerant messages-passing protocols specify strong guarantees such as atomic broadcast or diagnosis of malicious faults. Therefore, the verification of such protocols can avoid failures of highly available systems that build upon these protocols.

The complexity of verification strongly depends on the size and nature of the system. Therefore, verification should be efficient in terms of space, time, and human interaction (ranging from full automation to requiring intuitive human guidance). Due to (i) and (ii), the verification of fault-tolerant message-passing protocols faces with a large problem space. Hence, straightforward verification approaches are inefficient. The thesis enables efficient verification of fault-tolerant message-passing systems in several ways.

The input of verification is a model of the system. The model represents the system and can be of varying resolution, e.g., source code, binaries, or high-level executable pseudocode. The efficiency of verification can significantly differ for different models of the same system. In the first part of the thesis, models of fault-tolerant message-passing protocols are proposed to enable efficient verification.

The proposed models address different aspects of fault-tolerant message-passing protocols. These aspects include (A1) message traffic -- an equivalence is shown between different models of sending, delivering, and consuming messages, which allows to select the model that is most amenable to efficient verification; (A2) fault-model -- a surprisingly simple and sound model of the widely-applied crash fault assumption is proposed allowing efficient verification of systems with crash faults; (A3) symmetries -- an approach for finding symmetries in the model is introduced, which can be exploited by symmetry reduction; and (A4) partial orders -- an equivalent translation of models is proposed to improve on the efficiency of partial-order reduction. Symmetry and partial-order reductions are general tools for efficient verification whose efficiency strongly depends on the model in use.

The second part of the thesis improves on existing verification techniques to enhance their efficiency.

The first technique is partial-order reduction, whose performance is limited by the poor flexibility and usability, and high time overhead of existing implementations. A new implementation of partial-order reduction is presented, called LPOR, that improves on all these features at the same time. To demonstrate the features of LPOR, it is applied to general message-passing systems. Experiments with representative fault-tolerant message-passing protocols justify LPOR's improvements. In addition, LPOR is implemented as an open-source Java library. This implementation is applied in MP-Basset, a model checker for general message-passing systems developed in the context of this thesis.

The second verification technique is automated induction. The efficiency of induction is limited by its general incompleteness, i.e., induction might return spurious (wrong) counterexamples for systems that satisfy the specification. A solution against spurious counterexamples is using lemmas. However, the discovery of lemmas is, in general, hard to automate. The thesis proposes a classification of lemmas in fault-tolerant message-passing protocols to enable their automated discovery. An example protocol is verified using classified lemmas and machine-checked induction proofs. Finally, a new approach, called strengthened transitions, is proposed to combat spurious counterexamples. An application of strengthened transitions for general multi-process systems is also presented.

Alternative Abstract:

Alternative Abstract

Language

Diese Dissertation behandelt effiziente, formale Verifikation von fehlertoleranten, nachrichtenbasierten Protokollen. Der Schwerpunkt liegt dabei auf Protokollen, die Fehlertoleranz mittels Replikation in verteilten Systemen erreichen. Die Kommunikation in verteilten Systemen wird als Versenden von Nachrichten abstrahiert. Jedoch gelten die meisten, in dieser Dissertation vorgestellten Konzepte und Techniken auch über replizierte, nachrichtenbasierte Systeme hinaus.

Verifikation gibt zwei Arten von Ergebnissen aus: Verifikation beweist entweder, dass das System (Protokoll) seine Spezifikation erfüllt oder zeigt ein Gegenbeispiel (bzw.~behauptet dessen Existenz) als Nachweis, dass die Spezifikation verletzt wird. Verifikation findet sowohl in der Entwicklung als auch in der Anwendung fehlertoleranter, nachrichtenbasierter Protokolle Verwendung. Erstens sind diese Protokolle aufgrund (i) der großen Bandbreite von (möglicherweise bösartigen) Fehlern, die vom Protokoll toleriert werden müssen, und (ii) der Nebenläufigkeit im verteilten System, das zur Ausführung des Protokolls verwendet wird, häufig komplex. Daher können die von der Verifikation gelieferten Gegenbeispiele dabei helfen, Protokolle korrekt zu entwerfen und zu implementieren. Zweitens, werden in fehlertoleranten, nachrichtenbasierten Protokollen starke Eigenschaften wie Atomic-Broadcast oder die Diagnose von bösartigen Fehlern spezifiziert. Somit kann Verifikation Ausfälle von hochverfügbaren Systemen vermeiden, die auf solchen Protokollen basieren.

Die Komplexität von Verifikation hängt stark von der Größe und Art des Systems ab. Deshalb ist die Effizienz von Verifikation hinsichtlich Speicherbedarf, Zeit und menschlicher Interaktion (von Vollautomatisierung bis hin zu intuitivem Eingreifen eines Menschen) besonders wichtig. Die oben genannten Gründe (i) und (ii) implizieren einen sehr großen Problemraum für die Verifikation fehlertoleranter, nachrichtenbasierte Protokolle. Einfache, naheliegende Verifikations-Ansätze sind somit häufig ineffizient. Diese Dissertation ermöglicht effiziente Verifikation fehlertoleranter, nachrichtenbasierter Protokolle auf verschiedenen Wegen.

Verifikation benötigt als Eingabe ein Modell des Systems. Das Modell repräsentiert das System mit unterschiedlichem Detaillierungsgrad, wie zum Beispiel Quellcode, Maschinencode oder ausführbarer Pseudocode. Die Effizienz der Verifikation kann signifikant vom verwendeten Modell des Systems abhängen. Im ersten Teil der Dissertation werden Modelle von fehlertoleranten, nachrichtenbasierten Protokollen vorgeschlagen, die eine effiziente Verifikation ermöglichen.

Die vorgestellten Modelle behandeln verschiedene Aspekte von fehlertoleranten, nachrichtenbasierten Protokollen. Diese Aspekte umfassen: (A1) Nachrichtenverkehr - Es wird die Equivalenz von verschiedenen Modellen von Senden, Empfang und Verarbeitung von Nachrichten gezeigt, sodass jeweils das Modell, das für eine effiziente Verifikation am besten geignet ist, verwendet werden kann; (A2) Fehler-Modell - Es wird ein überraschend einfaches und korrektes Modell der häufig verwendeten Crash-Fehler-Annahme eingeführt, das eine effiziente Verifikation von Systemen mit solchen Fehlern erlaubt; (A3) Symmetrien - Es wird ein Verfahren vorgestellt um Symmetrien im Modell zu finden, die von der Symmetry-Reduction-Technik genutzt werden; (A4) Partielle Ordnung - Es wird eine equivalente Transformation von Modellen gezeigt, die die Effizienz von Partial-Order-Reduktion verbessert. Symmetrie- und Partial-Order-Reduktion sind allgemeine Verfahren für effizientere Verifikation, deren Effizienz stark vom gewählten Modell abhängt.

Der zweite Teil der Dissertation verbessert bestehende Verifikationstechniken hinsichtlich ihrer Effizienz.

Die erste Technik ist Partial-Order-Reduktion, deren Effizienz durch mangelnde Flexibilität, Bedienbarkeit und die hohe Zeitkomplexität der bestehenden Implementierungen beschränkt wird. Eine neue Implementierung von Partial-Order-Reduktion, genannt LPOR, wird präsentiert, die alle diese Eigenschaften gleichzeitig verbessert. Um die Eignung dieses Ansatzes zu demonstrieren, wird LPOR auf nachrichtenbasierte Systeme angewendet. Experimente mit repräsentativen, fehlertoleranten, nachrichtenbasierten Protokollen validieren die durch LPOR erzielten Verbesserungen. LPOR ist in Java implementiert und mit Quelltext frei verfügbar. Diese Implementierung ist in MP-Basset integriert, ein Model-Checker für nachrichtenbasierte Systeme, der im Rahmen dieser Dissertation entwickelt wurde.

Die zweite Technik ist automatisierte Induktion. Die Effizienz von Induktion ist dadurch beschränkt, dass sie unvollständig ist, das heißt, Induktion kann für Systeme, die ihre Spezifikation erfüllen, falsche Gegenbeispiele ausgeben. Eine Lösung gegen falsche Gegenbeispiele sind Lemmata. Das Auffinden von Lemmata ist allerdings nur schwer automatisierbar. Diese Dissertation schlägt eine Klassifizierung von Lemmata in fehlertoleranten, nachrichtenbasierten Protokollen vor, um automatisches Finden von Lemmata zu ermöglichen. Ein Beispielprotokoll wird durch klassifizierte Lemmata und automatisierte Induktion verifiziert. Zuletzt wird ein neuer Ansatz, genannt gestärkte Transitionen, vorgestellt, um falsche Gegenbeispiele zu vermeiden. Eine Anwendung von gestärkten Transitionen auf allgemeine Multi-Prozess-Systeme wird ebenfalls erläutert.

German

URN:

urn:nbn:de:tuda-tuprints-28185

Classification DDC:

000 Generalities, computers, information > 004 Computer science

Divisions:

20 Department of Computer Science > Formal Methods in System Engineering

Date Deposited:

09 Dec 2011 10:57

Last Modified:

09 Jul 2020 00:00

URI: