Statically Safe Distributed Programming

Viering, Malte Christian (2022)
Statically Safe Distributed Programming.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00022022
Ph.D. Thesis, Primary publication, Publisher's Version

Text
Statically Safe Distributed Programming v05012022.pdf
Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs.
Download (2MB)

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Statically Safe Distributed Programming

Language:

English

Referees:

Mezini, Prof. Dr. Mira ; Eugster, Prof. Dr. Patrick ; Hu, Dr. Raymond

Date:

2022

Place of Publication:

Darmstadt

Collation:

xvi, 252 Seiten

Date of oral examination:

16 February 2022

DOI:

10.26083/tuprints-00022022

Abstract:

The Internet and the services it provides have become an omnipresent part of our lives. Asynchronous distributed systems form the basis of these services. Resiliency in the face of partial failures is an essential requirement for many distributed systems, meaning the systems must continue to function as specified even if several components fail. Ensuring correct behavior, particularly when it comes to failures and asynchrony, makes programming such systems very challenging. Multiparty session types (MPSTs) is a typing discipline for concurrent processes that statically ensures desired properties, such as the absence of message reception errors and deadlocks. These properties can help developers implement correct asynchronous message-passing applications. However, existing MPSTs do not support the specification and verification of partial failure-handling or practical fault-tolerant protocols that handle and recover from partial failures. This fundamentally limits the applicability of MPSTs to asynchronous real-world distributed systems.

In this thesis we present our article “A Typing Discipline for Statically Verified Crash Failure Handling in Distributed Systems” [VCE+ 18], which is the first MPST formulation for crash failure handling in asynchronous distributed systems. This work features a lightweight coordinator modeled after widely used systems such as Apache ZooKeeper and Chubby. For this formulation we developed a typing discipline based on MPSTs that supports the specification and static verification of multiparty protocols with failure handling. The model preserves the distributed nature of MPSTs and interacts only with the lightweight coordinator for the purpose of critical decision-making around failure handling. The type system provides subject reduction despite the possibility of failures occurring at runtime. We implemented our formulation as a prototype in Scala, using Apache ZooKeeper for coordination, and used it to implement and verify a distributed logistic regression (LR) model. In the accompanying performance evaluation, the session type distributed LR model has a performance comparable to failure agnostics distributed LR models in the absence of failures.

We also present our article, “A Multiparty Session Typing Discipline for Fault-tolerant Event-driven Distributed Programming” [VHEZ21], which combines ideas from the previous model with observations from fault-tolerant middleware systems. This work is the first formulation of MPSTs for practical fault-tolerant distributed programming of asynchronous distributed systems. In this work, we give structure to communication patterns involving asynchronous communication and concurrent failures and integrate the features required to express practical fault-tolerant protocols involving dynamic replacement of failed parties and the retrying of failed protocol segments in the presence of imperfect failure detection (perfect failure detection is impossible in asynchronous distributed systems). Key to our approach is the development of the first model of event-driven concurrency for multiparty sessions to unify the session-typed handling of failures and regular I/O events. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions. Global progress traditionally does not hold in MPST systems. To demonstrate its practicality, we implement our approach as a toolchain for Scala and use it to specify and implement a session-typed version of the cluster manager (CM) of the widely employed Apache Spark data analytics engine. Our session-typed CM integrates with other vanilla Spark components to give a functioning Spark runtime, i.e., it can execute existing unmodified third-party Spark applications. Measured on an industry-standard benchmark Apache Spark has an average performance overhead below 10% when using our session-typed CM instead of Spark’s default CM, in the absence of failures.

The developed MPSTs typing disciplines and prototypes enable the specification and verification of practical distributed applications that handle partial failures. Thus, we enable the verification of desired properties and, in turn, help develop correct distributed applications.

Alternative Abstract:

Alternative Abstract

Language

Das Internet und die Dienste, die es bereitstellt, sind ein omnipräsenter Teil unseres Lebens geworden, und asynchrone verteilte Systeme bilden die Basis dieser Dienste. Eine wichtige Eigenschaft von verteilten Systemen ist, dass sie robust mit partiellen Fehlern umgehen. Diese Eigenschaft ermöglicht den verteilten Systemen, trotz Fehlern in mehreren Komponenten, weiterhin den Anforderungen entsprechende Fortschritte zu machen. Das Programmieren von verteilten Systemen wird durch die Notwendigkeit mit partiellen Fehlern umzugehen, sehr anspruchsvoll und fehleranfällig. Insbesondere das Sicherstellen der Anforderungen, unter Berücksichtigung von Asynchronität und möglichen Fehlern, ist eine Herausforderung. Multiparty Session Types (MPSTs) ist eine Klasse von Typsystemen für parallele Prozesse, die wünschenswerte Eigenschaften, wie beispielsweise die Abwesenheit von Empfangsfehlern und Deadlocks, statisch sicherstellt. Daher können MPSTs das Programmieren von verteilten Anwendungen vereinfachen. Bisher existierende MPSTs-Ansätze können nicht auf eine signifikante Klasse von verteilten Systemen angewendet werden, da diese die Spezifikation und Verifikation von Crash-Stopp-Fehlerbehandlung nicht ermöglichen. Weiterhin bieten existierende MPSTs-Ansätze nicht die nötigen Abstraktionen zur Spezifikation und Verifikation von praxistauglichen Protokollen, die partielle Fehler behandeln und tolerieren.

Diese Ausarbeitung stellt unseren Artikel “A Typing Discipline for Statically Verified Crash Failure Handling in Distributed Systems” [VCE+ 18] vor, der die erste MPSTs Formulierung für Crash-Stopp-Fehlerbehandlung in asynchronen verteilten Systemen ist. Diese Arbeit verwendet ein leichtgewichtiges Koordinatormodell, inspiriert durch die weit verbreiteten Systeme wie Apache ZooKeeper und Cubby. Für dieses Modell haben wir ein MPSTs-Typsystem entwickelt, dass das Spezifizieren und Verifizieren von Protokollen mit expliziter Fehlerbehandlung von partiellen Fehlern ermöglicht. Der Koordinator ist nur in der kritischen Behandlung von partiellen Fehlern involviert, daher bewahrt die Arbeit die verteilte Struktur von MPSTs. Wir zeigen, dass das Typsystem Subject-Reduction erfüllt, trotz des möglichen Auftretens von Fehlern während der Laufzeit. Wir haben einen Prototypen in Scala entwickelt, der Apache Zookeeper als Koordinator verwendet und demonstrieren damit die praktische Einsetzbarkeit unserer MPSTs Formulierung zur Entwicklung und Verifikation von verteilten Anwendungen, die robust mit partiellen Fehlern umgehen können. In der durchgeführten Evaluation hat der entwickelte Prototyp außerhalb der Fehlerbehandlung keinen relevanten Performance Overhead.

Weiterhin stellen wir unseren Artikel “A Multiparty Session Typing Discipline for Fault-tolerant Event-driven Distributed Programming” [VHEZ21] vor, dies ist die erste Formulierung von MPSTs für die praxistaugliche Programmierung von fehlertoleranten verteilten Anwendungen in asynchronen verteilten Systemen. Wir beantworten eine offene Frage im Bereich von Session Types; wie können wir Kommunikationsmustern, die Asynchronität und parallele Fehler beinhalten eine Struktur geben und gleichzeitig die nötigen Funktionen bereitstellen, die für praxistaugliche fehlertolerante Protokolle nötig sind. Diese Protokolle benötigen Funktionen, wie das dynamische Ersetzten von fehlerhaften Teilnehmern, das Wiederholen von Protokollsegmenten, die wegen Fehlern gescheitert sind und das Tolerieren von nicht perfekten Fehlerdetektoren (da eine perfekte Fehlererkennung in asynchronen System nicht möglich ist). Kern unseres Ansatzes ist, dass wir das erste Modell für eine eventbasierte Nebenläufigkeit für MPSTs entwickelt haben. Dieses Modell behandelt normale IO-Nachrichten und Fehlerbenachrichtigungen uniform. Dies ermöglicht es uns, nicht nur eine Vielzahl an Funktionen zu realisieren, sondern ist auch Grundlage für unser Global Progress Resultat, was traditionell nicht in MPSTs gilt. Um die praktische Einsetzbarkeit unseres Ansatzes aufzuzeigen, haben wir eine Toolchain in Scala entwickelt. In dieser haben wir einen Session typisierten Cluster Manager (CM) für Apache Spark entwickelt. Unsere Session typisierter CM interagiert mit Apache Spark Komponenten und ergibt eine funktionierende Apache Spark Laufzeitumgebung. Der CM kann existierende und unmodifizierte Spark Anwendungen, auch von Dritten, ausführen. In einem Standardbenchmark hat Apache Spark, wenn es den Session typisierten CM anstelle des Standard CM verwendet, einen durchschnittlich Overhead von unter 10%.

Die entwickelten neuartigen MPSTs basierten Typsysteme und Frameworks ermöglichen die Spezifikation und Verifikation von praxistauglichen verteilten Anwendungen, die partielle Fehler behandeln. Wir ermöglichen damit eine leichtgewichtige Verifikation, stellen wünschenswerte Eigenschaften sicher und unterstützen damit die Entwicklung von robusten verteilten Anwendungen.

German

Status:

Publisher's Version

URN:

urn:nbn:de:tuda-tuprints-220222

Classification DDC:

000 Generalities, computers, information > 004 Computer science

Divisions:

20 Department of Computer Science > Distributed Systems Programming

Date Deposited:

15 Nov 2022 08:10

Last Modified: