Viering, Malte Christian (2022)
Statically Safe Distributed Programming.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00022022
Ph.D. Thesis, Primary publication, Publisher's Version
Text
Statically Safe Distributed Programming v05012022.pdf Copyright Information: CC BY-NC-ND 4.0 International - Creative Commons, Attribution NonCommercial, NoDerivs. Download (2MB) |
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Statically Safe Distributed Programming | ||||
Language: | English | ||||
Referees: | Mezini, Prof. Dr. Mira ; Eugster, Prof. Dr. Patrick ; Hu, Dr. Raymond | ||||
Date: | 2022 | ||||
Place of Publication: | Darmstadt | ||||
Collation: | xvi, 252 Seiten | ||||
Date of oral examination: | 16 February 2022 | ||||
DOI: | 10.26083/tuprints-00022022 | ||||
Abstract: | The Internet and the services it provides have become an omnipresent part of our lives. Asynchronous distributed systems form the basis of these services. Resiliency in the face of partial failures is an essential requirement for many distributed systems, meaning the systems must continue to function as specified even if several components fail. Ensuring correct behavior, particularly when it comes to failures and asynchrony, makes programming such systems very challenging. Multiparty session types (MPSTs) is a typing discipline for concurrent processes that statically ensures desired properties, such as the absence of message reception errors and deadlocks. These properties can help developers implement correct asynchronous message-passing applications. However, existing MPSTs do not support the specification and verification of partial failure-handling or practical fault-tolerant protocols that handle and recover from partial failures. This fundamentally limits the applicability of MPSTs to asynchronous real-world distributed systems. In this thesis we present our article “A Typing Discipline for Statically Verified Crash Failure Handling in Distributed Systems” [VCE+ 18], which is the first MPST formulation for crash failure handling in asynchronous distributed systems. This work features a lightweight coordinator modeled after widely used systems such as Apache ZooKeeper and Chubby. For this formulation we developed a typing discipline based on MPSTs that supports the specification and static verification of multiparty protocols with failure handling. The model preserves the distributed nature of MPSTs and interacts only with the lightweight coordinator for the purpose of critical decision-making around failure handling. The type system provides subject reduction despite the possibility of failures occurring at runtime. We implemented our formulation as a prototype in Scala, using Apache ZooKeeper for coordination, and used it to implement and verify a distributed logistic regression (LR) model. In the accompanying performance evaluation, the session type distributed LR model has a performance comparable to failure agnostics distributed LR models in the absence of failures. We also present our article, “A Multiparty Session Typing Discipline for Fault-tolerant Event-driven Distributed Programming” [VHEZ21], which combines ideas from the previous model with observations from fault-tolerant middleware systems. This work is the first formulation of MPSTs for practical fault-tolerant distributed programming of asynchronous distributed systems. In this work, we give structure to communication patterns involving asynchronous communication and concurrent failures and integrate the features required to express practical fault-tolerant protocols involving dynamic replacement of failed parties and the retrying of failed protocol segments in the presence of imperfect failure detection (perfect failure detection is impossible in asynchronous distributed systems). Key to our approach is the development of the first model of event-driven concurrency for multiparty sessions to unify the session-typed handling of failures and regular I/O events. Moreover, the characteristics of our model allow us to prove a global progress property for well-typed processes engaged in multiple concurrent sessions. Global progress traditionally does not hold in MPST systems. To demonstrate its practicality, we implement our approach as a toolchain for Scala and use it to specify and implement a session-typed version of the cluster manager (CM) of the widely employed Apache Spark data analytics engine. Our session-typed CM integrates with other vanilla Spark components to give a functioning Spark runtime, i.e., it can execute existing unmodified third-party Spark applications. Measured on an industry-standard benchmark Apache Spark has an average performance overhead below 10% when using our session-typed CM instead of Spark’s default CM, in the absence of failures. The developed MPSTs typing disciplines and prototypes enable the specification and verification of practical distributed applications that handle partial failures. Thus, we enable the verification of desired properties and, in turn, help develop correct distributed applications. |
||||
Alternative Abstract: |
|
||||
Status: | Publisher's Version | ||||
URN: | urn:nbn:de:tuda-tuprints-220222 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science | ||||
Divisions: | 20 Department of Computer Science > Distributed Systems Programming | ||||
Date Deposited: | 15 Nov 2022 08:10 | ||||
Last Modified: | 16 Nov 2022 06:42 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/22022 | ||||
PPN: | 501661026 | ||||
Export: |
View Item |