El-Hindi, Muhammad (2023)
Towards Efficient Trustworthy Data Systems.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00024480
Ph.D. Thesis, Primary publication, Publisher's Version
Text: dissertation_el-hindi_2023.pdf (6MB). Copyright Information: In Copyright.
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Towards Efficient Trustworthy Data Systems | ||||
Language: | English | ||||
Referees: | Binnig, Prof. Dr. Carsten ; El Abbadi, Prof. Amr | ||||
Date: | 7 September 2023 | ||||
Place of Publication: | Darmstadt | ||||
Collation: | xx, 222 pages | ||||
Date of oral examination: | 24 August 2023 | ||||
DOI: | 10.26083/tuprints-00024480 | ||||
Abstract: | Modern trends like digitization and data ecosystems, accelerated by recent events such as COVID-19, necessitate a shift from isolated data management in silos to more open models in which organizations process and share data across organizational boundaries. This transition, however, spawns interdependencies among organizations and generates unique challenges for data management, including data integrity, auditability, and regulatory compliance. Addressing these novel requirements presents significant challenges for traditional data systems such as database management systems. These systems were designed under the assumption that a single organization owns and manages the data, without considering shared data access by multiple parties. Hence, this dissertation explores the concept and development of trustworthy data systems designed to address the unique demands of managing data across multiple organizations.
Creating efficient, trustworthy data systems, however, poses several challenges, which are examined along three main dimensions: data storage, processing, and benchmarking. In this thesis, we provide an overarching analysis of the requirements of data systems in these areas and dive deep into fundamental building blocks from a performance-centric perspective.
The concept of trustworthy data storage is investigated within our novel system, BlockchainDB. It addresses the requirements of data integrity and auditability by leveraging blockchains as a storage backend. However, to mitigate the performance limitations of blockchains and facilitate user-friendly data interaction, we introduce an additional database layer that utilizes techniques like sharding. To advance the efficiency of trustworthy data storage, we also inspect the performance limitations of Merkle Trees, a key data structure for integrity in many systems such as blockchains. We find that Merkle Trees suffer from significant performance limitations when data is frequently updated and propose techniques to improve both throughput and scalability.
Addressing the novel requirements of trustworthy data processing, this work presents the system TrustDBle. By integrating blockchains and secure hardware such as Intel’s Software Guard Extensions (SGX), this system efficiently ensures policy adherence and computational integrity. Recognizing the constraints SGX faces with large data volumes, we propose incorporating only critical components within an enclave, thereby balancing efficiency and integrity. Additionally, the dissertation explores the capabilities and limitations of Intel’s second-generation SGX technology (SGXv2) in supporting data-intensive applications. We find that SGXv2 improves upon its predecessor and can handle larger data volumes more efficiently, but new issues like remote NUMA access need attention. This research also extends beyond traditional database workloads and explores trustworthy data processing within a federated learning context, proposing a decentralized parameter server architecture that provides robust privacy protection.
In the context of benchmarking trustworthy data systems, this dissertation provides a two-fold contribution. First, in analogy to the ACID properties, particularly isolation levels, it advocates for declarative expressions of system properties like verifiability, enhancing user understanding and implementation flexibility. Second, it introduces a holistic benchmark design for these systems, incorporating traditional performance metrics and novel elements like verifiability and auditability.
The research encapsulated in this dissertation paves the way for efficient, trustworthy data management across organizations. The presented insights and techniques create promising opportunities for adoption by the broader data management community and other systems for cross-organizational collaboration. |
||||
Status: | Publisher's Version | ||||
URN: | urn:nbn:de:tuda-tuprints-244808 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science | ||||
Divisions: | 20 Department of Computer Science > Data and AI Systems | ||||
Date Deposited: | 07 Sep 2023 11:26 | ||||
Last Modified: | 28 Sep 2023 10:23 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/24480 | ||||
PPN: | 51192338 | ||||