Large-Scale Content-Based Publish-Subscribe Systems

Mühl, Gero (2002)
Large-Scale Content-Based Publish-Subscribe Systems.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Copyright Information: In Copyright.

Item Type: Ph.D. Thesis
Type of entry: Primary publication
Title: Large-Scale Content-Based Publish-Subscribe Systems
Language: English
Referees: Bacon, Ph.D. Jean
Advisors: Buchmann, Prof. Ph.D. Alejandro P.
Date: 22 November 2002
Place of Publication: Darmstadt
Date of oral examination: 30 September 2002
Abstract:

Today, the architecture of distributed computer systems is dominated by client/server platforms relying on synchronous request/reply. This architecture is not well suited to implementing information-driven applications such as news delivery, stock quoting, air traffic control, and the dissemination of auction bids, because of the inherent mismatch between the demands of these applications and the characteristics of those platforms. In contrast, publish/subscribe directly reflects the intrinsic behavior of information-driven applications, because communication is indirect and initiated by the producers of information: producers publish notifications, and these are delivered to subscribed consumers with the help of a notification service that decouples producers and consumers. Therefore, publish/subscribe should be the first choice for implementing such applications. The expressiveness of the notification selection mechanism that consumers use to describe the notifications they are interested in is crucial for the flexibility of a notification service. Content-based notification selection is the most expressive, because it allows filter predicates to be evaluated over the whole content of a notification. This advantage in expressiveness over channel- or subject-based selection yields greater flexibility, which facilitates extensibility and change. On the other hand, scalable implementations of content-based notification services are difficult to realize. Indeed, the expressiveness of notification selection must be chosen carefully in large-scale systems, because expressiveness and scalability are interdependent. Hence, the most fundamental problem in the area of content-based publish/subscribe systems is probably the scalable routing of notifications from their producers to their respective consumers. Unfortunately, existing content-based notification services are not mature enough to be used in large-scale, widely distributed environments.
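
To make the contrast with channel- or subject-based selection concrete, the following minimal Python sketch treats a notification as a set of name/value pairs and a content-based filter as a conjunction of attribute constraints. It is an illustration under these assumptions, not Rebeca's API; the attribute names, the `matches` function, and the operator table are hypothetical. A subject-based scheme could test only one designated field, whereas here any attribute of the notification can be constrained.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical data model: a notification is a flat mapping of attribute names to values.
Notification = Dict[str, Any]
# A constraint is (attribute name, operator symbol, operand); a filter is a conjunction of them.
Constraint = Tuple[str, str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(constraints: List[Constraint], n: Notification) -> bool:
    """Content-based matching: every constraint must hold on the notification's attributes.

    A notification lacking a constrained attribute does not match.
    """
    return all(name in n and OPS[op](n[name], value) for name, op, value in constraints)
```

For example, with `n = {"type": "StockQuote", "symbol": "ACME", "price": 42.0}`, the filter `[("symbol", "=", "ACME"), ("price", "<", 50)]` matches `n`, while `[("symbol", "=", "ACME"), ("price", ">", 50)]` does not. A subject-based subscription to "ACME" could not express the price condition at all.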
Most existing notification services are either centralized, use flooding, or use simple routing algorithms that assume each event broker has global knowledge of all active subscriptions. All of these approaches exhibit severe scalability problems in large-scale systems. In contrast, this thesis concentrates on mechanisms that improve the scalability of content-based routing algorithms, and presents more advanced routing algorithms that do not rely on global knowledge. The algorithms presented here exploit similarities between subscriptions by using identity and covering tests and by merging filters. While identity-based routing is a simplified version of covering-based routing, merging-based routing is more advanced because it exploits the concept of filter merging. Furthermore, the idea of imperfect routing algorithms is introduced. The thesis consists of a theoretical and a practical part. The theoretical part presents a formal specification of publish/subscribe systems, a routing framework, and a set of routing algorithms, and discusses how the routing optimizations can be broken down to the actual data/filter model. The practical part presents the implementation of the Rebeca notification service, which supports advertisements and all of the routing algorithms mentioned above. A detailed practical evaluation of the implemented algorithms, based on the prototype, is also presented.
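
The covering idea can be illustrated with a minimal Python sketch. It assumes a deliberately restricted filter model that is not necessarily the one used in Rebeca: a filter is a conjunction of closed numeric intervals keyed by attribute name, and a notification matches only if it carries every constrained attribute. A filter F1 covers F2 if every notification matching F2 also matches F1. Under this model, and assuming all intervals are non-empty, that holds exactly when F1 constrains only attributes that F2 also constrains, each with an interval at least as wide. The hypothetical `Link` class records the filters a broker has already forwarded to one neighbor and forwards a new subscription only if none of them covers it. Identity-based routing is the special case that suppresses only identical filters. Unsubscription handling and filter merging are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical filter model: attribute name -> closed interval [lo, hi].
Interval = Tuple[float, float]
Filter = Dict[str, Interval]
Notification = Dict[str, float]


def matches(f: Filter, n: Notification) -> bool:
    """True if n carries every constrained attribute and each value lies in its interval."""
    return all(a in n and lo <= n[a] <= hi for a, (lo, hi) in f.items())


def covers(f1: Filter, f2: Filter) -> bool:
    """True if every notification matching f2 also matches f1 (assuming non-empty intervals)."""
    return all(
        a in f2 and f1[a][0] <= f2[a][0] and f2[a][1] <= f1[a][1]
        for a in f1
    )


class Link:
    """Hypothetical per-neighbor routing state for covering-based subscription forwarding."""

    def __init__(self) -> None:
        self.forwarded: List[Filter] = []  # filters already sent to this neighbor

    def subscribe(self, f: Filter) -> bool:
        """Return True if f must be forwarded to the neighbor, False if it is already covered."""
        if any(covers(g, f) for g in self.forwarded):
            return False  # an existing forwarded filter already attracts all notifications f needs
        # f supersedes forwarded filters it covers; a real broker would unsubscribe them upstream
        # and would have to re-forward them if f were later cancelled (omitted here).
        self.forwarded = [g for g in self.forwarded if not covers(f, g)] + [f]
        return True
```

Usage: on a fresh `Link`, subscribing `{"price": (0, 100)}` is forwarded, a later `{"price": (10, 50)}` is suppressed because it is covered, and `{"price": (0, 200)}` is forwarded and replaces the first filter in the table. The point of the design is that a broker's routing state and subscription traffic grow with the number of non-covered filters rather than with the total number of subscriptions.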

URN: urn:nbn:de:tuda-tuprints-2746
Classification DDC: 000 Generalities, computers, information > 004 Computer science
Divisions: 20 Department of Computer Science
Date Deposited: 17 Oct 2008 09:21
Last Modified: 07 Dec 2012 11:48
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/274