InterestCast: Adaptive Event Dissemination for Interactive Real-Time Applications

Dissertation approved for the attainment of the academic degree of Doktor-Ingenieur (Dr.-Ing.) by Max Lehn, M.Sc., from Groß-Gerau. Date of submission: 16.12.2015, date of examination: 26.02.2016. Darmstadt 2016 — D 17. First referee: Prof. Alejandro Buchmann, Ph.D. Second referee: Prof. Klara Nahrstedt, Ph.D. Fachbereich Informatik, Datenbanken und Verteilte Systeme, TU Darmstadt.

Please cite this document as: URN: urn:nbn:de:tuda-tuprints-53982, URL: http://tuprints.ulb.tu-darmstadt.de/id/eprint/5398. This document is provided by tuprints, the e-publishing service of TU Darmstadt (http://tuprints.ulb.tu-darmstadt.de, tuprints@ulb.tu-darmstadt.de). This publication is available under the following Creative Commons license: Attribution – NonCommercial – NoDerivatives 3.0 Germany, http://creativecommons.org/licenses/by-nc-nd/3.0/de/

Acknowledgments

This work would not have been possible without the many discussions, the help, and the motivating support of several people. First and foremost, I would like to thank my advisor, Prof. Alejandro Buchmann, for his unconditional support over all the years and for the freedom he gave me to work on a variety of interesting topics. I am also grateful to Prof. Klara Nahrstedt, who not only hosted me and gave advice during the time I worked on the initial concepts for this thesis, but also accepted to become my second referee. Further thanks go to the numerous colleagues who accompanied me on my academic career, many of whom became close friends. First, there is the whole DVS group.
Particular thanks go to Christof Leng and Wesley Terpstra, who acted as my academic mentors in the early days of my research; to the other two peer-to-peer office mates, Robert Rehner and Alex Frömmgen; but also to Stefan Appel, Sebastian Frischbier, Tobias Freudenreich, and Daniel Bausch, who were regularly available for chats and discussions during lunch or coffee. The research projects QuaP2P and MAKI gave me the opportunity to collaborate with many more researchers. Particular thanks go to Christian Groß, who was not only a project colleague, but also a longtime friend and companion since the beginning of our studies. The QuaP2P project work also laid the foundations for this thesis; I would like to thank Tonio Triebel and Prof. Wolfgang Effelsberg for their initiating contributions to the Planet PI4 prototype. Of those who helped me with discussions and feedback, I would further like to highlight Boris Koldehofe, Sabrina Müller, and Kai Habermehl. Thanks also go to the several undergraduate students who assisted in my research, most notably Marcel Blöcher; they were a great help, and not only with implementation tasks. The two DFG-funded projects I worked on would not exist without Prof. Ralf Steinmetz as well as all other involved PIs. Last but not least, I am grateful to my family, Irmi, Thomas, and Robert, who always lovingly supported me over the years, and to my grandma Elli, who had to ask after my progress for such a long time. Finally, loving thanks to Katrin, who greatly supported me with a lot of affection and patience, especially in the final stages of my thesis.

Abstract

Many networked applications use push-based many-to-many communication. Especially real-time applications, such as online games and other virtual reality applications, need to synchronize state between many participants under strict latency requirements. Those applications typically exchange frequent state updates and therefore require an appropriate dissemination mechanism.
Centralized and server-based solutions do not minimize latency, since they always need an extra round-trip to the server. In addition, a server infrastructure constitutes a potential performance bottleneck and thus a scalability limitation. Direct communication between event source and destination is often latency-minimal but quickly exceeds the capacities of poorly connected participants in particular, because each one needs to communicate individually with many others. Our proposed solution, InterestCast, provides a decentralized event dissemination mechanism that uses peer-to-peer event forwarding, allowing powerful participants to help weak participants with the event multiplication and dissemination. To obtain forwarding configurations that best fit the current situation and application needs, InterestCast adapts them dynamically and continuously during runtime. The application's needs are passed as utility functions, which determine the utility of events with a given latency for a given interest level. Interest levels serve as an abstraction for the importance of events from a specific source, allowing a more fine-grained prioritization than an all-or-nothing subscription model. This is particularly useful if the importance of updates depends on virtual-world distance or another application-specific metric. InterestCast runs an incremental local optimization algorithm that repeatedly evaluates all possible rerouting operations from the point of view of the respective local node. In each iteration, the best operation is chosen based on the application's utility functions and a system model that predicts the effects of a given operation. As this optimization process is run on each node independently, it scales well with the number of participants. The prediction only uses local knowledge as well as information from the local neighborhood up to two hops away, which is provided by a neighborhood information exchange protocol.
Our evaluation shows that the results of InterestCast's distributed optimization are close to the global optima computed by an integer-program solver. Computing the optimum for a given situation globally at runtime, however, is infeasible due to its computational complexity, even with a highly simplified network model. In detailed network simulations, we further demonstrate the superiority of InterestCast over a purely direct event dissemination in online gaming scenarios. In comparison with the direct dissemination, InterestCast significantly reduces the traffic of weak nodes and almost quadruples the possible number of participants for the same average delivery latency of high-interest events.

Zusammenfassung

Many networked applications use push-based communication among many users. Real-time applications in particular, such as online games and other virtual-reality applications, require the synchronization of application state among many participants under tight latency requirements. Such applications typically exchange state updates regularly and need a dissemination mechanism suited to that purpose. Centralized, server-based solutions do not minimize latency insofar as they always require a separate round-trip to the server. In addition, the server infrastructure constitutes a potential performance bottleneck and thereby limits scalability. Direct communication between source and destination is usually latency-minimal, but quickly exceeds the network capacity, especially for poorly connected participants, since each of them has to communicate individually with many other participants.
The solution presented in this thesis, InterestCast, provides a decentralized event dissemination mechanism that uses peer-to-peer forwarding of events and thus allows strong participants to help weak ones with the multiplication and dissemination of event notifications. To obtain the forwarding configuration that best suits the application's respective requirements in the current situation, InterestCast adapts it continuously at runtime. The application's requirements are specified as utility functions that rate the utility of events delivered with a given delay, depending on the interest in the respective event. Interest levels serve as an abstraction for the importance of events from a specific source and allow a prioritization of the delivery of event notifications. This is particularly useful if the importance of events depends, e.g., on the participants' distance in the virtual environment or on other application-specific metrics. InterestCast runs an incremental local optimization algorithm that repeatedly evaluates all possible forwarding changes from the point of view of the respective local node. In each iteration, based on the utility functions and a system model, the operation is selected that is expected to yield the greatest improvement of the overall state. Since this optimization process is executed independently on each node, it scales well with the number of participants. The prediction uses only local knowledge as well as information from the neighborhood up to two hops away, which is obtained via a corresponding exchange protocol. Our evaluations show that the results of InterestCast's distributed optimization are close to the global optima, which we determine, based on the models used, with the help of an integer-program solver.
Solving for the global optimum at runtime, however, is impractical, since the complexity of the problem leads to extremely long computation times even with a highly simplified network model. In detailed network simulations, we further demonstrate the superiority of InterestCast over a purely direct dissemination of events in online gaming scenarios. Compared with direct dissemination, InterestCast considerably reduces the amount of data required at weak nodes and almost quadruples the possible number of participants while keeping the same average delivery time of high-interest events.

Contents

1. Introduction
   1.1. Challenges
   1.2. Problem Statement
   1.3. Approach
   1.4. Methodology and Thesis Outline
2. Background: Applications and Use Cases
   2.1. Online Gaming
   2.2. Mobile Augmented Reality Gaming
   2.3. Robotics and Vehicular Networks
   2.4. Air Traffic Control
   2.5. Requirements and Challenges
3. State of the Art
   3.1. Interfacing Event Dissemination
   3.2. Multicast
      3.2.1. IP Multicast
      3.2.2. Application-Layer Multicast
   3.3. Delay Optimization in Multicast
   3.4. Publish/Subscribe
   3.5. Adaptive Overlay Topologies
   3.6. Interest Management and Application-tailored Multicast
4. Approach and Design Decisions
   4.1. Decomposition and Interfacing
      4.1.1. Decomposing Interest Management and Event Dissemination
      4.1.2. Interfacing
   4.2. Adaptability
      4.2.1. Network Conditions
      4.2.2. Application Requirements
      4.2.3. Utility vs. Cost
      4.2.4. Composition of the Target Function
      4.2.5. Specifying and Updating Utility Functions
   4.3. Topology and Routing
      4.3.1. Topology Construction
   4.4. Optimization Strategy
      4.4.1. System Performance Model
      4.4.2. Optimization Objective
5. System Model and Problem Formalization
   5.1. Basic System Model
      5.1.1. Event Delivery Paths
   5.2. Event Classes
      5.2.1. Update Events
      5.2.2. Instant Events
   5.3. Application Utility Functions
   5.4. Performance Model
      5.4.1. Queuing Delay
      5.4.2. Update Event Loss
   5.5. Optimization Problem
   5.6. Global Solutions Using Integer Programming
      5.6.1. ILP Problem
      5.6.2. MINLP Problem
6. Incremental Optimization Algorithm
   6.1. Basic Algorithm
   6.2. Utility Estimation
      6.2.1. Link Utilization
      6.2.2. Path Latency
      6.2.3. Utility Estimation Algorithm
   6.3. Neighbor Information Exchange
   6.4. Measuring Network Capabilities
      6.4.1. Link Usage and Capacity
      6.4.2. Route Throughput
      6.4.3. Latency
   6.5. Utility Functions
      6.5.1. Latency Utility
      6.5.2. Bandwidth Demand Utility
      6.5.3. Utility Weights
7. Prototype
   7.1. High-Level Architecture
   7.2. InterestCast Components and Layers
   7.3. Routing
      7.3.1. Route Operations
      7.3.2. Route Measurements
   7.4. Scheduling
   7.5. Aggregation and Deduplication
   7.6. Protocol Headers
8. Evaluation Platform
   8.1. The Game Planet PI4
   8.2. The Planet PI4 Evaluation Platform
      8.2.1. Workload Generation
      8.2.2. Game Network Components
      8.2.3. System and Network Environment
   8.3. Experiment Workflow and Data Model
9. Evaluation
   9.1. Evaluation Modes
   9.2. Important Factors and Metrics
      9.2.1. Factors
      9.2.2. Metrics
   9.3. Graph-Based Model
      9.3.1. Virtual Reality Scenario
      9.3.2. The Basic Optimization Process
      9.3.3. Comparison with Global Optimization
      9.3.4. Node Density
      9.3.5. Utility Weights
      9.3.6. Clustering
   9.4. Prototype
      9.4.1. Node Density
      9.4.2. Interest Dynamics
      9.4.3. Traffic Overhead
      9.4.4. Planet PI4 Bot Workload
   9.5. Discussion
10. Conclusion and Outlook
   10.1. Future Work
A. Global Optimization Integer Programs
   A.1. ILP Problem
   A.2. MINLP Problem
B. InterestCast Routing Update Handling
C. Prototype Evaluation Results

[Comic: "Simply Explained: Latency." Oliver Widder, http://geek-and-poke.com/, CC BY 3.0]

1 Introduction

There is an old network saying: Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed—you can't bribe God. (David Clark)

The major trends of the Internet in the 1990s and 2000s centered on the globalization of data, making services available independently of the consumer's location. This led to the huge success of the Internet. The assumption, however, was that applications could work with moderate network bandwidths and be mostly insensitive to network latencies. With its success, more and more demanding applications became part of the Internet. New web technologies enabled new classes of interactive applications such as Google Docs¹, more and more business logic with increasing real-time demands is run over the Internet (e.g., real-time business intelligence [12]), and applications based on virtual environments, especially large real-time online multiplayer games, have become ever more popular [160]. In contrast to most conventional Internet applications like newsgroups, the web, or web services, which use mostly pull-based communication, interactive and real-time applications need push-based communication schemes to inform participants about events as they happen.
Further, interaction is often multilateral, in contrast to applications like one-to-one chats, voice, or video calls. Certain events or actions are relevant for a potentially large group of users, and each user may have a unique set of interests. Online games in particular are prototypical for this set of properties. From the network perspective, those applications have increasing demands in terms of network bandwidths as well as latencies. Bandwidth increases due to network developments and large-scale investment and is therefore becoming less of an issue nowadays, at least in fixed networks of well-developed regions [58]. Latencies, however, remain limited by the speed of light. There are ambitious initiatives to bring latencies close to what is possible at the speed of light over linear distance, e.g., using microwave links [145]. Such approaches, however, are associated with high costs and are therefore only available to selected services, like high-frequency trading [39]. An alternative approach is to bring the services closer to the consumer. One of the first and largest deployments of such an approach was made by Akamai [126], bringing static content close to the consumer with the goal of a faster, more reliable, and more efficient delivery. Such solutions are application-specific in that the distribution of service functionality must be selected in accordance with application needs. Static content, for instance, can be arbitrarily replicated, while the direct interaction between two users is still constrained by the network latency between them.

¹ Google Docs. http://www.google.com/docs/about/

If interacting parties are geographically distant from each other, the possibly significant speed-of-light latency is inevitable. On the other hand, if they are close, geographically or in terms of network distance, there is a huge potential for making their communication more direct.
Indeed, many applications, even though they are globally available, have a large share of regionalized communication. One example is online social networks, where locality refers to the network of friends. News or posts from close sources in terms of the social graph tend to be more important and are treated with a higher priority [89]. The algorithms behind online social networks therefore mimic the natural human behavior regarding information valuation. The increasing usage of mobile Internet services also results in a trend toward information locality, most explicitly in the form of location-based services [142]. Another example of localized communication is machine-to-machine communication networks, such as wireless sensor networks or vehicular (ad-hoc) networks. Having started with communication technologies and protocols tailored to their specific application needs, these application areas increasingly adopt Internet technologies. Using Internet protocols, all the billions of devices can theoretically talk to each other, which is part of what the 'Internet of Things' [165, 8] and the 'Internet of Everything' [55] stand for. Actual applications, however, often need localized communication: a brightness sensor most likely controls local lamps, not those on the other side of the planet. Further, machine-to-machine communication often has to deal with high-frequency event streams. Such events may have small payloads, but sending each event individually using Internet protocols may induce a significant communication overhead due to protocol headers. An application class with similar requirements is interactive real-time applications, such as online games or, more generally, multiuser virtual environments. To keep the users' views synchronous, the clients regularly exchange update events, often several events per second. Interactive real-time applications further depend on low event delivery latencies.
Their scale reaches from a handful of participants up to several tens of thousands [53], and the distribution of users ranges from a single room to worldwide. Most of today's online services rely on server infrastructures acting as mediators between the participating clients. This has the benefits of a central authority and maintenance, as well as lightweight client-side applications. With cloud infrastructures, server management became cheaper and scalable, as virtual servers can be dynamically allocated in large data centers. The cloud economy is based on a small number of such large data centers. Although large operators maintain multiple data centers in different parts of the world, the typical distance to the user is still between a few hundred and several thousand kilometers [107]. Amazon, one of the largest cloud service providers, for instance, has at the time of writing eleven data centers worldwide [5]. While many use cases still require central synchronization, e.g., for data consistency or accounting, interactive applications can still profit from unmanaged high-frequency, low-latency updates. Consider a real-time multiplayer game where players navigate and interact in a common virtual world with their avatars. Awareness of the activities in their avatars' surroundings is an important factor for a good gaming experience. Therefore, a high view synchronicity is desirable, which generally requires a high frequency and low latency of position updates among players.

[Figure 1.1: Illustration of the physical space and the virtual space in which participants are located. The virtual space is defined by the application and determines the interest of the participants in each other. The physical space is determined by the physical location of the participants' devices and the connecting network. In some cases, the virtual and physical locations correlate (participants A and B), while in other cases, they are completely independent (C and D).]

We therefore argue that the dissemination of such time-critical update events should be done using direct communication where suitable. Despite its potential for significantly lowering dissemination latency, aiming for direct communication results in several challenges that must be overcome. The composition of the above properties makes online games a well-suited use case for this thesis.

1.1 Challenges

Our targeted applications have a push-based many-to-many communication pattern. This means that each participant's updates are of interest to multiple other participants and, vice versa, that each participant is interested in the updates of multiple others. Considering only network latencies, delivering each event as a direct message from source (i.e., participant of interest) to destination (i.e., interested participant) is the fastest option in most cases. This approach becomes more and more suitable with today's bandwidths. Nevertheless, we have to be able to deal with nodes whose bandwidth is the limiting factor. As soon as the spare bandwidth becomes scarce, queuing delays can significantly increase the total delivery latency, and especially weak nodes may not be able to serve all destinations at all. This problem is amplified by the fact that event messages, which are transmitted frequently but often carry only small payloads, are inflated by network protocol header overhead. As a consequence, the effective net bandwidth in terms of event payload is diminished. Application-level multicast approaches can mitigate this problem by using stronger nodes as message forwarders and multipliers. By taking network latencies into account, the additional latency introduced by the extra hop(s) can be minimized. Figure 1.1 illustrates the mapping between physical (network) and virtual (interest) space.
While the latter determines who needs to communicate with whom, the former determines the latencies between the participants. Both need to be taken into account to optimize latencies depending on interest in the virtual world, based on the network conditions in the physical world. Most existing multicast solutions do not consider multi-source deployments. Although a multi-source problem can be modeled as multiple single-source problems, this does not take into account the potential for aggregating multiple messages from different sources to reduce the relative network packet header overhead. Other approaches, such as distributed publish/subscribe broker networks, assume the availability of additional infrastructure. In addition, there is a delay due to the publish/subscribe mechanism, which relies on multicast or, in the worst case, on sequential unicast. To keep the solution cheap and easily deployable, we strive for a decentralized and self-managing solution. From the application perspective, on the other hand, there can be interest gradations in the sense that update events from a particular source are more important for some receivers than for others. An example is virtual spaces in which the perception of events depends on distance. While actions in the close proximity of a participant are critical, peripheral events may be of less importance. This should be considered by the dissemination solution. Since different applications have different needs in this respect, the application should be able to determine the goal for which the dissemination is optimized. Such an option is not available in existing multicast systems, especially not in decentralized solutions. On the other hand, there are special-purpose approaches for multiplayer online games and virtual environments that incorporate mechanisms for differentiating fidelity levels for neighbors, e.g., depending on their distance and interaction.
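The notion of interest gradations and per-event utility can be made concrete with a small sketch. The following Python snippet is purely illustrative and is not InterestCast's actual interface; the distance-to-interest mapping, the 100–500 ms latency tolerance range, and all function names are hypothetical choices for this example:

```python
def interest_level(virtual_distance: float, max_distance: float = 100.0) -> float:
    """Map virtual-world distance to an interest level in [0, 1].

    Closer sources are more interesting; beyond max_distance the
    interest drops to zero. Purely an illustrative choice of mapping.
    """
    return max(0.0, 1.0 - virtual_distance / max_distance)


def latency_utility(latency_ms: float, interest: float) -> float:
    """Rate the utility of delivering an event with the given latency.

    High-interest events lose utility quickly as latency grows, while
    low-interest (peripheral) events tolerate more delay. The tolerance
    values are assumed for illustration, not taken from the thesis.
    """
    tolerance_ms = 100.0 + (1.0 - interest) * 400.0  # 100 ms .. 500 ms
    return max(0.0, 1.0 - latency_ms / tolerance_ms)


# A nearby (high-interest) source is far more latency-sensitive
# than a peripheral one:
near = latency_utility(80.0, interest_level(10.0))   # close avatar
far = latency_utility(80.0, interest_level(90.0))    # peripheral avatar
assert near < latency_utility(20.0, interest_level(10.0))  # latency hurts nearby sources
assert far > 0.5  # the same 80 ms delay barely matters for peripheral events
```

An application could hand such functions to the dissemination system, which then prioritizes forwarding so that high-interest destinations receive updates with the least delay, while low-interest destinations may be served via extra hops.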
Those approaches, however, do not generalize to different applications. Additionally, the distribution of load among participants should be fair. Fairness can be measured in different ways, and existing systems that consider fairness usually select some more or less reasonable way of measuring and optimizing it. Different applications may have different goals with respect to load distribution. Therefore, the application should also have means of influencing the way the system distributes load. This includes the option of trading the load of the participants for the performance of the system. Finally, using best-effort networks like the Internet, the system must cope with varying network conditions. The solution should, therefore, be adaptive to both the network conditions and the application load. A promising option for tackling adaptiveness with respect to both application requirements and network conditions in a generalizable way is the use of utility functions. They allow the application to define high-level utility indicators without the need to deal with low-level decisions, thereby leaving the system room for self-optimization. Furthermore, they allow the consideration of costs on the underlying system, i.e., the network and node resource usage. Though powerful, utility functions pose several challenges that have to be overcome for their effective application. The first is to find an appropriate specification level. Applications should only be faced with the parameters that are relevant on the application level. If the system internally deals with lower-level parameters, there should be a translation to application-level metrics. Ideally, there should be no need to provide heuristics in the utility functions. Instead, the system model should be capable of predicting the system response in terms of the utility function well enough
Finally, in our case, the system optimization needs to be performed in a distributed setting. In a system with decentralized control, there is no entity with full knowledge about the system state. The utility prediction, therefore, has to work with the limited knowledge available locally.

1.2 Problem Statement

In this thesis, we focus on application use cases requiring timely many-to-many event dissemination. Many-to-many refers to the fact that each participant is both an event producer and consumer. Further, each participant has its own individual interest set, making conventional group communication inefficient. Since latency is critical for interactive applications, it is the core optimization objective with respect to application performance. Building on top of Internet infrastructure only allows best-effort guarantees, a fact that is considered throughout this work. Besides performance, it is necessary to consider capabilities and costs of the underlying network infrastructure. We assume bandwidth availability and link utilization as the main factors that determine the costs. Latency can be minimized using direct connectivity among participants. Hence, this option should be preferred where feasible. Heterogeneous connectivity and device capabilities, however, limit the possible direct fan-out. To mitigate this limitation, participants can help each other by forwarding and multiplying each other's event messages. This distributes load amongst the participants and also yields savings in data traffic due to aggregation. A dynamic trade-off between the two goals, latency and traffic minimization, is a core objective of this thesis. Such a dynamic trade-off calls for an adaptive solution based on both application demands and network environment capabilities. Based on the above challenges, we derive the following set of research questions to be addressed in this thesis.
• We have identified the need for an efficient many-to-many event dissemination in decentralized settings. Therefore, the top-level question addressed here is: How can a decentralized many-to-many event dissemination be solved efficiently and with a particular regard to dissemination latencies?
• To evaluate and improve possible solutions, it is necessary to identify the criteria based on which the goodness can be quantified. This is of particular importance for a self-optimizing solution. Hence, the question is: What are the goodness criteria to be considered for the targeted class of applications?
• Application-level utility functions promise to be a flexible way for defining application needs for an adaptive system. In the context of the targeted application, the following research question is posed: How can application demands and adaptivity goals for a many-to-many dissemination infrastructure, such as interest gradations and bandwidth-latency trade-offs, be formulated as utility functions?
• From the engineering perspective, there is the need to define an appropriate application interface, based on the goals and the requirements identified in the application use cases. The corresponding research question is: What is a suitable decomposition and interfacing for a many-to-many event dissemination mechanism providing application-defined optimization goals?
• Having the goal of providing a solution that requires no central coordination, the utility-based optimization has to be able to run in a decentralized setting: Can the optimization of the dissemination topology be effectively performed with a local optimization algorithm, i.e., by evaluating the utility function on each node based on its local knowledge?
• Given the targeted application scenarios, there are opportunities to leverage properties specific to those applications.
Most importantly: How can the clustering of interest be exploited to reduce bandwidth usage with little impact on latency?

1.3 Approach

We introduce InterestCast, a decentralized opportunistic many-to-many message dissemination middleware, following the introduced concepts. InterestCast introduces the concept of interest levels, which determine the importance of update events of one participant to another. Setting interest levels is comparable to subscriptions in publish/subscribe terminology, but adds the notion of priorities. This concept further provides an interface for decomposing existing overlays for networked virtual environments. InterestCast adapts to both the application load and the underlying network properties (Figure 1.2). Further, it allows the application to specify utility functions, giving the application control over the desired trade-offs. While the first two factors are determined at runtime, the latter provides variability at design time. The utility functions consider event dissemination metrics, most importantly their end-to-end latency, as well as the nodes' load in terms of link utilization. The application specifying the utility functions can benefit in several ways: varying application requirements can be expressed as utility function changes, the utility functions have an influence on the load and performance distribution among nodes and therefore the overall fairness, and they allow tuning the trade-off between performance (low latencies) and cost (bandwidth usage). For the adaptation, InterestCast runs a continuous and incremental local optimization algorithm of message forwarding rules between nodes. All participating nodes repeatedly evaluate and rank possible optimization operations based on their local views and using the application-defined utility functions. In each iteration, the option with the highest expected change in utility is executed, provided the change in utility is positive.
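This greedy, locally evaluated loop can be sketched as follows. The operation objects, their `predict`/`execute` methods, and the view representation are hypothetical placeholders for illustration only; the actual operation set and utility model are defined later in the thesis.

```python
# Sketch of the greedy local optimization loop (hypothetical names).
# Each node repeatedly evaluates candidate operations on its local view
# and executes the one with the highest expected utility gain, provided
# that gain is strictly positive.

def optimization_step(local_view, candidate_ops, utility):
    """Run one iteration; return the executed operation, or None if no
    operation yields a positive expected utility change."""
    best_op, best_gain = None, 0.0
    for op in candidate_ops:
        # Predict the resulting state from local knowledge only, then
        # compare the utility before and after the operation.
        gain = utility(op.predict(local_view)) - utility(local_view)
        if gain > best_gain:
            best_op, best_gain = op, gain
    if best_op is not None:
        best_op.execute(local_view)
    return best_op
```

Because every executed operation strictly increases the locally estimated utility, running such steps continuously on all nodes amounts to a distributed hill-climbing scheme.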
Finally, to minimize message overhead and therefore maximize the effective bandwidth while keeping delays low, a message scheduling and aggregation mechanism reduces the transmission of redundant data.

Figure 1.2.: InterestCast's optimization acts upon three main factors: application-defined utility functions, the application workload, and the network infrastructure capabilities and utilization.

1.4 Methodology and Thesis Outline

To set the detailed goals of this thesis, we must first identify the possible target applications. As introduced, the main target application group in this thesis are real-time online games due to their relatively clear set of requirements and conditions. We do, however, consider further application groups: the special case of mobile and augmented reality games, robotics and vehicular networks, as well as air traffic control. Those applications and their particularities are analyzed in Chapter 2. Based on this, the set of requirements is identified. In Chapter 3, the state of the art of the related work is discussed. Starting with a general introduction to interfacing approaches for event dissemination, publish/subscribe and different multicast incarnations are compared. We further look into specific approaches to adaptive overlays, delay optimization, and peer-to-peer gaming overlays. Based on the identified requirements, Chapter 4 derives the main design principles for the solution proposed in this thesis. They include basic design decisions such as component decompositions and adaptability approaches, as well as more detailed discussions on topology, routing, and their optimization. Subsequently, we define the formal system model in Chapter 5. This model provides the necessary abstraction for analyzing and predicting current and future system state.
The chapter furthermore carves out the underlying optimization problem. The global optimization problem is formulated as an integer programming problem to be solvable from a global point of view using a standard solver. This allows a comparison between the distributed solutions using local knowledge and the global optimum. InterestCast's incremental optimization algorithm is described in Chapter 6. This includes the basic local optimization algorithm as well as the local utility estimation. The chapter further elaborates on the various options and considerations for selecting appropriate utility functions and presents possible utility functions. To be able to evaluate and fine-tune InterestCast under realistic network conditions, a prototype is developed, which is described in Chapter 7. The prototype contains the full software stack including connection management, routing, monitoring, optimization, and the application-level interface. The prototype implementation shows the effective cost of the necessary local knowledge, effects of real networking, overhead, and aggregation, and can be used as a basis for real applications. The prototype implementation is integrated into the Planet PI4 evaluation platform, which is described in Chapter 8. The platform's core is the game Planet PI4, which serves as a workload generator. Alternatively, it provides a workload generation using abstracted mobility models. It allows the execution of the prototype on different network simulators as well as on a real network. It further provides several testing and evaluation facilities like logging, tracing, and analysis tools. Chapter 9 presents and discusses the evaluation results. We start with a graph-based implementation based on an abstracted network model, showing the basic behavior and the comparison with the global solutions. Subsequently, a more detailed analysis is performed based on the prototype.
Finally, Chapter 10 discusses the overall results and concludes this thesis.

2 Background: Applications and Use Cases

This chapter introduces the main target applications and use cases that set the goals of this thesis. We discuss their relevant properties and implications for this work. The set of applications shown here is not intended to be exhaustive, but exemplifies the need for low-latency many-to-many communication. We begin with and put most emphasis on the application domain of online gaming, since this is the primary scenario throughout this work. Mobile and augmented reality gaming is then introduced as a special case of online gaming. Afterwards, we briefly examine the application domains of robotics and vehicular networks as well as air traffic control systems. The chapter concludes with a list of requirements and challenges derived from the discussed applications, which serves as a basis for the design of InterestCast.

2.1 Online Gaming

Since the emergence of the first computer games in the 1970s, there has been a long and successful story of advancements, which has led to a ubiquity of computer games in our lives, driven by a multi-billion-dollar industry [7]. Although computers enable attractive single-user games, which in fact make up a significant part of the games market, multiplayer gaming has been a compelling factor from the beginning. Like in conventional games, their main aspect is the competition between human players. Multiplayer games started as dedicated arcade machines, such as Atari's popular Pong [87, 121]. Games for two players were usually played using two controls on the same machine with a shared screen. Home computers adopted this concept, as do modern game consoles. With the prevalence of computer networks, however, computer games made use of them by distributing the games across multiple computers, providing each player a dedicated device.
Early approaches for desktop PCs used direct serial cables or modem connections between two machines. Later came local networks, e.g., using Ethernet. The Internet created new opportunities by bringing thousands, later millions, and now billions of potential players together. This is when online games grew large. Online games range from single-digit to five-digit numbers of simultaneous players playing in a single virtual universe. Games with small and short-lived sessions often use one participant as the master (or server) node. The selection usually happens explicitly in that the player who opens the game session runs the master node. Alternatively, all nodes replicate and process the whole game state and synchronize in a peer-to-peer fashion. Large and long-lasting games, such as massively multiplayer online games (MMOGs), typically use dedicated server infrastructures. MMOG worlds are persistent and continuously active, and players can join and leave at any time. EVE Online1 is an MMOG with one of the largest game worlds. Its number of simultaneous online players regularly exceeds 50,000 in a single universe [53]. The broader technical term also used in this context is the networked virtual environment (NVE), which includes, in addition to online games, social virtual worlds and military simulations [28]. We use the term virtual world for the simulated content of the game or virtual environment. In most games, the perception of the virtual world for an individual participant is limited to her immediately surrounding environment. In the simplest case, this is modeled by a circular or spherical area of interest (AOI), limited by the vision range (VR); sometimes, these are also referred to as aura, focus, and nimbus [19].
The responsibilities of the game server can be coarsely decomposed into collecting updates from the clients, managing (persistent) game objects, selecting relevant game updates for each player (interest management), and delivering game updates to each player (event dissemination).2 The largest part of the network traffic is caused by position updates [35, 34], which in their most basic form contain the current position of the player. To keep the views of all players synchronous, it is necessary to exchange those updates regularly. Typical update frequencies are between 5 and 10 Hz [35, 34, 151]. To achieve a high degree of synchronicity among the views of each player and a high reactivity to user actions, a low dissemination latency of update messages is crucial [44, 43]. While for relatively slow-paced role-playing games, latencies of one second can still be acceptable [61], player performance in fast-paced shooter games starts suffering from latencies above 100 ms [18] or even less [134]. Omitting the intermediate server for the events between clients, i.e., passing update messages directly between the clients, therefore helps reduce end-to-end (i.e., client-to-client) latencies. This is where peer-to-peer approaches come into play. For this purpose, it is unnecessary to fully replace the game server(s) using peer-to-peer technology. Aspects requiring central control, such as authorization, persistent storage, or conflict resolution, can remain server-supported, while opportunistic low-latency updates are disseminated in a peer-to-peer fashion. In the past one and a half decades, research has resulted in a variety of peer-to-peer approaches for the different aspects. An overview of contributions specific to interest management and event dissemination can be found in Section 3.6 of this work. Yahyavi and Kemme [167] provide a recent survey covering a broader range of peer-to-peer online gaming aspects, which go beyond the scope of this thesis.
Packet Overhead

As indicated above, online games use frequent update messages to keep the game state synchronous among participants. Since these messages only need to contain metadata, e.g., player position or object states, they are typically small. Measurements on network traces of real online games show that typical update packet sizes are between 20 and 60 bytes [150, 34, 35].

1 EVE Online. https://www.eveonline.com/
2 A similar decomposition has been proposed by Fan et al. specifically for peer-to-peer gaming [56]. They distinguish between interest management, game event dissemination, NPC host allocation, game state persistence, cheating mitigation, and incentive mechanisms.

Figure 2.1.: Illustration of network packet sizes for transmitting position updates (payload: 44 bytes; UDP+IPv4: ≥72 bytes; TCP+IPv4: ≥84 bytes; TCP+IPv6: ≥104 bytes). Only considering Layers 3 and 4 (network and transport), protocol header overhead is between 50% and more than 100% of the payload size.

The average throughput for the activity of a single participant in terms of bytes per second is therefore rather low. 50-byte messages at 10 Hz result in 500 bytes/s or 4 kbit/s net traffic, which has been confirmed by measurements [150, 151, 35]. This traffic profile is distinct from that of most other common Internet applications. Although the traffic seems low, Internet protocols induce a significant overhead. For example, consider a multiplayer game in which each player's current position has to be regularly disseminated to all interested neighbors. The payload of such an update is position data, e.g., a triple of floating-point coordinates, plus a player ID and possibly some additional state.
To minimize traffic, these high-frequency updates contain as little data as necessary. A typical payload can thus consist of just 20 bytes (e.g., 3×4 bytes of coordinates plus 8 bytes of player ID). Even a more sophisticated position update packet, as illustrated in Figure 2.1, has a size of just 44 bytes. Sending such an update over the Internet, even using the most lightweight transport protocol, UDP, adds another 28 bytes (8 bytes UDP header, 20 bytes IPv4 header). This results in 48-byte and 72-byte packets on the link layer, respectively. Hence, if only a single update event is transmitted in a UDP packet, the size of the network protocol headers can exceed the size of the actual payload—and this calculation ignores any protocol below the network layer. Using TCP (20 bytes header) instead of UDP, or IPv6 (40 bytes or more) instead of IPv4, makes the situation even worse: 20 bytes of application payload would be blown up to 80 bytes on the link layer. Protocol overhead therefore easily adds more than 100% overhead, i.e., more than doubles the necessary bandwidth. And this does not even include further overhead of Layer 2 and potential additional encapsulation techniques such as VPN. Prasad et al. [133] exemplify the Layer 2 overhead by showing that the capacity of a 10BaseT Ethernet link is only 7.24 Mb/s for 100-byte packets versus 9.75 Mb/s for 1500-byte packets. So far, we have only considered events from one player. In the client/server case, this is what each client sends to the server. The server returns an aggregated view of all relevant events for the respective client, consisting of larger, but equally frequent packets. Measurements show around ten times higher server-to-client traffic, strongly depending on the game situation [150]. With larger packets, the relative overhead is smaller, which is why server-based approaches work reasonably well with today's Internet infrastructure.
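The packet sizes above can be reproduced with a few lines of arithmetic. The struct format below is an assumed encoding of the Figure 2.1 layout (position, direction, and rotation as three 4-byte floats each, plus an 8-byte ID/state field); the header sizes are the minimum header lengths of the respective protocols.

```python
import struct

# Assumed encoding of the Figure 2.1 payload: three float triples plus
# an 8-byte ID/state field. '<' = little-endian, no padding.
UPDATE_FMT = "<3f3f3fQ"  # 9 * 4 + 8 = 44 bytes

# Minimum Layer 3/4 header sizes in bytes.
HEADERS = {"UDP+IPv4": 8 + 20, "TCP+IPv4": 20 + 20, "TCP+IPv6": 20 + 40}

def packet_sizes(payload_len):
    """On-the-wire packet size and relative header overhead (percent)
    for each protocol combination, ignoring Layer 2 and below."""
    return {name: (payload_len + hdr, 100 * hdr / payload_len)
            for name, hdr in HEADERS.items()}

payload = struct.pack(UPDATE_FMT,
                      1.0, 2.0, 3.0,  # position
                      0.0, 0.0, 1.0,  # direction
                      0.0, 0.0, 0.0,  # rotation
                      42)             # player ID / state
print(len(payload))                 # 44
print(packet_sizes(len(payload)))   # 72, 84, and 104 bytes on the wire
print(packet_sizes(20)["TCP+IPv6"]) # a 20-byte payload grows to 80 bytes
```

Even in the most favorable case (UDP over IPv4), the headers add more than 60% to the 44-byte payload, matching the ≥50% overhead stated for Figure 2.1.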
In contrast, using peer-to-peer event dissemination to minimize latencies requires each client to send its updates to each interested peer individually; vice versa, each client receives updates from many individual peers. In this case, clients have to send each of their updates more than once, and the packet header overheads become highly significant. Miller and Crowcroft conducted a simulation study on this issue [114], in which they conclude that pure peer-to-peer MMOG messaging is not feasible with residential broadband. In their peer-to-peer model, peers subscribe for updates among each other based on a constant-size area of interest (AOI) and deliver updates directly. Intuitively, the upstream traffic of each peer induced by direct update delivery grows linearly with the fan-out, i.e., the number of subscribers. Inversely, the downstream traffic is linear in the fan-in. Since most residential Internet connections have asymmetric up- and downstream bandwidths, with a significantly lower upstream bandwidth, we will focus on the peers' fan-out. Typical upstream bandwidths of residential connections range from 100 kbps to 10 Mbps. Using the 4 kbps per subscriber from the example above, peers could theoretically serve 25 up to 2,500 subscribers. In practice, however, when saturating the uplink, latencies will skyrocket due to queuing3, not even considering cross traffic. The simulation by Miller and Crowcroft [114] suggests that already an average bandwidth usage of less than 50% significantly increases transmission latencies. On the other hand, a 10 Mbps connection has a lot of spare capacity in such situations. In this thesis, the described packet overhead problem is addressed twofold. First, we strive to combine direct message delivery where possible with aggregation of update messages by dynamically using forwarder nodes. Those nodes receive updates from potentially multiple sources and can therefore forward aggregated packets.
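The theoretical fan-out range quoted above follows directly from dividing the uplink bandwidth by the per-subscriber rate; the 50% utilization case illustrates the headroom suggested by Miller and Crowcroft's results:

```python
# Per-subscriber update traffic from the running example:
# 50-byte messages at 10 Hz = 500 bytes/s = 4 kbit/s.
PER_SUBSCRIBER_KBPS = 4

def max_fanout(uplink_kbps, utilization=1.0):
    """Number of subscribers a peer could serve via direct delivery
    when using the given fraction of its uplink bandwidth."""
    return int(uplink_kbps * utilization // PER_SUBSCRIBER_KBPS)

print(max_fanout(100))       # 25   (100 kbps residential uplink)
print(max_fanout(10_000))    # 2500 (10 Mbps uplink)
# Staying below ~50% utilization to avoid queuing delays roughly
# halves these numbers:
print(max_fanout(100, 0.5))  # 12
```

The calculation ignores the per-packet header overhead discussed above, which in practice further reduces the feasible fan-out.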
Second, we strive to be adaptive with respect to node heterogeneity. When powerful nodes participate, e.g., one with a 10 Mbps uplink, it is possible to shift load from weaker nodes.

Opportunities: Clustering and Interest

Alongside the challenges described above, online gaming scenarios show important properties that we aim to exploit for improvement. The first is the clustering of interest among participants. Since interest is mostly based on virtual world proximity, it is typically highly clustered.4 This clustering has a direct effect on the number of options for aggregation and load shifting, because forwarding of events can be performed most efficiently using nodes with common interests. Secondly, the interests of participants in each other typically have gradations. This means that, while some neighbors in the virtual world are of high interest, e.g., because they are close by, others, further away, might be visible but of low interest. Studies from the cognitive sciences indicate that the human brain can only focus on a small and constant number of objects simultaneously [157, 138]. The number of neighbors of highest priority is therefore very limited. Updates from the other neighbors are still needed for a reasonable awareness of the environment. Low-interest nodes, however, need lower update frequencies and tolerate higher latencies. Hence, the update dissemination can be optimized based on interest, serving highly interested neighbors with high priority and less interested neighbors on a best-effort basis.

3 The effect of excessive delays caused by large buffer sizes throughout the Internet has gained attention under the term bufferbloat [68].
4 We have measured clustering coefficients of around 60% in typical scenarios. Refer to Section 9.3.1 for more details.

We have chosen online games as the primary scenario not only because of their popularity, but also because they reflect a unique combination of the different properties and requirements described above. This allows us to study interdependencies between these properties, specifically with respect to latency sensitivity, dynamism, and heterogeneity of both interest and resources. Using the knowledge we have about properties and requirements of specific game types, we can define benchmarks that serve as representative and reproducible standard tests [99, 103] to compare networking approaches.

2.2 Mobile Augmented Reality Gaming

A subclass of the online games discussed above are online augmented reality (AR) games. Augmented reality games in general augment the real world with virtual objects, which shape the gameplay. The game is therefore embedded into the real world, giving a new perspective on gaming. This class of games has gained a lot of potential with the ubiquity of sufficiently powerful mobile devices, most notably smartphones. Augmented reality technologies and applications, however, had been on the research agenda already one and a half decades before the emergence of smartphones [11, 10]. Using special-purpose hardware, the use of augmented reality was explored in various application domains beyond entertainment, including manufacturing and maintenance, education, medicine, and the military. According to Azuma [11], an augmented reality system must have the following three characteristics: (i) it combines real and virtual, (ii) it is interactive in real time, and (iii) it is registered in 3D. Azuma's definition is limited in that it focuses on the computer graphics domain, where the term augmented reality relates to graphical augmentation, e.g., adding rendered 3D objects to live video recorded by a camera. A recent example from computer games that fits this definition is the Android-based game DroidShooting5.
Putting less emphasis on the graphical representation, an important aspect of augmented reality is the mapping of the virtual space onto the physical space. Large-scale, or massively multiplayer, augmented reality games often use the users' GPS-based positions to place them on the virtual map. To ease navigation, the virtual map reflects the real world to a certain degree, usually by using a street map underlay (e.g., Google Maps6 or OpenStreetMap7) for the virtual world map. The most popular example of this class of games is Ingress8 (Figure 2.2a). TowerWorld [97] (Figure 2.2b) is a prototype of a similar game, which was developed for the evaluation of mobile publish/subscribe solutions in this application class.

Local Communication

By mapping the virtual world onto the physical world, interest within the virtual world becomes directly related to physical proximity. Interest-based communication (e.g., position updates, player actions) is therefore highly localized with respect to the physical world. Figure 2.3 shows that a large part of the communication traffic of an augmented reality game has a target only a few dozen meters away. This raises the question of keeping traffic local even more than for stationary network games.

5 DroidShooting. https://play.google.com/store/apps/details?id=jp.co.questcom.droidshooting
6 Google Maps. https://maps.google.com/
7 OpenStreetMap. http://www.openstreetmap.org/
8 Ingress. https://www.ingress.com/

Figure 2.2.: Screenshots of two typical mobile augmented reality multiplayer games. Both use the physical location of the player as the in-game position and show the local street map as an underlay of the virtual world.
Yet, today's mobile applications are largely cloud-based, meaning that the network sees only communication between the devices and cloud servers. The mobile devices' cellular network connectivity is often the limiting—and most expensive—factor for online applications, particularly for interactive ones. Beyond the dependency on reasonable infrastructure coverage, cellular communication is expensive in terms of energy usage because the connection needs to stay permanently in high-power mode despite the low utilization [14, 125]. Furthermore, although low in throughput, the accumulated data volume of long-lasting sessions easily exceeds the volume of inexpensive data plans, jeopardizing the users' desire to play the game. In a test session, the prototype TowerWorld [97] used around 10 MB per 15 minutes. Users of Ingress, which is less interactive and more optimized than TowerWorld, report varying values of 200 to 600 MB per month for typical usage.9 Finally, cellular networks often induce higher latencies than wired broadband connections.

9 Based on user reports on Reddit. "How much data are you using? How active are you?" http://www.reddit.com/r/Ingress/comments/18u5k7/how_much_data_are_you_using_how_active_are_you/. Accessed 2014-04-09.

Figure 2.3.: Ratio of data that could have been sent directly depending on the direct communication range, measured in a test session with the augmented reality game prototype TowerWorld [97]. Almost half of the data is sent to a destination less than 20 meters away from the source.

Opportunity: Wireless Ad-Hoc Communication

Given those facts, the use of opportunistic ad-hoc networks for low-latency local communication offload has great potential [98]. Today's basic technologies that qualify for such connectivity on consumer devices are Bluetooth and WiFi (IEEE 802.11). In the Bluetooth protocol stack, the Bluetooth Network Encapsulation Protocol (BNEP) [23] provides an Ethernet emulation, which can serve as the base for an IP stack. The need for explicit device pairing, however, limits Bluetooth's suitability for dynamic ad-hoc networks. IEEE 802.11 [79] specifies the Independent Basic Service Set (IBSS) that allows for an ad-hoc communication mode without a coordinating access point. Wi-Fi Direct [166] is an alternative WiFi ad-hoc connection standard that has been promoted by the industry lately. Unlike IBSS, it uses software access points on the participating devices and focuses on easy and secure connection setup. Yet, the dynamically allocated access points make Wi-Fi Direct less flexible in group size and range. On the other hand, Wi-Fi Direct is available on many recent smartphones and tablets, while IBSS is rarely activated on consumer devices. Hence, today, the perfect ad-hoc communication technology that is available on a large number of mobile devices is still lacking. Nevertheless, for this work, we assume the availability of an IEEE-802.11-IBSS-like wireless ad-hoc protocol on all mobile devices. Wireless multi-hop routing protocols, such as AODV [131] and OLSR [42], enable nodes to communicate with each other even if they are not in direct wireless transmission range.
This works as long as there is a chain of intermediate nodes that are in each other's range to forward the messages. Hence, given a sufficient device density, these protocols can extend the effective communication range by orders of magnitude. For augmented reality games, only a small number of hops can be sufficient (Figure 2.4). Forwarding nodes, however, have to bear additional load. The use of a simple application-level multicast mechanism, e.g., using direct overlay connections from source to destinations, can lead to a multiplication of forwarding traffic. If multiple source-destination pairs use a common forwarder, the forwarder has to carry the same information multiple times. Network-level broadcasts, on the other hand, might be too unbounded in larger ad-hoc networks. InterestCast can mitigate this problem by aggregating common data on common forwarding links. Provided that forwarder nodes are also participants of the InterestCast overlay, the measurement of overlay link properties (most importantly latency) gives hints about the forwarders in the underlay.

Figure 2.4.: Accumulated ratio of data that could have been sent over a number of hops depending on the single-hop transmission range (meters), measured in a test session with the augmented reality game prototype TowerWorld [97]. A large amount of data requires only a small number of hops.

The forwarding function can then be pulled up into the overlay, where the necessary information for message aggregation is available. This allows keeping a general-purpose ad-hoc routing protocol, while at the same time fitting the overlay routing structure to the underlying topology.
Mobile games can benefit even more from localized communication than conventional multiplayer games. Although not yet deployed on a large fraction of mobile devices, the basic necessary wireless ad-hoc connectivity and routing technology is available. InterestCast provides an overlay routing mechanism that adapts to the underlying topology.

2.3 Robotics and Vehicular Networks

Multi-Robot Systems

Multi-robot systems [110] have become an important research field in the robotics domain since the early 2000s. They can be seen as a subclass of multi-agent systems [59], which, as a research discipline, has relations to distributed systems, artificial intelligence, human-computer interaction, and ubiquitous computing. Most practical multi-robot systems considered in research consist of only a small number of robots [17], and most work on coordination [57] deals with the aspects of interaction languages, semantics, and higher-level coordination tasks. Basic communication for low-level real-time coordination is nevertheless needed. Deployments of self-sustaining autonomous robots, e.g., for rescue missions [90], typically use mobile, and possibly ad-hoc, communication to be infrastructure-independent. Particularly in rescue missions, the projected number of independent agents is larger (100 or more), control is mostly distributed, and real-time planning is in the order of seconds [90]. Furthermore, robot teams moving in a real-world space need the concept of positioning and proximity, and coordination is most time-critical where agents are close to each other or form a group. Finally, agent behavior and coordination should be adaptive to their environment. For example, simple situations might be covered with fast local coordination, while complex situations require tighter interaction and more planning [85].
With the adaptation of planning schemes, communication requirements, such as delay bounds, may also change. With these properties, the concept of InterestCast appears to be a good fit as the low-level distributed status and coordination message exchange mechanism.

Vehicular Networks

Similarly to multi-robot systems, vehicular networks [119, 127] are built among physical, mobile entities, in this case cars and trucks. Due to their participants' mobility, they emerged as a variation of Mobile Ad-hoc Networks (MANETs), named Vehicular Ad-hoc Networks (VANETs). Besides vehicle-to-vehicle (V2V) communication, there is vehicle-to-infrastructure (V2I) communication, which uses cellular or WiFi infrastructure. Among the most prominent applications of vehicular networks are safety and traffic optimization applications such as accident, congestion, and road condition warnings [162], which are propagated locally in real time. More recently, entertainment applications, including distributed gaming, have gained increasing interest [127]. The immediately critical but rare communication, e.g., accident alerts, will most likely use lower-layer forwarding mechanisms to achieve the best reactivity and reliability without any trade-offs. On the other hand, regular monitoring and control information dissemination, as well as mobile entertainment applications, such as the mobile gaming discussed in the previous section, could benefit from a system like InterestCast. Those non-safety-critical applications require trade-offs in resource usage between applications with different communication requirements, e.g., with respect to bandwidth and latency. Local traffic safety monitoring, for instance, uses periodic messages containing vehicle speed, position, and direction [4]. In addition, event-driven messages are sent once unsafe situations are detected.
Vehicle data like position and speed is especially relevant for other close-by vehicles, while more distant vehicles in the same traffic flow might still be interested, but timeliness is less critical. Furthermore, information from vehicles ahead in traffic is more informative than from following vehicles. Finally, groups of vehicles moving in the same flow have the highest interest in each other's information and therefore form clusters. Those interest distributions are similar to the ones discussed in the context of virtual game worlds above.

2.4 Air Traffic Control

Timely and scalable event dissemination is also a core requirement for modern aeronautical control systems [70]. Much data is produced and consumed on a spatio-temporal basis, i.e., with respect to location and time. Trajectories can be modeled using 4D coordinates (longitude, latitude, altitude, time) [156]. For example, up-to-date airplane positioning information allows making route planning more dynamic to meet near-term needs. Information about detailed weather conditions, which are scarce across oceans, can be observed in real time by airplanes and disseminated to following airplanes. Much of this data might be of little immediate interest to central control authorities and particularly to other air traffic participants that are not in close range to the reference location, supporting the case for localized dissemination. Generally, air traffic control currently aims for more decentralized coordination, for which a decentralized event dissemination system could be an important part.

2.5 Requirements and Challenges

From the targeted applications and use cases detailed above, we identify the following set of requirements and challenges. They are used as a basis for the selection of potential approaches and design decisions, which are discussed in Chapter 4 of this thesis.

Latency sensitivity. The targeted applications are sensitive to latency.
In many cases, this means that the goal is to minimize delivery latency. When disseminating events to multiple receivers, minimizing can refer to the average but also to the maximum latency among the receivers. Depending on the particular application, there might be fixed upper bounds for event delivery. For reasons such as fairness among participants, it might also be desirable to minimize the latency variance among receivers.

Best-effort adaptation. There are physical bounds on the latency and throughput the underlying network can provide. Since typical network infrastructures, such as the Internet, do not give latency guarantees or predictions, overlays can only provide best-effort service. The dissemination overlay should therefore (1) be prepared for different, and possibly changing, application needs and (2) adapt to changing conditions of the underlying network to provide the best possible performance in all situations.

Many-to-many communication. Every participant is an event producer and has an individual set of interested participants. This many-to-many communication scheme generally does not lend itself to an efficient clustering into a fixed set of multicast groups or channels. Instead, each participant needs a separate dissemination structure. Many scenarios, however, show a strong clustering in the participants' interests, induced by interest locality. This fact can be used to facilitate the maintenance of the individual dissemination structures.

Dynamics. Locations and interests of participants are highly dynamic. In particular, the interest sets can change constantly. The reorganization of dissemination structures should thus generate little overhead; rebuilding the dissemination structure from scratch on each change is not an option. Furthermore, a quick fallback mechanism should compensate for failing nodes or connections.

Packet overhead.
Update events are often small, because they either contain incremental updates or very limited state (such as the coordinates of a position). This results in high relative overheads from the packet headers of the different network layers. A low ratio between actual payload and gross traffic unnecessarily reduces the achievable node fan-out. The dissemination system should therefore aggregate packets where possible to maximize efficiency.

Interest gradations. Interest among participants often has gradations. While some participants (e.g., close players, close friends) are of high priority, delay or message loss from others might be tolerable. The interface should therefore provide means for specifying interest priorities to be able to tune event delivery to meet the needs of the highest-priority participants first.

Scalability. The total number of nodes in the network can be very large, while only relatively small subsets are of interest for a certain participant. The algorithmic complexity should therefore only depend on the number of interested participants, not on the total system size. Some participants may be of particularly high interest, causing heterogeneous fan-outs. For instance, in online games, areas with a high player density require many connections between the players. This can easily exhaust the available connection bandwidth of single nodes.

Resource heterogeneity. The participants' resources are heterogeneous. While some nodes, e.g., those on mobile networks, may have extremely constrained connection bandwidths, other nodes have a lot of spare capacity. Ideally, all participants should get the same service quality, independent of their connection and location.

Infrastructure. Dedicated infrastructure (e.g., multicast servers) might or might not be available. The system should therefore be able to work independently of dedicated servers. However, if such options are available, it should be possible to leverage them.
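As a small illustration of the latency-sensitivity requirement above, the three candidate objectives (average latency, maximum latency, and latency variance among receivers) can be computed as follows. The latency values are purely hypothetical.

```python
# Illustrative only: the three latency objectives named in the requirements,
# computed for one event delivered to five receivers (latencies in ms).
from statistics import mean, pvariance

latencies_ms = [40, 55, 60, 80, 120]

avg_latency = mean(latencies_ms)       # minimize for good typical delivery
max_latency = max(latencies_ms)        # minimize to bound the slowest receiver
latency_var = pvariance(latencies_ms)  # minimize for fairness among receivers

print(avg_latency, max_latency, latency_var)
```

Which of these objectives matters, and how they are weighted, depends on the application; this is exactly why the interface should expose interest priorities rather than hard-code one metric.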
3 State of the Art

In this chapter, we discuss and categorize important existing work with respect to event dissemination and the application-specific challenges identified in the previous chapter. We start with the conceptual high-level view, exploring approaches to how applications can interface with the dissemination infrastructure. Subsequently, we dive deeper into the lower-level message multicast implementations and further provide an overview of publish/subscribe systems. Finally, we move towards more specialized systems and look into the aspect of interest management and integrated approaches for interest management and event dissemination.

3.1 Interfacing Event Dissemination

Although the concept of delivering events in the form of messages to a multitude of receivers is intuitively comprehensible, a closer investigation of possible options and features poses a multitude of questions regarding design decisions. Early approaches were simple multicast solutions with multicast groups, such as IP multicast [48]. Messages can be sent to a group so that all registered group members receive them. When only unicast is available, the most primitive solution is obviously to send the message to each receiver individually. This requires each sender to know all receivers, but on the other hand allows an individual selection of receiver groups for each message. Such an approach, however, limits scalability with respect to the receiver set size. Scalability and implementation aspects are further discussed in Section 3.2. Here, we want to concentrate on what information the application needs to manage and what can be hidden behind the event dissemination facade. A basic distinction criterion with respect to the interface is the way the receivers for a particular message are selected.
The selection can be sender-initiated (i.e., the sender knows who shall receive the message), receiver-initiated (receivers join a certain sender, channel, or group), or implicit and based on the content of a message. Simple unicast-based dissemination belongs to the first category; group-based multicast solutions (e.g., IP multicast) belong to the second. The third category is largely covered by a variety of publish/subscribe systems.

3.2 Multicast

When it comes to the simple but efficient dissemination of data to a set of receivers, multicast protocols are the means of choice. In contrast to unselective broadcast (e.g., [65]), multicast uses an explicit selection of the set of desired receivers. By running multiple protocol instances, in principle any number of multicast groups can be created. This allows for a basic channel-based publish/subscribe implementation with little effort, but omits some decoupling of the participants, especially with respect to time (cf. Section 3.4), because messages are not stored for offline participants. There is a plethora of protocols, especially in the area of application-layer overlays, aiming to solve multicast efficiently. Many of them have been analyzed for different application domains [168, 96, 66]. This section therefore provides only a brief overview, focusing on the issues relevant for this work.

3.2.1 IP Multicast

IP multicast [48] is an extension to the Internet Protocol that allows sending IP datagrams to multicast groups. Datagram delivery is defined as best-effort, analogous to IP unicast. Membership in multicast groups is dynamic in that hosts can join and leave groups at any time. Multicast groups (named host groups) are identified by IP addresses from a specific range (224.0.0.0 to 239.255.255.255). Sending a datagram to a multicast address delivers the datagram to all hosts in the corresponding group. To do so, the sender does not even need to be a member of the group.
This way, IP multicast can work as both single-source and multi-source multicast. Host groups are not limited in size, but, since they are addressed using IP addresses, their number is limited. The most important practical limitation, however, is that IP multicast has never been widely deployed on the Internet. Without wide deployment, it is of little use for large-scale Internet applications. The main reasons are considered to be limitations in access control, security, address allocation, and network management [51]. It is also argued that the end-to-end principle [140], stating that application-specific functions should be implemented on the end hosts rather than in the network, prevails over the efficiency benefit of IP multicast [41].

3.2.2 Application-Layer Multicast

Due to the limitations of IP multicast, a lot of research focus has been put on application-layer multicast (ALM), on which we concentrate in the following. ALM only employs the end systems for data multiplication; it is therefore sometimes called end system multicast. An advantage of ALM over IP multicast is its greater flexibility. While IP multicast has to concentrate on core features to keep in-network complexity low, ALM can incorporate more sophisticated, and therefore more complex, approaches. These include application-specific extensions and optimizations, which explains the large number of variations. In the following, we first give an overview of general-purpose application-layer multicast solutions, before looking more specifically at topology optimization and delay minimization. Overcast [83] is an early approach for replacing IP multicast. It builds overlay trees, focusing on organizing nodes to build high-bandwidth channels. Fast joining of nodes is an explicit goal, but latency is explicitly not an optimization target. Joining nodes start at the root of the distribution tree and try to move downwards towards the leaves as long as they do not observe bandwidth degradation.
This leads to particularly deep trees and therefore high maximum latencies. The Banana Tree Protocol (BTP) [77] provides a multicast service with best-effort delivery, on top of which a group communication as well as a file sharing protocol are defined. Nodes can switch their positions in the tree to optimize the metrics latency, degree, and total tree cost. The optimization process is performed in a distributed fashion by switching siblings. The authors, however, do not explain based on which information the selection of options is performed. Although the basic algorithm takes latency into account, the further analysis is only done with cost minimization in mind. ALMI [130] targets small multicast groups and employs a central session controller for node registration and building the spanning tree. The session controller communicates directly with all members using unicast messages to propagate tree updates. Each pair of nodes regularly sends pings to each other to measure the link delay. To reduce the O(n²) measurement cost necessary for a full mesh, nodes use a fixed degree and regularly replace known bad edges to converge to the optimum mesh. Chu et al. present Narada [41, 40], an End System Multicast protocol explicitly designed as an IP multicast replacement. Narada maintains an overlay mesh among the participating nodes, on top of which multicast spanning trees are constructed. The mesh is continuously updated to optimize for latency by adding links where latency is significantly reduced and by removing links that are least utilized. Spanning trees are constructed using a distance vector protocol similar to DVMRP [49]. The authors suggest application-specific customizations with respect to bandwidth and latency prioritization, but only really consider one scheme: available bandwidth is discretized into a few classes; links are first selected based on the bandwidth class, and if there are several options left, latency is minimized.
Hence, the authors explicitly address the latency vs. bandwidth issue, but always prioritize bandwidth over latency. NICE [15] targets low-bandwidth applications with large receiver sets and focuses on constraining control overhead, node degree, and latency stretch¹. Nodes are organized into a hierarchy of layers, in which they are grouped in clusters. Clusters are based on inter-node latencies to minimize forwarding latency stretch. NICE differentiates between the distribution topology, which must be loop-free, and the control topology, which is denser. Nodes join by connecting to the highest layer in the hierarchy and subsequently finding close-by neighbors in one of the lower-layer clusters. Clusters have a minimum and a maximum size; to stay within these bounds, they are merged and split, respectively. Although NICE considers latency in its optimization, average path lengths beyond 20 hops with 128 nodes are hardly suitable for latency-critical applications. OMNI [16] is designed for real-time applications, such as media streaming. It tries to minimize latency with bounded out-degrees, but it assumes dedicated Multicast Service Nodes (MSNs) deployed by service providers. The authors first present linear programming solutions to the NP-hard minimum average-latency and minimum maximum-latency degree-bounded spanning tree problems. The spanning tree only refers to the MSN topology. They then present a decentralized heuristic algorithm to build a spanning tree among the MSNs that converges to the optimum. The algorithm is adaptive with respect to the network latencies and the number of clients at each MSN, but the MSNs themselves are assumed to be mostly static and stable. SplitStream [32] is a multicast system building on top of Scribe [33]. It builds a forest of multiple trees, each transporting a stripe of the data, so that nodes can decode the data even if a fraction of the stripes is missing.
Furthermore, all nodes are both leaves and inner nodes of the different trees, leading to good load fairness. This system therefore aims at heavy-traffic data streams, such as media content. The trees are built and maintained as separate Scribe trees. To deal with nodes whose out-degree is exceeded, SplitStream adds a child rejection mechanism. For children that find no new parent, SplitStream maintains a spare capacity group of nodes that can handle additional children. A similar approach is taken by Bullet [94]. It also splits the data into disjoint encoded blocks, which are distributed via a mesh over an overlay tree. The overlay tree is constructed and maintained by any of the existing algorithms. The delivery through the mesh is probabilistic, but the block encoding scheme (e.g., using erasure codes) allows reconstructing the data even if a few blocks are missing. Disjoint content is located using the RanSub algorithm [93]. Like SplitStream, Bullet aims for high-bandwidth traffic, and delivery latency is not a primary concern. Chainsaw [128] builds on an unstructured topology and completely replaces trees with a mesh. Based on a random mesh topology, peers get notified upon the arrival of new sequence-numbered packets and request ranges of sequence numbers. Unlike the previous systems, Chainsaw thus uses a pull-based approach, which is similar to the BitTorrent distribution strategy. Obviously, this concept is more suitable for bulk data streaming than for low-latency control information dissemination. Most of the presented approaches use node degree or bandwidth as the primary factor determining the dissemination structure. Many optimize for high throughput and assume a single source.

¹ Latency stretch is defined as the ratio between the latency accumulated over multiple hops from source to destination and the direct latency between source and destination.
Latency is also considered, but the average number of hops is usually too high for sub-second end-to-end latencies. They further provide no prioritization, but rather rely on a simple multicast group interface. Furthermore, an adaptation to application needs is barely considered. A comparison of the discussed approaches based on their properties is provided in Table 3.1.

| Approach | Single- vs. multi-source | Tree construction | Considered node/network features | Delivery | Adaptation (environment) | Adaptation (application) | Failure fall-back |
| Overcast [83] | single | receiver-based | node bandwidth | deterministic | receiver-based relocation | no | receiver-based relocation |
| BTP [77] | multi | distributed | node degree, node-to-node latency, tree cost | deterministic | sibling switching | possibly | re-connect to root |
| ALMI [130] | multi | central (session controller) | node-to-node latency | deterministic | central recomputation | no | central recomputation, member- or controller-initiated |
| NICE [15] | single | distributed, cluster hierarchy | node-to-node latency, node degree | deterministic | cluster-based refinement | no | cluster-based |
| OMNI [16] | single | distributed (among MSNs) | latency, node degree | deterministic | distributed (among MSNs) | no | none / move one child up the tree |
| Scribe [33] | multi (single rendezvous) | DHT-based (Pastry) | Pastry locality (latency) | deterministic | distributed, Pastry locality (latency) | no | DHT-based (Pastry) |
| SplitStream [32] | single | multiple Scribe trees | Pastry locality (latency), node degree | deterministic | not beyond Pastry | no | Scribe, coding fault tolerance |
| Bullet [94] | single | mesh on tree | number of senders/receivers | probabilistic | distributed periodic re-evaluation | no | no specific, coding fault tolerance |
| Chainsaw [128] | single | random mesh | bandwidth | deterministic | no | no | multi-source pull |
| Narada [41] | single | distance vector spanning tree on mesh | bandwidth, latency | deterministic | continuous mesh optimization | variation of latency-bandwidth tradeoff | neighbor failure detection, partition detection & repair |
| Mokhtarian 2013/14 [116, 117] | single | source | node bandwidth, inter-node delay | deterministic | no | min-avg-lat vs. min-max-lat | source-based |
| Li 2013 [108] | multi | central with distrib. heuristic update | node bandw. (packets), inter-node delay | deterministic | distributed refinement | no | possibly distributed |
| Streamline [113] | single | source | node bandwidth | deterministic | tree recalculation on changes | no | new parent selection with global knowledge |

Table 3.1.: Comparison of a selection of application layer multicast algorithms

3.3 Delay Optimization in Multicast

Although multicast, and in particular application-layer multicast, has been a research topic for more than two decades, only recently have there been new approaches dedicated to the analysis and optimization of delay in message dissemination. An important novelty is the explicit consideration and modeling of the queuing delay on each node, caused by bandwidth limitation. When a node forwards a message to several neighbors "simultaneously", the last of the messages may have to wait for a significant amount of time to actually be transmitted, depending on fan-out, message size, and bandwidth availability. To exemplify, a 1 kB (8000 bit) packet takes 8 ms to be transmitted on a 1 Mbps line. With a fan-out of 10, the last packet thus waits 72 ms before even being put on the line. The fan-out but also the message transmission order therefore have a significant impact on the individual message delays, which has been largely ignored in the previously described works. Mokhtarian and Jacobsen [116, 117] propose algorithms that optimize the delay of multicast trees. In their model, they include the time messages are delayed on the nodes before getting sent out. The authors consider the two problems of minimizing average and maximum latency. They show that the problems are NP-hard and propose heuristic approaches to solve them efficiently. Their first paper [116] only covers constructing whole trees; incremental updates are added in their follow-up version [117]. The trees are calculated at the source nodes and encoded in the data packets, so that intermediate nodes do not need to keep state. Additionally, this gives the source node the best possible flexibility in selecting the tree structure. While in certain cases the path information can be encoded economically using Bloom filters, other cases require an overhead of up to O(n), with n being the number of nodes in the tree, which is significant. For small payloads, this option seems infeasible. Furthermore, the information each node needs to exchange with others to run the optimization algorithm is O(DLN), where N is the total number of nodes, L is the pairwise shortest path hop count, and D is the node degree. Especially the linear dependency on the network size N is critical for large networks. For small receiver sets compared to the network size, there might be an option to prune the information without overly restricting the optimization process.
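The send-queue effect that motivates this line of work can be sketched in a few lines. This is an illustrative model only (all parameters are hypothetical): messages a node forwards to its neighbors serialize on its uplink, so the k-th copy waits behind the k-1 transmissions before it.

```python
# Illustrative model of per-copy queuing delay at a forwarding node:
# copies of a message serialize on the node's uplink, so the k-th copy
# finishes only after k full serialization times.

def send_completion_times_ms(packet_bits, uplink_bps, fan_out):
    """Time (ms) at which each of `fan_out` copies finishes transmission."""
    tx_ms = packet_bits / uplink_bps * 1000.0  # serialization time per copy
    return [tx_ms * (k + 1) for k in range(fan_out)]

# A 1 kB packet (8000 bits) on a 1 Mbps uplink with fan-out 10:
times = send_completion_times_ms(8000, 1_000_000, 10)
print(round(times[0], 1))   # first copy done after 8.0 ms
print(round(times[-1], 1))  # last copy done only after 80.0 ms
```

The model shows why both fan-out and transmission order matter: reordering which receiver gets an early copy redistributes the queuing delay among them.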
Finally, all intermediate node or link failures must be quickly reported to the source node, because only the source node is able to rebuild the tree. Other intermediate nodes that are notified about the failure first are therefore unable to react independently. A very similar investigation was performed by Li et al. [108]. They also argue that the delay incurred on the nodes due to queuing is a significant source of latency and must be included in the optimization model. The authors distinguish between transmission and waiting time and model the waiting time using queuing theory. They define two problems to be solved incrementally. The first is to build a multicast tree for each peer so that all receivers can be reached and the total (i.e., average) delay is minimized. The second is to find multicast trees that reach all receivers and satisfy node bandwidth constraints. NP-completeness is proven for both problems. The main algorithm is designed to run on a central server, which needs to know all nodes' capacities and latencies and which disseminates the results to the nodes. The authors do not explain what information each node needs to perform the forwarding, and they do not analyze the necessary communication overhead. To add or remove nodes from an existing multi-tree, the authors propose distributed operations that can be initiated by all nodes. The same holds for a refinement operation that is supposed to react to environment changes. It remains unclear, however, what exact information each single node needs to perform these operations and how the information is exchanged. Furthermore, the distributed operations and updates are barely evaluated. In a follow-up paper [109], Li et al. present a genetic algorithm for the central multi-tree computation.

3.4 Publish/Subscribe

A main motivation for the publish/subscribe paradigm is the decoupling of sender and receiver, as pointed out by Eugster et al. [54].
In its core concept, participants that are interested in a particular kind of information can subscribe to receive corresponding event notifications. The senders (publishers) do not need to know their receivers; this is transparently handled by the publish/subscribe system, often implemented as a publish/subscribe middleware. The middleware might also be able to store events for subscribers that are offline at the time a message is published. This concept decouples senders (publishers) and receivers (subscribers) in space and time. Furthermore, publishers can pass events asynchronously to the middleware and therefore do not need to handle or care about the delivery process (synchronization decoupling [54]). Publish/subscribe systems are commonly distinguished by the way they express interest in subscriptions. The most basic option is channel-based publish/subscribe, providing fixed channels in which events are placed and which can be subscribed to. Conceptually, this is similar to group-based multicast. A slight extension of channel-based publish/subscribe is topic-based publish/subscribe, which allows additional filters on message header fields. Subject-based publish/subscribe provides a hierarchically structured name space and allows subscriptions on different levels of the hierarchy. For example, the hierarchy "news/sports" can be subscribed to as "news/sports", or less selectively as "news", which would also include events on "news/business". The most expressive scheme is content-based publish/subscribe. It allows subscriptions with a logic term of predicates based on event properties, such as "type = 'sensor' AND temperature > 30". Finally, special-purpose publish/subscribe types provide subscription criteria tailored to certain kinds of applications. A prominent example is the class of spatial publish/subscribe systems, which is implemented by many of the virtual reality overlays presented in Section 3.6.
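The content-based scheme above can be illustrated with a minimal matching sketch (not a real broker; a production system would index subscriptions instead of testing each event against each predicate). The event fields and operators are taken from the example subscription "type = 'sensor' AND temperature > 30".

```python
# Minimal sketch of content-based subscription matching. A subscription is a
# conjunction of (field, operator, value) predicates evaluated per event.

def matches(event, subscription):
    """Return True iff the event satisfies all predicates of the subscription."""
    ops = {"=": lambda a, b: a == b, ">": lambda a, b: a > b}
    return all(ops[op](event.get(field), value)
               for field, op, value in subscription)

sub = [("type", "=", "sensor"), ("temperature", ">", 30)]
print(matches({"type": "sensor", "temperature": 35}, sub))  # True
print(matches({"type": "sensor", "temperature": 21}, sub))  # False
```

The sketch also makes the expressiveness cost visible: in the worst case, every published event must be evaluated against every such predicate list, which is exactly the matching overhead discussed next.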
The degree of expressiveness has an impact on the performance and/or the cost of the system. While channel associations can be matched with little effort, especially if the number of channels is limited, content-based publish/subscribe in the extreme case requires matching each event against each subscription individually. Specialized types (e.g., spatial publish/subscribe) have the potential to be more efficient because they reduce the subscription space while at the same time fitting the application needs, but on the other hand they are limited to a smaller number of applications. Traditional distributed publish/subscribe systems use broker-based architectures. All participants connect as clients to one of possibly multiple brokers, which distributes the client's publications and notifies the client based on its subscriptions. If the system consists of more than one broker, the brokers build a network, forming an overlay topology. Essentially, a publish/subscribe system fulfills two main purposes: filtering and multiplication of events. Event filtering makes sure that clients only receive the events that match their subscriptions and therefore minimizes network traffic and client load. Event multiplication is necessary because potentially multiple clients have to be notified of a single publication. In broker-based systems, these two tasks are handled by the brokers, which run on reliable and well-connected servers. There are several commercial implementation