Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications

Jaiswal, Ashok Kumar (2014)
Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication

Text
Dissertation_Ashok_Jaiswal_30-07-2014.pdf
Copyright Information: CC BY-NC-ND 2.5 Generic - Creative Commons, Attribution, NonCommercial, NoDerivs.

Item Type:

Ph.D. Thesis

Type of entry:

Primary publication

Title:

Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications

Language:

English

Referees:

Hofmann, Prof. Dr. Klaus ; Herkersdorf, Prof. Dr. Andreas ; Küppers, Prof. Dr. Franko ; Steinmetz, Prof. Dr. Ralf ; Hochberger, Prof. Dr. Christian

Date:

16 July 2014

Place of Publication:

Darmstadt

Date of oral examination:

14 July 2014

Abstract:

Mobility is key to global business, which requires people to be connected to a central server at all times. With the exponential increase in smartphones, tablets, and laptops, mobile traffic is expected to reach the range of exabytes per month by 2018. Applications such as video streaming, video on demand, online gaming, and social media will further increase the traffic load. Future application scenarios, such as Smart Cities, Industry 4.0, and Machine-to-Machine (M2M) communications, build on the concept of the Internet of Things (IoT), which requires high-speed, low-power communication infrastructure. Scientific applications, such as space exploration and oil exploration, are also expected to require computing speeds in the exaflops range by 2018, which means TB/s bandwidth at each memory node. To achieve such bandwidth, the Input/Output (I/O) link speed between two devices needs to be increased to the GB/s range.

Data can be transferred between devices at high speed either serially, using complex Clock-Data-Recovery (CDR) I/O links, or in parallel, using simple source-synchronous I/O links. Although CDR is more efficient than the source-synchronous method for a single I/O link, achieving TB/s bandwidth from a single device requires additional I/O links, and here the source-synchronous method is more advantageous in terms of area and power because additional I/O links do not require extra hardware resources. At high speed, several non-idealities (supply noise, crosstalk, Inter-Symbol Interference (ISI), etc.) create unwanted skew among parallel source-synchronous I/O links. To solve these problems, adaptive training is used in the time domain to synchronize parallel source-synchronous I/O links irrespective of these non-idealities.

In this thesis, two novel adaptive training architectures for source-synchronous I/O links are discussed, which require significantly less silicon area and power than state-of-the-art architectures. The first architecture is based on the unit delay concept and synchronizes two parallel clocks by adjusting the phase of one clock in only one direction. The second architecture consists of a Phase Interpolator (PI)-based Phase Locked Loop (PLL), which can adjust the phase in both directions and achieves faster synchronization at the expense of added complexity. As the number of parallel I/O links increases, clock skew generated by an improper clock tree also affects the timing margin. An incorrect duty cycle further reduces the timing margin, mainly in Double Data Rate (DDR) systems, which are generally used to increase the bandwidth of a high-speed communication system. To solve the clock skew and duty cycle problems, a novel clock tree buffering algorithm and a novel duty cycle corrector are described, which further reduce the power consumption of a source-synchronous system.
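The trade-off between one-directional and two-directional phase adjustment can be illustrated with a toy model. The sketch below is not the circuitry described in the thesis: it assumes that one clock period is quantized into `n_units` unit delays, that phases are plain integers, and that the target phase is known in advance (real training hardware would rely on a phase detector). The function names are hypothetical.

```python
def train_unidirectional(current, target, n_units):
    """Count training steps when the phase can only be moved in one direction.

    Assumption: each step adds exactly one unit delay, and the phase wraps
    around after n_units steps (one full clock period).
    """
    current %= n_units
    target %= n_units
    steps = 0
    while current != target:
        current = (current + 1) % n_units  # add one unit delay
        steps += 1
    return steps


def train_bidirectional(current, target, n_units):
    """Count training steps when the phase can be moved in either direction.

    Assumption: the shorter direction is chosen once, up front, and the
    phase then moves by one unit per step.
    """
    current %= n_units
    target %= n_units
    forward = (target - current) % n_units   # steps needed when adding delay
    backward = (current - target) % n_units  # steps needed when removing delay
    direction = 1 if forward <= backward else -1
    steps = 0
    while current != target:
        current = (current + direction) % n_units
        steps += 1
    return steps


if __name__ == "__main__":
    # Target phase lies two units "behind" the current phase.
    print(train_unidirectional(3, 1, 16))
    print(train_bidirectional(3, 1, 16))
```

In this model the one-directional loop may have to walk almost a full period to reach a phase that lies just behind the current one, whereas the two-directional loop never needs more than half a period. This matches the abstract's statement that adjusting in both directions achieves faster synchronization, at the price of extra logic to pick a direction.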

Alternative Abstract:

The demand for users to be permanently reachable presupposes an uninterrupted connection to the Internet and thus to central servers. With the exponential growth of mobile devices such as smartphones, tablets, and laptops, data traffic is expected to cross the exabyte threshold by 2018. In addition, applications such as video streaming, video on demand, online gaming, and social networks will further increase the data volume. Future application scenarios such as Smart Cities, the Internet of Things, Industry 4.0, and Machine-to-Machine (M2M) communication also place the highest demands on the communication infrastructure, for example high data rates combined with low power consumption. Scientific work such as space exploration and oil production is expected to require computing speeds in the exaflops range in 2018, which calls for a data throughput in the TB/s range per memory interface. To achieve such throughput, the I/O link speeds between two devices must be raised to the GB/s range.

At such high data rates, information can be transferred either over complex serial Clock-Data-Recovery (CDR) links or over simpler parallel source-synchronous links. Although CDR is more efficient than the source-synchronous alternative, several serial links are needed to reach a TB/s data rate. Apart from that, parallel source-synchronous transmission has advantages in power consumption and silicon area, since additional I/Os require no further hardware resources. At high data rates, however, source-synchronous links suffer from problems such as supply voltage noise, crosstalk, and Inter-Symbol Interference (ISI), which result in skew. To counter these problems, adaptive training in the time domain can be applied so that communication can continue at the highest data rates.

This dissertation presents two new architectures for adaptive source-synchronous links that offer significant advantages in power consumption and silicon area over previous implementations. The first architecture is based on a delay unit that incrementally adds very small delays to the phase of a clock. A second architecture is based on a phase interpolator (PI) integrated into the PLL (Phase Locked Loop), which can both add very small delays to the phase of a clock and subtract them. As a result, synchronization can be reached faster at the cost of higher complexity. Particularly in Double Data Rate (DDR) systems, which are commonly used for source-synchronous systems with high data rates, a non-optimal clock tree and an unbalanced duty cycle also reduce the timing margin. To counteract this, this dissertation also presents a novel algorithm for clock buffer generation and a novel duty cycle corrector. These also make it possible to reduce the power consumption of source-synchronous systems.

Original language: German

Uncontrolled Keywords:

Source-synchronous; I/O Links; Adaptive Interface Training; High Bandwidth Applications

URN:

urn:nbn:de:tuda-tuprints-40791

Classification DDC:

000 Generalities, computers, information > 004 Computer science

Divisions:

18 Department of Electrical Engineering and Information Technology

Date Deposited:

01 Aug 2014 12:10

Last Modified: