Genome Replication Progression and Epigenetic Regulation in Mammalian Cells
Fortschreiten der Genomreplikation und epigenetische Regulierung in Säugetierzellen

Dissertation approved by the Department of Biology of the Technische Universität Darmstadt for the award of the academic degree Doctor rerum naturalium

by Sunil Kumar Pradhan, Master of Science (Research) in Biological Sciences

First reviewer: Prof. Dr. M. Cristina Cardoso
Second reviewer: Prof. Dr. Heinrich Leonhardt

Darmstadt, Technische Universität Darmstadt, 2025

Genome Replication Progression and Epigenetic Regulation in Mammalian Cells © 2025 by Sunil Kumar Pradhan is licensed under Creative Commons Attribution-ShareAlike 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/

Year of publication of the dissertation on TUprints: 2025
Date of submission: 09.12.2024
Date of oral examination: 31.01.2025

“तस्मादसक्तः सततं कार्यं कर्म समाचर।”
“Tasmad asaktah satatam karyam karma samachara”
-Bhagavad Gita

Preface

Welcome to my doctoral thesis, “Genome Replication Progression and Epigenetic Regulation in Mammalian Cells.” This work explores the intricate processes of genome replication and the role of epigenetics in human and mouse cells across different stages of development. A general introduction is followed by a section on common methods. The thesis is then structured into three chapters, each addressing a different aspect of these subjects. For clarity and coherence, each chapter presents its own introduction and results alongside a comprehensive discussion and perspectives, allowing for a more integrated understanding of the findings. Chapter 1 builds extensively upon the research presented in Pradhan et al. (2024), while some methodologies detailed in this thesis draw from both Pradhan et al. (2023) and Pradhan et al. (2024). The thesis aims to deepen our understanding of how genome replication and epigenetic mechanisms interact within mammalian cells. Part of the data is made available here: tudatalib. The rest of the data will be made public once the corresponding publications are prepared.

Summary

Among all mammalian cellular organelles, the nucleus gets special attention as it contains the genetic information of the cell, the DNA. It is also the site where all DNA-dependent processes take place, such as genome duplication prior to cell division, transcription, and the repair of DNA damage caused by various breaks. The nucleus also contains the nucleolus, a subcompartment that serves as the assembly site for ribosomes. Among all the DNA-templated events, genome duplication is the most eventful: a large array of proteins simultaneously unwinds the DNA and synthesizes new DNA at thousands of locations along the same long polymer, yet the process is tightly regulated in time and space. It is an event on a scale hitherto undreamt of. While the faithful reconstitution of cellular identity requires the accurate inheritance of epigenetic information, the information that sits atop the genetic information, the dynamic nature of epigenetics also influences chromatin structure and regulates various aspects of DNA-templated events, including genome replication programs. The progression of development also triggers changes in the epigenetic landscape, thereby influencing the replication program. During genome replication, the chromatin decondenses, the DNA unwinds, and new DNA is synthesized and rewrapped around the histone core. The spatio-temporal regulation of the replication program also ensures the faithful inheritance of epigenetic features, specifically the histone post-translational modifications.
However, the intricate interplay between epigenetics and the genome replication program remains a mystery, awaiting further exploration.

The present work first sheds light on the developmental changes in genome replication programs, features of replication forks, and underlying chromatin dynamics in human cells. Using repli-FISH, it also uncovers changes in the replication timing of the less-studied tandem and interspersed repeats, which constitute a large fraction of the genome. As a prerequisite, a novel approach to analyze repli-FISH data was developed. A switch in the replication timing of ribosomal DNA and differential replication programs of Alu, LINE1, and centromeric repeats were observed. Interactions between the genome replication machinery and RNA polymerase I, which is responsible for rDNA transcription, were observed in pluripotent stem cells. Overall, the study complements and expands our understanding of the developmentally regulated genome replication program in human cells.

High-throughput image analysis was used to quantify histone modification marks in different cell cycle phases to characterize the relationship between genome replication and epigenetics. Furthermore, the statistical tool nucim was used to map the dynamic localization of histone marks to different compaction classes during cell cycle and sub-S phase progression in pluripotent stem cells. This unveiled a dynamic localization of H3K36me3 to constitutive heterochromatin and its role in regulating the replication timing of that heterochromatin. Furthermore, the genome-wide dynamics of H3K27me3 and H3K4me3, which also mark the poised/bivalent chromatin in embryonic stem cells, unveiled a peculiar pattern. Using synthetic biology, the replication program was disrupted by localizing the constitutive heterochromatin next to the nuclear lamina, and the effect on epigenetics was studied.

Furthermore, the earliest replication origins and their (epi)genetic and chromatin characteristics were explored to validate the Domino model of replication progression.
An approach to identify cells with the earliest origins, in both living and fixed cells, was established for a detailed exploration. The ribosomal DNA tandem repeat was found to be one of the preferred locations for starting replication across mammalian cell lines and species. To investigate the genetic nature of these origins, methods to identify, collect, and linearly amplify single-cell genomes were established to measure micro-copy number gain. Finally, a model is presented describing genome replication progression and the interaction of the epigenetic and replication programs.

Zusammenfassung

Among all the cellular organelles of mammals, the cell nucleus receives special attention because it contains the genetic information of the cell, the DNA. It is also the place where all DNA-dependent processes take place, such as genome duplication before cell division, transcription of the DNA, and the repair of DNA damage caused by various breaks. The nucleus also contains a subcompartment, the nucleolus, which serves as the assembly site for ribosomes. Among the DNA-dependent processes, genome duplication is the most eventful: it involves a multitude of proteins that simultaneously unwind the DNA at thousands of sites along one long polymer and synthesize a new DNA double helix, an event of previously unimaginable scale that is strictly regulated in time and space. While the faithful restoration of cellular identity requires the precise inheritance of epigenetic information, the information that lies on top of the genetic information, the dynamic nature of epigenetics also influences chromatin structure and regulates various aspects of DNA-dependent processes, including the programs of genome replication. The development of an organism also triggers changes in the epigenetic landscape, which in turn influence the replication program. During genome replication, the chromatin decondenses, the DNA is unwound, and new DNA is synthesized and wrapped again around the histone cores. The spatio-temporal regulation of the replication program also ensures the faithful inheritance of epigenetic features, in particular the post-translational modifications of histones. Nevertheless, the complex interplay between epigenetics and the genome replication program remains a puzzle that requires further investigation.

The present work first sheds light on the developmental changes in genome replication programs, the properties of replication forks, and the underlying chromatin dynamics in human cells. Using repli-FISH, a change in the replication timing of less-studied tandem and interspersed repeat sequences, which make up a large part of the genome, was also examined. Beforehand, a novel approach for the analysis of repli-FISH was developed. A switch in the replication timing of ribosomal DNA as well as different replication programs of Alu, LINE1, and centromere repeats were observed. In pluripotent stem cells, interactions between the genome replication machinery and RNA polymerase I, which is responsible for the transcription of ribosomal DNA, were detected. Overall, the study complements and extends our understanding of developmentally regulated genome replication programs in human cells.

High-throughput image analyses were used to quantify histone modifications in different cell cycle phases and to characterize the relationship between genome replication and epigenetics. In addition, the statistical tool nucim was used to map the dynamic localization of histone modifications to different compaction classes during cell cycle and sub-S phase progression in pluripotent stem cells. This revealed a dynamic localization in constitutive heterochromatin as well as the regulation of its replication timing by H3K36me3. Furthermore, the genome-wide dynamics of H3K27me3 and H3K4me3, which also mark poised/bivalent chromatin in embryonic stem cells, showed a striking pattern. Using synthetic biology, the replication program was perturbed by localizing constitutive heterochromatin near the nuclear lamina, and the effects on epigenetics were examined.

Furthermore, the earliest replication origins and their (epi)genetic and chromatin-related features were examined to validate the Domino model of replication progression. An approach to identify living and fixed cells with the earliest origins was established to enable a detailed investigation. Ribosomal DNA tandem repeats were found to be preferred starting points for replication across mammalian cell lines and species. To investigate the genetic nature of these origins, methods were developed to identify, collect, and linearly amplify single-cell genomes in order to measure micro-copy number gains. Finally, a model was presented that describes the progression of genome replication and the interaction between the epigenetic and replication programs.

List of Figures:

Figure 1: A generalized blueprint of a human chromosome.
Figure 2: The rDNA and (peri)centromere placement in mouse and human chromosomes.
Figure 3: Different states of the chromatin.
Figure 4: Genome replication at different resolutions.
Figure 5: The Domino model of genome replication progression.
Figure 6: Metaphase and interphase fluorescence in situ hybridization (FISH) of the repetitive genomic elements.
Figure 7: Image analysis pipeline for RFi detection, characterization, and measurements.
Figure 8: Image analysis pipeline for mapping RFis to chromatin compaction classes using the nucim package on the statistical analysis platform R.
Figure 9: Cell cycle and replication dynamics analysis of pluripotent and somatic cells.
Figure 10: Feature analysis of the replication foci (RFi) in different S phases.
Figure 11: Quantification of the number of replicons and fork speed in S phase stages.
Figure 12: Genome-wide replication origins distribution in selected human cell lines based on the SNS-seq origin mapping method.
Figure 13: Quantification of chromatin compaction with replication progression across cell lines.
Figure 14: Replication timing of genomic repeat elements.
Figure 15: Developmental difference in replication timing of rDNA repeats.
Figure 16: The developmental difference in replication timing of Y chromosome.
Figure 17: A summary of the developmental difference in genome replication features in pluripotent stem cells (PSC) and somatic cells.
Figure 18: Cell cycle-dependent dynamics of the histone modification levels.
Figure 19: Mapping the chromatin compaction association in cell cycle stages reveals dynamic subnuclear localization of individual histone marks.
Figure 20: H3K36me3 dynamically localizes to the pericentromeric heterochromatin prior to its replication.
Figure 21: Histone modifications mapped to chromatin compaction classes in human cells.
Figure 22: Knockdown approach for H3K36me3.
Figure 23: Loss of H3K36me3 influences DNA-dependent processes.
Figure 24: The MajSat forward RNA influences the pericentromeric replication program.
Figure 25: The model depicts the role of H3K36me3 in MajSat RNA-led maintenance of the replication program.
Figure 26: Capturing the earliest RFi in single cells.
Figure 27: Earliest RFi are randomly fired throughout the nucleus to create a Domino-like origin firing.
Figure 28: Individual replicons induce a Domino-like replicon cluster.
Figure 29: Mapping individual replicons to chromatin compaction classes reveals higher enrichment in open compartments.
Figure 32: Repli-FISH reveals the repeat elements associated with the earliest origins.
Figure 30: Earliest origins are fired from gene-rich open chromatin.
Figure 31: The image shows the location of the gene-rich elements enriched in the open chromatin.
Figure 33: Feasibility of sequencing the earliest replicating regions.

List of Tables:

Table 1: Overview of the human genome
Table 2: Comparative overview of replication origins across domains of life
Table 3: List of all the model cell lines used and their properties
Table 4: The list of nucleotide and nucleoside analogs
Table 5: List of the antibodies used for immunostaining
Table 6: A list of the probes and preparation methods
Table 7: Image acquisition microscopes
Table 8: Image analysis, data analysis, and visualization
Table 9: Publicly available datasets for the replication origin mapping

Contents

Preface
Summary
Zusammenfassung
1. Introduction
  1.1 DNA: The blueprint of life
  1.2 Chromatin and the role of epigenetics in chromatin states
  1.3 Genome replication program and its regulation
  The questions
2. Methods
  2.1 Cell culture and transfection
  2.2 Doubling time and (sub)S phase duration:
  2.3 Genome replication labeling, visualization, and immunostaining
  2.4 Probe generation, metaphase spread, repli-FISH, and immuno repli-FISH
  2.5 Knockdown experiments, immunostaining, and RNA FISH
  2.6 Microscopy
  2.6 Image Analysis
  2.7 Genome-wide origin mapping
  2.8 Single-cell collection, genome amplification, and analysis
  2.9 Data analysis, statistical analysis, and visualization
3. Results
  3.1 Developmental changes in genome replication progression in human cells*
    3.1.1 Introduction
    3.1.2 Developmentally Conserved Spatio-Temporal Replication Pattern in Humans
    3.1.3 Characterization of Spatio-Temporal RFi Reveals a Change in Late-Replicating RFi Distribution
    3.1.4 Replicon Quantification, Fork Efficiency, and Genome-Wide Origin Mapping Unravel Alterations in the Genome Replication Program across Developmental Transitions
    3.1.5 Chromatin Compaction Analysis, and RFi-Associated Histone Modification Measurements Reveals Differential Chromatin Dynamics
    3.1.6 Repli-FISH Reveals Developmental Changes in the Replication Timing of Tandem and Interspersed Repeats
    3.1.7 rDNA Tandem Repeats Show a Switch in Replication Timing and Change in Replication, Transcription Interaction
    3.1.8 Sex Chromosome Y Replicates Throughout the S phase and Shows a Developmental Switch in Replication Timing
    3.1.9 Conclusions/Discussion
  3.2 Correlation of genome replication progression and epigenetics in pluripotent stem cells
    3.2.1 Introduction
    3.2.2 Histone modification levels are cell cycle dependent
    3.2.3 Mapping histone modification to chromatin compaction classes reveals distinct cell cycle-dependent dynamics of individual histone marks
    3.2.4 H3K36me3 dependent transcription fidelity of pericentromeric forward RNA regulates chromatin structure and replication features
    3.2.5 Conclusions/Discussion
  3.3 Mammalian earliest genome replication origins stochastically activate from (in)active nuclear compartments to create a domino-like replication progression
    3.3.1 Introduction
    3.3.2 Capturing the cells with earliest origins
    3.3.3 Mammalian earliest replication origins fire randomly throughout the nucleus and create a Domino-like replication progression
    3.3.4 Earliest replicons are fired in (in)active nuclear compartments and cluster to form replication foci/timing domain
    3.3.5 Earliest RFi fire from conserved gene-rich open chromatin and repetitive elements
    3.3.6 Conclusion
    3.3.7 Perspectives: sequencing earliest RFi/replication domain and its chromatin structure in a single cell
4. Annex
  4.1 Honorary Declaration - Ehrenwörtliche Erklärung
  4.2 CV

1. Introduction

Chromatin, composed of DNA and histones, carries hereditary information in eukaryotes and is spatially organized within the nucleus. The role of chromatin in cellular development and differentiation is pivotal, making the study of chromatin duplication and its intricate regulation through epigenetic mechanisms a fundamental area of research. The second half of the last century answered many fundamental questions about chromatin duplication. The Hershey-Chase experiments showed that DNA is the genetic material; the X-ray diffraction work by Rosalind Franklin and Maurice Wilkins led to the elucidation of the DNA double-helical structure by Watson and Crick, followed by Meselson and Stahl’s experiment that established the semi-conservative nature of DNA replication 1–3. This encouraged further discoveries, such as the isolation and purification of the first DNA polymerase. Pioneering work on the organization of DNA replication, using in vivo pulse labeling with radioactive thymidine and DNA fiber analysis, then established the concept that DNA replication proceeds bidirectionally from an opening site or active origin of replication (“ori”) 4,5.
Further developments with model systems like fission yeast and Xenopus egg extract revealed key aspects of the cell cycle regulation of genome replication 6,7. The development of conjugated nucleotides and nucleotide analogs such as BrdU, alongside the advent of confocal microscopy, unraveled some of the key features of spatio-temporal genome replication progression 8,9. The human genome project and the development of DNA sequencing technologies further complemented our understanding of the genome replication program 10. However, the chromatin duplication program and its regulation are yet to be fully understood, especially in mammals.

1.1 DNA: The blueprint of life

DNA (deoxyribonucleic acid) serves as the hereditary material in all known living organisms and many viruses. Its structure, a double helix, comprises two strands that store genetic information, which is passed on during cell division or across generations. Dissecting the structural and functional units of the genome is crucial to understanding life forms and their development. In the human genome project, the genome was sequenced using bacterial artificial chromosomes (BACs) that were then ordered and oriented along the human genome using radiation hybrid, genetic linkage, and fingerprinting. This led to a quantum leap in our understanding of the human genome and encouraged the sequencing of other species 11–13. With the emergence of long-read sequencing methods and more powerful alignment algorithms, these genome assemblies have been constantly improved and now represent the telomere-to-telomere (T2T) sequence 14. However, the study was conducted in a complete hydatidiform mole (CHM) haploid cell, which retains only the paternal genetic material. This model lacks the full complexity and diversity present in normal diploid cells, particularly in the context of tandem repeats such as ribosomal DNA and centromeres. Furthermore, a fully assembled mouse genome sequence remains elusive.
Despite these limitations, fundamental structural and functional insights into the genome have been achieved and are well-documented.

Table 1: Overview of the human genome

Genomic element           Percentage of the genome (%)
protein-coding regions    1.5 - 2.0
segmental duplications    5.0
Alu                       10.9
LINE1 (L1)                20.9
satellite DNA             3
telomere                  0.3
rDNA                      0.32
non-coding                2
other promoters           0.5
other repeats             10

A major surprise of the human genome project was that only a small fraction of the human genome is protein-coding, whereas a large fraction consists of non-coding sequences and repetitive elements 13. In the T2T assembly, the number of protein-coding genes increased only slightly (from 19,890 to 19,969), showing the accuracy of the previous assembly in characterizing protein-coding regions. However, the repetitive fraction increased further, from 50% to 54% of the total genome 14. Altogether, the long and short interspersed nuclear elements (LINE and SINE) comprise more than 60%, satellite DNA makes up 10%, and the rDNA tandem repeats from five acrocentric chromosomes are around 1% of the total repeats. Table 1 describes the constituents of the human genome.

Protein-coding regions: These sequences of the genome are transcribed into RNA and translated into proteins. Even though they constitute less than 2% of the total genome, they provide the blueprint for synthesizing proteins, which perform a vast array of functions within the cell 14. Each gene contains a specific sequence of nucleotides that encodes instructions for building a protein or, in some cases, a functional RNA molecule. The coding segments, known as exons, are often interrupted by intronic sequences or introns (except in some histone genes) and contain an open reading frame with a start and stop codon defining the regions to be transcribed/translated. Each gene is also flanked by upstream enhancer and repressor sequences, which can be active or inactive and hence control differential gene expression.
Based on developmental stage or cellular identity, a combinatorial subset of genes is activated and repressed by epigenetic regulation, which in turn regulates the expression of the proteins required for differentiation or for maintaining cellular identity.

Non-coding regions: A significant proportion of the genome is made up of non-coding sequences, which do not code for proteins yet play crucial roles in regulating gene expression or maintaining genome integrity. They can lie between genes and often contain regulatory elements for various DNA-dependent processes, and are crucial for 3D genome organization. A considerable fraction of the non-coding regions also contains sequences for non-coding RNAs (ncRNAs), RNA molecules that are not translated into proteins yet have regulatory functions, such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). The miRNAs play regulatory roles post-transcriptionally by targeting alternative splicing, mRNA stability, and protein translation 15. The lncRNAs play key roles in gene regulation and chromatin organization by (in)activating target genomic regions or a whole chromosome 16.

Tandem repeats: These repeats are organized as multiple copies of a homologous DNA sequence, arranged in a head-to-tail pattern to form tandem arrays, and can vary in array size and repeat unit. Initially dismissed as “junk DNA,” these sequences are now recognized for their crucial role in regulating key structural and functional aspects of basic cell operations. For example, the telomeric repeats serve as the cap of individual chromosomes, maintaining genome stability and preventing chromosome degradation 17. This underscores the importance of these once-dismissed genomic elements in our understanding of genome function and stability.

Figure 1: A generalized blueprint of a human chromosome. (A) Illustration of a human chromosome representing various repeat elements in both p and q arms. The magnified region shows the placements of the (peri)centromeric repeats. In most chromosomes, including acrocentric ones, the p arm is followed by the centromeric ɑSat, where the kinetochore is formed and spindle fibers are assembled. This is followed by some non-satellite sequences, mostly transposable elements, before the pericentromeric satellite repeats. (B) Illustration of a generalized (peri)centromeric region showing the varied placements of different HSats with respect to ɑSat higher-order repeats (HOR) in different chromosomes, the size of the repeats, and their genetic features. (C) Scheme showing chromosome 9, which contains the largest HSat repeat. In chromosome 13, the rDNA tandem repeat is sandwiched between HSat1A and HOR. The figure is modified from 18,19.

Figure 2: The rDNA and (peri)centromere placement in mouse and human chromosomes. The illustration shows the differential placement of rDNA with respect to centromeres in the two species. In humans, the centromere is flanked by rDNA and q arm, whereas in mouse chromosomes, the rDNA is placed in between the centromere and q arm. Representative FISH images of rDNA and centromere in metaphase (scale bar: 10 µm) (modified from 20).

In the mammalian genome, (peri)centromeric repeats form the largest tandem repeat and act as an axis for genome organization, stability, and chromosome segregation. In humans, centromeric regions contain the alpha satellite (ɑSat) repeats, consisting of a 171 bp monomeric unit in large homologous arrays (85.2 Mb genome-wide) (Figure 1A and 1B) 21. In addition, the human satellites (HSats) HSat2 and HSat3 comprise CATTC repeats and form one of the largest contiguous satellite arrays (a 27.6 Mb array of HSat3 on Chr 9) (Figure 1C). The AT-rich HSat1A and HSat1B (Y and acrocentric specific) are also found in multiple chromosomes. This region is further flanked by the pericentromeric regions extending toward the p and q arms 14,18.
The ɑSat monomer also has variations, and multiple subtypes often form higher-order repeats (HOR) existing next to each other, forming large, homogeneous repeats of the HORs.The kinetochore proteins are usually associated with a subset of these HOR arrays called the active array 22. In mouse chromosomes, the AT-rich centric and pericentromeric structure is formed by tandem repeats of minor and major satellite sequences, respectively 23. In mice and humans, the rDNA tandem repeats are present in the short-arm/acrocentric chromosomes (Chromosomes 12, 15, 16, 18, and 19 in mice and 13, 14, 15, 21, and 22 in humans) (Figure 1A, and Figure 2A). The rDNA repeat can be of varied structures, and the number of repeat units varies from chromosome to chromosome. In humans, Chr 13 has the highest fraction of rDNA, whereas Chr14 has the lowest units of the rDNA array. In Chr13, the rDNA is positioned between two HSat repeat arrays, the large array of HSat1A and ɑSat 18,20 (Figure 1C). The rDNA repeat is flanked by the proximal and distal junction, together forming the nucleolar organizer regions (NOR) 24. The interspersed repeat elements: Unlike tandem repeats, which occur consecutively one after another, interspersed repeats are distributed throughout the genome (Figure 1A). Most of these interspersed repeats are transposable elements, sequences capable of relocating within the DNA. One of the most notable of these is the Alu retrotransposon family, the most abundant short interspersed nuclear element (SINE) in primates, including humans. Alu elements are typically around 300 base pairs long. In the human genome, Alu elements make up approximately 11% of the total DNA. These elements play crucial roles in the regulation of gene expression and the evolutionary dynamics of the genome in all primates 25. 
SINEs are transcribed into non-protein-coding RNA, which is reverse-transcribed and inserted at another location. They are therefore believed to depend on, and to have co-evolved with, long interspersed nuclear elements (LINEs; 20% of the human genome), whose open reading frames (ORFs) encode the proteins that enable reverse transcription and reintegration into the genome 26. The increased activities of LINEs and SINEs are also correlated during early embryo development 27. LINE1 (also L1) is the most abundant LINE and encodes ORF1, which helps non-autonomous Alu and other SINE elements in humans 28.

1.2 Chromatin and the role of epigenetics in chromatin states

Chromatin is the highly organized structure of DNA and protein packaged inside the nucleus of eukaryotic cells. The compacted structure of DNA wrapped around histones is inherently repressive for all DNA-dependent processes. The fundamental unit of chromatin, the nucleosome, consists of a segment of DNA wrapped around a core of histone proteins. The histone core is formed by an H3:H4 tetramer sandwiched between two H2A:H2B dimers. Nucleosomes are further folded and coiled into hierarchical higher-order structures that ultimately form the chromosomes 29. Modifications of DNA and histones dynamically modulate nucleosome compaction, regulating the nature of chromatin and, hence, the access of the different machineries involved in DNA-dependent processes. This information on top of the DNA, called epigenetics, also regulates differential gene expression, playing a critical role in development. Such epigenetic marks need to be maintained over cell division cycles but, on the other hand, need to be reprogrammed during cellular (retro)differentiation.
Hence, despite all cells in a multicellular organism having an identical genome, persistent yet plastic epigenetic information regulates when and where cells commit to different lineages with distinct phenotypes 30,31. Aberrations in the epigenetic information lead to disrupted chromatin regulation, causing genome instability and leading to various diseases, including cancer 32,33.

Different states of the chromatin: The plastic nature of epigenetics allows the chromatin to be relatively decondensed (euchromatin) or condensed (heterochromatin). Euchromatin is generally associated with actively transcribed genes and is more accessible to transcription factors and other regulatory proteins. In contrast, heterochromatin is typically transcriptionally repressed and serves to protect and stabilize the genome by maintaining structural integrity 34. Furthermore, heterochromatin that is inherently repressed is called constitutive heterochromatin, whereas heterochromatin that is repressed during development in order to achieve differential gene expression or dosage compensation is called facultative heterochromatin (Figure 3A). Each chromatin state is distinguished by its epigenetic marks, and even facultative and constitutive heterochromatin carry distinct marks 35. Classic examples of constitutive heterochromatin are the ɑSat in human cells and the (peri)centromeric repeats in mouse cells, which are repressed at the very early stages of development and remain repressed throughout the lifetime in all cell types 36. A combinatorial subset of the genome is repressed during developmental stages or terminal differentiation in order to express or repress a combination of genes and achieve tissue-specific identity. The inactivation of one of the X chromosomes in female cells is a classic example of facultative heterochromatin, where the whole chromosome is repressed 37.
Figure 3: Different states of the chromatin. (A) Mouse myoblast nuclei stained with DAPI show the spectrum of chromatin condensation levels in 3D. The left zoomed box represents loose euchromatin, the middle represents facultative heterochromatin (inactivated X chromosome), and the right represents constitutive heterochromatin (pericentromeric repeats).

Epigenetics and its role in regulating chromatin states: Epigenetics comprises the stable but reversible changes that occur to the chromatin on top of the genetic information. The three pillars of epigenetics are DNA modification, histone variants/post-translational modifications, and non-coding RNA. While DNA modification is absent in lower eukaryotes such as yeast, it plays a prominent role in chromatin regulation in higher eukaryotes. The most direct mechanism of epigenetic regulation is the modification or variation of histone proteins. The modifications, which include methylation, acetylation, phosphorylation, and ubiquitination, occur on the N-terminal tails of the histones that protrude from the nucleosome. The pattern of these modifications constitutes the "histone code," which is read by specific effector proteins that alter chromatin structure and regulate gene expression 38. Histone acetylation involves the addition of an acetyl group to lysine residues, mediated by histone acetyltransferases (HATs). Acetylation neutralizes the positive charge of lysine, reducing the affinity between histones and DNA. This results in a more relaxed chromatin structure that is accessible to transcriptional machinery, thereby promoting gene expression 39. Histone deacetylases (HDACs) remove these acetyl groups, leading to chromatin compaction and transcriptional repression 40. The effect of lysine methylation is site-specific.
Methylation of histone H3 on lysine 9 (H3K9me) is a hallmark of heterochromatin and gene repression, facilitating the binding of proteins that compact chromatin and silence gene expression 41. However, trimethylation of H3K4 (H3K4me3) is usually associated with transcriptionally active regions. Histone methyltransferases (HMTs) and demethylases (HDMs) regulate these marks. The list of canonical histones also includes the linker histone H1 (including its variants). Instead of associating with the octamer core, it sits on top of the nucleosome structure, binding both the entry and exit of the DNA fiber and keeping in place the DNA that is wrapped around the histone octamer. By interacting with the linker DNA between nucleosomes, H1 promotes the folding and packing of nucleosomal arrays into a more condensed form, which is essential for fitting the vast length of DNA into the confined space of the nucleus. This makes H1 an integral factor of the 30 nm fiber 42. Variants of several canonical histones play crucial roles in regulating various specialized chromatin functions in addition to gene regulation. Variants of H2A are H2A.Z, H2A.X, and macroH2A. H2A.Z is involved in regulating gene expression, is often found at the promoters of active genes, and is associated with both activation and repression of transcription, depending on the context. H2A.Z incorporation into nucleosomes alters the stability and structure of chromatin, facilitating the binding of transcription factors and chromatin remodelers 43. H2A.X is critical for the DNA damage response and is distributed throughout the chromatin. H2A.X is phosphorylated at serine 139 (γH2A.X) upon DNA double-strand breaks and serves as a signal for the recruitment of DNA repair machinery, playing a crucial role in maintaining genome integrity 44. MacroH2A is known for its role in X-chromosome inactivation and the formation of facultative heterochromatin.
MacroH2A-containing nucleosomes are associated with transcriptional repression and chromatin compaction 45. Unlike the canonical H3, which is incorporated into chromatin during DNA replication, the variant H3.3 is deposited into chromatin independently of DNA synthesis. It is often found at active gene loci and regulatory regions, such as enhancers, and is associated with transcriptional activity and the maintenance of open chromatin states 46. A centromere-specific H3 variant, CENP-A, replaces H3 in nucleosomes at centromeres. This variant is essential for the assembly and function of the kinetochore, the structure responsible for chromosome segregation during cell division 47.

DNA methylation is the methylation of cytosine residues in DNA (5-methylcytosine, 5mC), typically occurring at CpG dinucleotides. It is the most studied and prominent DNA modification associated with repressed chromatin and plays key roles in gene expression during development, X-chromosome inactivation, imprinting, and genome stability 48. Regions of the genome with high levels of 5mC are often found in gene promoters and repetitive sequences, including transposable elements, leading to a compact chromatin structure that is less accessible to transcriptional machinery. This results in the stable silencing of gene expression and maintains genome stability. In mammalian genomes, approximately 70-80% of CpG dinucleotides are methylated, and the loss of global DNA methylation is often associated with cancerous cells, highlighting the pervasive nature of this modification in maintaining global genomic stability 49. The establishment and maintenance of DNA methylation patterns are governed by DNA methyltransferases (DNMTs). DNMT1 is primarily responsible for maintaining methylation during DNA replication by recognizing hemimethylated DNA and restoring the methylation mark on the newly synthesized strand 50.
DNMT3A and DNMT3B, on the other hand, are de novo methyltransferases that establish new methylation patterns during development 51. Dysregulation of these enzymes can lead to aberrant methylation patterns, contributing to developmental disorders or diseases such as cancer, where hypermethylation of tumor suppressor genes and hypomethylation of oncogenes disrupt normal gene function 49,51. In addition to static methylation marks, the DNA methylation landscape is dynamically regulated through the cycle of methylation and demethylation, allowing for the fine-tuning of gene expression in response to developmental and environmental changes. Hydroxymethylation is the first step in the active DNA demethylation pathway, which involves the conversion of 5mC to 5-hydroxymethylcytosine (5hmC) by the Ten-Eleven Translocation (TET) family of enzymes (TET1, TET2, and TET3). 5hmC serves as an intermediate in the demethylation process and is also increasingly recognized as an epigenetic mark associated with gene activation and enhancer regions 52. The presence of 5hmC is particularly enriched in the brain and stem cells, suggesting a role in neurodevelopment and cellular differentiation 53. Following the formation of 5hmC, TET enzymes can further oxidize this intermediate to form 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). These oxidized derivatives are substrates for the base excision repair (BER) pathway, which recognizes and removes these modified bases, leading to the insertion of an unmodified cytosine and completing the demethylation process 54. This mechanism allows for the active removal of methylation marks and the dynamic regulation of DNA methylation in response to cellular signals. The implications of DNA modifications are widespread. For example, (de)methylation of cytosine directly affects the stability of the DNA duplex and DNA metabolism 55. DNA modifications interact with histone modifications to shape the chromatin landscape. 
For instance, methylated DNA often recruits proteins with methyl-CpG binding domains (MBDs) that further recruit histone deacetylases (HDACs) and other chromatin remodelers to establish a repressive chromatin state 56. Hydroxymethylation, on the other hand, is associated with open chromatin and active transcriptional regions 57,58.

Non-coding RNA (ncRNA) is the third pillar of epigenetics. Despite not encoding any proteins, ncRNAs are integral to regulating chromatin dynamics, transcriptional activity, and the maintenance of epigenetic memory. Long non-coding RNAs (lncRNAs) are the most prominent transcript class (usually longer than 200 nucleotides) and play roles in chromatin modulation and epigenetic memory. They can interact with chromatin-modifying complexes and guide them to specific genomic loci, influencing histone modifications and chromatin states. For instance, the lncRNA HOTAIR interacts with Polycomb Repressive Complex 2 (PRC2), targeting it to HOX gene clusters to mediate histone H3 lysine 27 methylation (H3K27me3) and transcriptional repression 59. They can also facilitate the recruitment of chromatin modifiers that establish heritable epigenetic marks, such as DNA methylation and histone modifications, ensuring stable gene silencing or activation over time 60. Moreover, the orientation of the ncRNA defines its role in regulating the targeted chromatin states, and X-chromosome inactivation (XCI) is a prime example of this. XIST (X-inactive specific transcript) and TSIX are two lncRNAs involved in this process 16,61. XIST is a sense lncRNA expressed exclusively from the X chromosome that will be inactivated. It spreads across the chromosome in cis, attracting silencing complexes like DNMTs and PRC2, which deposit H3K27me3 marks to initiate chromatin compaction and transcriptional silencing, transforming the whole chromosome into a compacted structure 62,63. XIST’s role is central to initiating and maintaining XCI and ensuring dosage compensation.
TSIX is an antisense transcript of XIST that is transcribed from the same region but in the opposite direction. TSIX counteracts XIST by preventing its accumulation and function. It regulates XIST expression through chromatin remodeling and transcriptional interference. By binding to the XIST locus, TSIX helps maintain the active state of the chromosome from which it is expressed 64. This interplay ensures that only one X chromosome in each cell is inactivated, preserving the balance of gene dosage. All three components of epigenetics (histone modifications/variants, DNA modification, and ncRNA) contribute uniquely to the control of gene expression and the maintenance of cellular identity, but their interactions create a dynamic and interconnected regulatory network that fine-tunes the epigenome. Mapping and understanding these interactions is crucial for comprehending how cells respond to developmental cues and environmental changes or even perform basic cellular functions.

1.3 Genome replication program and its regulation

Despite divergent blueprints, the fundamental process of genome replication is functionally conserved across domains of life. Bacteria and archaea possess circular DNA, while eukaryotes have linear DNA packed within a nucleus. Unlike the almost naked DNA in bacteria and archaea, the eukaryotic genome is packed with histone proteins.
Yet, all life forms follow the conserved principle of genome duplication, where, in all cases, replication starts from an opening site called the origin of replication (Ori) in order to give access to DNA polymerase and other replication machinery 65. Both bacteria and archaea have circular DNA and typically a single Ori (with a few exceptions in archaea having multiple origins). These Oris are sequence-specific; for example, the E. coli genome replicates from a single origin known as OriC. These origins often consist of specific DNA sequences recognized by initiator proteins, such as DnaA in bacteria, which initiates the unwinding of DNA at the origin 65. Most archaeal genomes carry one copy of oriC, yet several genera carry multiple oriC copies, which may respond to distinct initiator complexes 66.

Table 2: Comparative overview of replication origins across domains of life
Feature | Bacteria (prokaryote) | Archaea (prokaryote) | Yeast (eukaryote) | Metazoa (eukaryote)
Chromosome structure | Circular | Circular | Linear | Linear
Number of origins | Typically one | One/multiple | Multiple | Multiple
Origin directionality | Bidirectional from one site | Bidirectional from one site | Bidirectional from many sites, sometimes unidirectional | Bidirectional from many sites, sometimes unidirectional
Initiation proteins | DnaA | Orc1/Cdc6 | Origin Recognition Complex (ORC) | ORC
Regulatory mechanisms | DnaA activity, methylation | Protein-DNA interactions | Cell cycle kinases | Cell cycle kinases, regulatory proteins, epigenetics
Origin sequence | Sequence-specific, AT-rich | Sequence-specific | Sequence-specific | Sequence-independent

Eukaryotic cells possess multiple replication origins along their linear chromosomes to facilitate the replication of larger genomes. Using DNA autoradiography, Huberman and Riggs (1968) studied, for the first time, the replication origins and bidirectional fork movement in mammalian chromosomes 5 (Figure 4A).
This showed that each chromosome has many active origins, allowing simultaneous initiation at several points. In simpler eukaryotes like yeast, origins are defined by specific sequences (e.g., ARS in Saccharomyces cerevisiae) 67. In metazoans, by comparison, origins are vaguely defined, more flexible, and often influenced by multiple factors, including DNA structural arrangements, regulatory proteins, the chromatin environment, and epigenetic marks, rather than one single factor, making the process more stochastic and complex 68. A summary can be found in Table 2. As the number of replication initiation sites within the cell increases, managing their precise activation becomes critically important. The cell must ensure that DNA replication is accurate and timely and that each genomic region is duplicated only once per cell cycle. The uncoordinated firing of replication origins can lead to re-replication, causing genomic instability and potentially harmful mutations. Furthermore, DNA replication occurs within the dynamic environment of the chromatin and must be in sync with other chromatin-templated processes, particularly transcription. These two processes share the same DNA template and can influence each other in significant ways. Transcription can modify chromatin structure, affecting the accessibility of replication origins, while the process of DNA replication can impact the transcriptional activity of genes 69. At a global level, the eukaryotic genome replicates in a highly organized and non-random fashion, with specific genomic regions replicating earlier or later in the S-phase. This concept, initially described over 60 years ago, highlights two critical aspects of eukaryotic DNA replication. First, not all replication origins are activated simultaneously, and second, origins that do fire together are not evenly distributed across the genome 70.
These characteristics create distinct replication patterns that progress in a spatiotemporally conserved manner throughout the S-phase and can be observed under fluorescence microscopy 71 (Figure 4B). On a finer scale, at the level of individual replicons, the initiation of replication appears to be a stochastic process. This means that not every potential origin is activated in every cell cycle. Instead, each origin has a variable firing efficiency, with some origins being more likely to initiate replication than others 72–74. The regulation and features of DNA replication are dynamic, adapting throughout development, differentiation, and stress 75–77. Epigenetic modifications, such as DNA methylation, histone modifications, and the presence of non-coding RNAs, play crucial roles in shaping the chromatin landscape. These modifications are already known to influence transcription and are believed to affect any chromatin-based event, including DNA replication. Over the past decades, studies have shown correlations between specific epigenetic marks and the timing of replication in various organisms 78–80. Furthermore, dynamic changes of the epigenetic profiles during development rewire the replication program 76,81. This adaptability and other observations suggest that replication control cannot be fully understood through genetic sequences alone. Possible additional determinants include epigenetic marks, chromatin state, and DNA secondary structures 68. Despite significant advances in the tools and techniques available to study epigenetic regulation, the precise influence of epigenetic modifications on the firing of replication origins, particularly in mammals, remains only partially understood. Most of the research is driven by the need to unravel how a specific epigenetic modification is inherited or affects replication.
Yet the challenge lies in dissecting how the epigenome as a whole affects the replication process, as individual modifications do not act alone but rather are deeply interwoven in complex crosstalk with each other. Naturally, this interconnectedness makes it difficult to isolate the effects of individual modifications on chromatin dynamics. This complexity is more pronounced in higher metazoans, which possess a more sophisticated and diverse array of epigenetic modifications compared to unicellular organisms or simpler metazoans. Additionally, the epigenetic landscape provides a flexible regulatory framework that can vary significantly within the same cell population. This variability adds another layer of complexity to experimental studies, making it challenging to discern consistent patterns or draw definitive conclusions about the role of epigenetics in regulating replication origin activity. To effectively tackle these challenges, employing a combination of methodologies is crucial. High-throughput approaches can provide broad insights into the general principles governing epigenetic influence on replication. However, these methods must be complemented with in vivo studies at the single-cell level to capture the nuanced and dynamic nature of epigenetic regulation in a living organism. Only through such integrative and multi-scale approaches can the role of epigenetics in the regulation of DNA replication origin firing in complex mammalian systems be fully elucidated.

Genome replication program in mammalian cells: The eukaryotic genome is duplicated in a tightly regulated replication program. The number of origins, the firing time of these origins, and their fork speed are the prominent aspects determining the replication program. Earlier studies on genome replication in eukaryotic models (like S. pombe or S. cerevisiae) revealed that well-defined replication origins fire stochastically yet follow a generally conserved replication timing 72,73.
Higher eukaryotes, despite having vaguely defined origins, maintain a conserved replication timing, suggesting that the order of origin firing, rather than the origin itself, plays a significant role in faithful chromatin duplication 82,83. Replication timing was initially visualized in mammalian cells as distinct spatial patterns of replication foci (RFi), emerging progressively as the cell advanced through the S phase. Dissection of these RFi by super-resolution microscopy reveals that each corresponds to a cluster of replicons (Figure 4D). Further insights from fluorescence in situ hybridization (FISH) or hybridization of target sequences to blots of BrdU-immunoprecipitated DNA demonstrated that specific chromosome regions are replicated at defined times within the S phase rather than randomly 8,84. With the human/mouse genome projects and advances in microarray technology, a broad view of the conserved replication timing was revealed 11,28,85–87. These studies showed that gene-rich chromosomes (like human chromosomes 22, 19, and 17) replicate earlier in the S phase, while others (for example, human chromosomes 18, 21, and Y) replicate later. Furthermore, while the GC- and Alu-rich regions were replicated earlier, the LINE-rich regions were replicated later. With the advance of next-generation sequencing, a comprehensive view of the replication timing was revealed by sequencing the nascent DNA 10. It showed that the genome duplicates in megabase-sized replication domains, each a stretch of DNA that shares the same replication timing 76. In addition, these studies also characterized constant and plastic replication domains, where the replication timing of the constant domains remains the same across cell types, and that of the plastic domains changes with developmental stage or cell type.
However, being population-based, these studies did not reveal the cell-to-cell variation or stability in replication timing until advances in single-cell genome amplification methods and copy number analysis paved the way to investigate these replication domains in individual cells 88. By collecting cells in the middle of the S phase based on DNA content and measuring the copy number gain due to genome duplication, the replication timing of individual cells was revealed 89,90. These studies revealed the genome-wide stability of these replication domains among cells, yet a certain degree of stochastic variation from cell to cell exists, especially in the late S phase (Figure 4C). Parallel to our understanding of replication timing, advances in characterizing the replication origins in mammalian cells were also made using microarrays and next-generation sequencing. In Saccharomyces cerevisiae, origin selection is guided by the binding of the origin recognition complex (ORC) to well-defined DNA sequences near ARS elements 91. However, metazoan replication origins do not conform to a conserved consensus sequence, and the ORC from higher eukaryotes exhibits no sequence specificity in vitro 92. The development of tools to map these origins, such as short nascent strand (SNS) isolation in mammalian cells, supported these findings 93. The sequencing of Okazaki fragments (OK-seq) revealed initiation zones located primarily within broad (up to 150 kb), non-transcribed regions that often abut transcribed genes, along with the directionality of these origins and the termination sites 94. Further investigations related origin positioning to transcription start sites and the secondary structure of the DNA without revealing features shared among the origins 95,96. Isolating a sufficient amount of Okazaki fragments or SNS has been challenging, making it difficult to infer origin features at the single-molecule scale, as seen on combed fibers, or to assess cell-to-cell variation 97.
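The copy-number approach to single-cell replication timing described above can be illustrated with a minimal sketch. It is not the pipeline used in the cited studies: the read counts are simulated, the replicated/unreplicated call uses a simple midpoint threshold, and all names and parameter values (`simulate_cell`, `call_replicated`, `depth`, `noise`, the 0.3 to 0.7 mid-S gate) are hypothetical choices for illustration.

```python
import random

def call_replicated(read_counts):
    """Call each genomic bin replicated (1) or unreplicated (0) in one cell.

    Assumes a mid-S-phase cell in which replicated bins carry about twice
    the reads of unreplicated bins. The midpoint between the lowest and
    highest count is used as threshold (illustrative; real data would need
    a more robust segmentation).
    """
    lo, hi = min(read_counts), max(read_counts)
    threshold = (lo + hi) / 2
    return [1 if c > threshold else 0 for c in read_counts]

def replicated_fraction(cells):
    """Fraction of cells in which each bin is replicated (higher = earlier)."""
    calls = [call_replicated(c) for c in cells]
    return [sum(col) / len(calls) for col in zip(*calls)]

def simulate_cell(timing, rng, depth=100.0, noise=5.0):
    """Simulate per-bin read counts for one mid-S cell (hypothetical data)."""
    s_progress = rng.uniform(0.3, 0.7)  # position of the cell within S phase
    counts = []
    for t in timing:  # t in [0, 1]: 0 = earliest, 1 = latest replicating
        # Small per-cell jitter models stochastic cell-to-cell variation.
        replicated = t + rng.gauss(0, 0.05) < s_progress
        copies = 2 if replicated else 1
        counts.append(max(0.0, rng.gauss(depth * copies, noise)))
    return counts

rng = random.Random(0)
true_timing = [0.1, 0.2, 0.45, 0.55, 0.8, 0.9]  # six toy bins
cells = [simulate_cell(true_timing, rng) for _ in range(200)]
fractions = replicated_fraction(cells)
print([round(f, 2) for f in fractions])
```

In this toy setting, bins that replicate early are called replicated in nearly all cells and late bins in almost none, while bins replicating mid-S show intermediate fractions, which is where cell-to-cell variation becomes visible.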
Recent advancements in long-read sequencing (e.g., Nanopore and PacBio) and in the detection of incorporated nucleotide analogs (BrdU) can reveal single-molecule information and more features of these origins, especially in highly repetitive genomic regions 98,99. Nonetheless, the nature of mammalian replication origins is still an active and challenging research topic, and a consensus has yet to be reached about their features.

Figure 4: Genome replication at different resolutions. (A) The scheme shows a classic pulse-pulse experiment on stretched DNA fibers: a cell population treated with IdU followed by CldU incorporates these nucleoside analogs into the nascent DNA; the cells are lysed, the DNA fibers are stretched, and the analogs are detected, revealing the replication origins and inter-origin distances. The lower scheme shows semiconservative replication progressing in both directions from the origin after DNA melting. The leading strand is continuously replicated, whereas the lagging strand progresses discontinuously, producing shorter Okazaki fragments. (B) Schematic representation of the experimental workflow combining fixed-cell and live-cell microscopy to investigate the spatiotemporal progression of genome replication at different time resolutions. Fixed-cell microscopy employs short nucleoside analog pulses followed by long chases to resolve spatial patterns within sub-stages of the S phase. These patterns are visualized by detecting nucleoside analog incorporation and replication machinery components (e.g., PCNA). Live-cell microscopy tracks the dynamics of the replisome machinery (e.g., GFP-PCNA), with image registration applied to distinguish pre-existing and newly activated replication foci (RFi). This analysis reveals a sequential, Domino-like DNA replication progression, where stochastic or existing RFi initiate the firing of nearby RFi. (C) Replication timing profiling by repli-seq methods shows genome-wide stability at the population level and cell-to-cell stochastic variation in early and late replicating chromatin domains. The repli-seq data plotted were accessed from GEO: GSE108556 90. (D) Correlative wide-field and super-resolution microscopy followed by (nano)RFi segmentation reveals multiple replicons corresponding to one RFi or stable unit of the replication domain.

Characterization of the replication timing and origins revealed that while the replication timing remains globally conserved, the replication origins are plastic. This suggested that the temporal order of firing of the initiation sites, rather than the sites themselves, maintains the replication program, yet its mechanism and significance were unknown. Overall, the broader understanding of the replication program narrowed down to understanding the principles behind the progression of chromatin replication. Furthermore, most of these studies did not reveal the information associated with the large repetitive elements (due to the inherent limitation of mapping repetitive elements using NGS), covering only a small fraction of the genome. The microscopic dissection of the spatiotemporal replication program in vivo has bridged many such gaps in our understanding. This approach already showed the conserved spatiotemporal propagation of replication foci (RFi) through the S phase stages 9,100,101. Live-cell imaging of the GFP-tagged replication protein PCNA showed the dynamic nature of the replication foci, which assemble and disassemble (rather than moving, merging, or dividing), creating differential spatial patterns in the respective S phase stages 102. High time-resolution microscopy coupled with fluorescence recovery after photobleaching (FRAP) revealed the stable association of the replisome machinery with the RFi and showed that replication progresses in a Domino-like, next-in-line manner 103.
Quantitative analysis of human genome replication combining DNA combing, which reveals local origin firing and replication fork progression on single DNA molecules, with massive sequencing of newly replicated DNA to generate population-averaged replication timing profiles showed that origins are activated synchronously in regions of shared replication timing, but gradually in temporal transition zones, and that the rate of origin firing increases as replication progresses 104. However, origin interference occurs when the distance between two origins is short, usually less than 100 kb and on average ~40 kb 105. Based on the existing observations, the Domino model of replication progression was proposed, where stochastic activation of the first origin clusters leads to a chain reaction of sequential activation of later origin clusters depending on their relative spatial distribution in the genome within the nucleus. Yet origin interference kicks in within a short distance, usually when the next origin is present in the same chromatin loop 106. The model could capture the spatiotemporal replication progression and replication timing as observed in microscopic and replication timing analyses (Figure 5). Targeting the late-replicating major satellite repeats to the nuclear lamina in mouse myoblasts induced the early replication of these repeats, where the licensed origins in both lamina-associated facultative heterochromatin and targeted constitutive heterochromatin replicated concomitantly, validating the model 107.

Figure 5: The Domino model of genome replication progression. The illustration shows how a stochastically fired or pre-existing cluster of replicons, arranged as a replication focus or a megabase-sized replication domain, induces the firing of nearby origins. Once fired, an origin also blocks the licensed origins present within the loop distance (~55 kb).
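The two ingredients of the Domino model described above, induced firing of nearby origins and interference within the loop distance, can be sketched as a toy one-dimensional simulation. This is a simplified illustration under stated assumptions, not the published model: origins sit on a line with coordinates in kb, time advances in discrete steps, and the spontaneous and induced firing probabilities (`p_spont`, `p_induced`) and the induction range (`induce_kb`) are hypothetical values; only the ~55 kb loop distance is taken from the text.

```python
import random

LOOP_KB = 55  # interference (loop) distance from the text, ~55 kb

def simulate_domino(positions, rng, p_spont=0.01, p_induced=0.5,
                    induce_kb=200, max_steps=10_000):
    """Toy 1D Domino model; returns (firing step per origin, blocked flags).

    positions: sorted coordinates (kb) of licensed origins.
    p_spont, p_induced, induce_kb: hypothetical illustration parameters.
    """
    n = len(positions)
    fired_at = [None] * n   # step at which each origin fired, if it did
    blocked = [False] * n   # True if silenced by origin interference
    for step in range(max_steps):
        free = [i for i in range(n) if fired_at[i] is None and not blocked[i]]
        if not free:
            break
        # Origins that fired in earlier steps can induce their neighbors.
        active = [j for j in range(n) if fired_at[j] is not None]
        newly = []
        for i in free:
            near = any(abs(positions[i] - positions[j]) <= induce_kb
                       for j in active)
            if rng.random() < (p_induced if near else p_spont):
                newly.append(i)
        for i in newly:
            if blocked[i]:
                continue  # silenced by an origin that fired in this step
            fired_at[i] = step
            # Interference: block unfired origins within the loop distance.
            for k in range(n):
                if fired_at[k] is None and abs(positions[k] - positions[i]) <= LOOP_KB:
                    blocked[k] = True
    return fired_at, blocked

rng = random.Random(1)
positions = list(range(0, 3000, 30))  # 100 licensed origins, 30 kb apart
fired_at, blocked = simulate_domino(positions, rng)
fired = [p for p, f in zip(positions, fired_at) if f is not None]
print(f"{len(fired)} origins fired, {sum(blocked)} were blocked")
```

By construction, no two fired origins end up closer than the loop distance, and firing spreads outward from the first stochastic events because origins near an active one fire with the higher induced probability.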
1.4 Effect of chromatin and developmental state on replication program

It was observed early on that different types of heterochromatin synthesize their DNA at different times than euchromatin 108. With the advances in multicolor fluorescence microscopy and digital image analysis, a strong correlation between the nature of the chromatin and replication timing was observed 23,109,110. While the correlation suggested a role of the chromatin in regulating replication timing, the underlying mechanism was unclear. Furthermore, earlier studies using repli-seq correlated high gene density, Alu content, and GC content with early replication, and LINE1- or (peri)centromeric repeat-rich regions with late replication 85. Early-replicating regions are often associated with open chromatin states, enriched in histone modifications such as H3K9ac and H3K27ac, indicative of transcriptionally active euchromatin. In contrast, late-replicating regions are linked to heterochromatin, marked by modifications like H3K27me3 and H3K9me3, reflecting repressive chromatin environments 111. The development of chromatin conformation capture techniques, such as Hi-C, enabled the identification of topologically associating domains (TADs) 112. TADs are genomic regions where DNA sequences preferentially interact within the same domain, organizing chromatin into functional units that regulate gene expression and replication timing. These megabase-sized domains are broadly categorized into A and B compartments. The A compartment is enriched in active chromatin marks and gene-dense regions, while the B compartment contains repressive chromatin and is gene-poor, reflecting their roles in chromatin accessibility and functional organization 113.
Correlation analysis of replication timing with chromatin architecture has revealed a striking correspondence between early-replicating domains and the A compartment (active, euchromatic regions) and between late-replicating domains and the B compartment (repressive, heterochromatic regions). Hi-C data show that A and B compartments are formed by segregating TADs into mutually exclusive spatial regions, paralleling the spatial patterns of early and late replication foci observed through microscopy 114. Diffraction-unrestricted super-resolution nanoscopy has successfully captured various chromatin domains 115. Capturing the nascent chromatin and mapping it to chromatin domains reveals that the temporal order of replication follows the hierarchical organization of chromosome territories, even though heterochromatin undergoes local decompaction during replication 116,117. The interplay between genetic and epigenetic factors significantly influences the regulation of replication timing, with accumulating evidence pointing to a prominent role of epigenetic mechanisms. Multiple observations from manipulation of the histone acetylation level, including in mouse fibroblasts, where Trichostatin A (TSA) treatment advanced the replication timing of the typically late-replicating pericentromeric heterochromatin, suggested that the epigenetic state (especially histone acetylation) plays a more critical role than the genetic sequence in fine-tuning replication timing 110,118. This was also consistent with earlier observations in eukaryotes 119,120. Nonetheless, the influence between epigenetics and replication timing appears to be bidirectional. Apart from epigenetic modifications, several chromatin regulators also influence the replication program.
One such factor is the replication timing regulatory factor 1 (Rif1), which binds to chromatin during early G1 and acts as a negative regulator of origin firing in late-replicating chromatin by recruiting protein phosphatase 1 (PP1), which dephosphorylates key initiation factors like MCM complexes 121. Knockout of Rif1 therefore perturbs replication timing 122. However, this perturbation of the replication program also leads to inefficient epigenetic inheritance, a process key to maintaining genome stability and cellular identity 123. Advancing the replication timing of pericentromeric heterochromatin by repositioning it to the nuclear lamina (as a Domino effect) also disrupted faithful epigenetic inheritance and mainly affected the heterochromatin constituents 107. The replication program is closely tied to developmental progression, adapting to shifts in chromatin architecture, replication origin dynamics, and fork progression. Early studies of DNA replication in Drosophila and Xenopus embryos revealed rapid cell cycles driven by unique characteristics 124,125. These early divisions lack gap phases (G1 and G2), consisting solely of alternating S phases and mitoses. This abbreviated cycle relies on the activation of a high density of replication origins, which reduces inter-origin distances and replicon sizes and thereby enables the remarkably short S phases essential for rapid embryonic development 126,127. In contrast, early mammalian development exhibits slower division rates. For example, while a Xenopus egg can produce up to 20,000 cells within hours, a mammalian zygote undergoes only a single division in the same timeframe 128. Despite these differences, early mammalian cells share some features with Xenopus and Drosophila, such as shortened gap phases and a nearly transcriptionally inactive state during the first zygotic cleavages.
The absence or extreme reduction of gap phases in early embryonic cycles is a conserved feature across species, emphasizing the reliance on maternal stores to drive DNA replication and cell division before zygotic genome activation 129. This streamlined replication program is an adaptation tailored to meet the demands of early embryogenesis. Nonetheless, mouse embryos show spatial replication patterns already at the 2-cell stage, a well-defined replication timing program emerges at the 4-cell stage, and zygotic genome activation and replication timing arise and evolve in parallel with chromatin organization 36,128,130. In later stages of development, the replication timing is well established, and short gap phases emerge 131. Both mouse and human embryonic stem cells exhibit different replication programs when compared to differentiated cells 76,77,132. In mESC, in addition to developmentally regulated genomic loci, the usually late-replicating constitutive heterochromatin (major satellite repeats) shifts its replication timing earlier, to mid-S phase, before the nuclear lamina-associated chromatin replicates in late S phase. Yet upon targeting HDAC1 to the constitutive heterochromatin, its replication is delayed 77. Furthermore, apart from replication timing, the number of origins and the fork speed play crucial roles in replication completion in early embryogenesis. Xenopus and Drosophila embryos show higher origin activation and fork speed to achieve a higher rate of cell division, and the number of origins and the fork rate decrease upon transition to later developmental stages 133. Mammalian zygotes and embryonic stem cells possess comparatively more replication origins than differentiated cells, reflecting the demands of rapid divisions, yet exhibit a slower fork speed 77,134,135. Chromatin nature, genome organization, and developmental transitions are critical in shaping the replication program and are intricately intertwined.
Early developmental cells differ from differentiated or somatic cells in many aspects despite sharing the same genome. While epigenetics is the prominent player in shaping these trajectories, the chromatin duplication program must adapt plastically to this rewiring. It is key to understand these processes in detail, especially for the less studied repetitive elements, which are emerging as prominent players in genome organization and its regulation.

The questions

Our understanding of mammalian genome duplication and its underlying mechanisms has advanced significantly, yet many questions remain unanswered. While recent approaches have revealed the replication program of various cell types, even in single cells during their developmental transitions, little is known about the replication program of the repetitive elements, which constitute more than 50% of the human genome. While the role of histone acetylation in regulating the replication program and the mechanisms of epigenetic inheritance have been addressed comprehensively, it remains unknown whether other epigenetic modifications, like histone methylation or non-coding RNA, influence replication timing. Furthermore, while the Domino model explains how replication progresses, it remains unclear which genomic regions are replicated the earliest in S phase (the first dominoes to fall). Hence, the questions this thesis aimed to answer are:
- What are the spatiotemporal changes in the replication program in developmentally different human cells?
- How do the cells use various epigenetic tools to regulate chromatin duplication?
- Where are the earliest origins located, and what are their features?

2. Methods

2.1 Cell culture and transfection

Table 3 describes the details of the cell lines used. All cells were grown in a humidified atmosphere of 5% CO2 at 37 °C.
Cells were grown in Dulbecco’s modified Eagle medium (DMEM) (Cat.No.: D6429, Sigma-Aldrich Chemie GmbH, Steinheim, Germany) supplemented with varied percentages of fetal calf serum (FCS) (Cat.No.: FBS 11A, Capricorn Scientific GmbH, Hessen, Germany), 1x glutamine (Cat.No.: G7513, Sigma-Aldrich, St Louis, MO, USA), 1 µM gentamicin (Cat.No.: G1397, Sigma-Aldrich, St Louis, MO, USA), and 0.01 mg/ml hygromycin B (Cat.No.: 843555, Roche, Basel, Switzerland). HeLa Kyoto, hTERT RPE1, and BJ-5ta were grown in 10% FCS. HeLa Kyoto GFP mPCNA and HeLa Kyoto mCherry PCNA cells were grown in media supplemented with 600 µg/ml G418 (Cat.No.: CP11.3, Carl Roth, Karlsruhe, Germany) and 2.5 µg/ml Blasticidin (Cat.No.: anti-bl-1, InvivoGen, Toulouse, France), respectively. To grow the hiPSC A4 and hiPSC B4, surfaces were first coated with vitronectin (Cat.No.: A14700, ThermoFisher Scientific, MA, USA) for one hour. The hiPSC A4 and hiPSC B4 were grown in iPSC Brew Basal medium (Cat.No.: 130-107-086, Miltenyi Biotec) supplemented with iPSC-Brew 50x (Cat.No.: 130-107-087, Miltenyi Biotec). The hESC H1 was grown in mTeSR™ (Cat.No.: 85850, STEMCELL Technologies, CA) on Matrigel (Cat.No.: 354277, Corning, USA)-coated plates. All hESC and hiPSC cultures were grown until they started forming colonies before experiments were performed. To study human replication dynamics using live cell time-lapse microscopy, hTERT RPE1 and hiPSC A4 cells were transfected with the plasmid pENeGFPCNAL2mut (pc0653, https://www.addgene.org/167564/) 102. The hTERT RPE1 cells were transfected with the AMAXA Nucleofector II system (Lonza, Cologne, Germany) using a self-made buffer (5 mM KCl, 15 mM MgCl2, 120 mM Na2HPO4/NaH2PO4 pH 7.2, 50 mM Mannitol) with the program A024 and seeded on a polymer coverslip bottom µ-slide 8 well plate (Cat.No.: 80826, Ibidi, WI, USA).
The hiPSC A4 cells were first seeded on a vitronectin-coated polymer coverslip bottom µ-slide 8 well plate, grown until they formed colonies, and then transfected with Lipofectamine™ Stem Transfection Reagent (Cat.No.: L300015, ThermoFisher Scientific, MA, USA) using the manufacturer’s recommended protocol. The mESC J1 and mESC J1 msTALE GFP cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) high glucose (Cat. No.: D6429, Sigma-Aldrich Chemie GmbH, Steinheim, Germany) supplemented with 15% fetal calf serum (FCS), 1× non-essential amino acids (Cat. No.: M7145, Sigma-Aldrich Chemie GmbH, Steinheim, Germany), 1× penicillin/streptomycin (Pen/Strep) (Cat. No.: P4333, Sigma-Aldrich Chemie GmbH, Steinheim, Germany), 1× L-glutamine (Cat. No.: G7513, Sigma-Aldrich Chemie GmbH, Steinheim, Germany), 0.1 mM beta-mercaptoethanol (Cat. No.: 4227, Carl Roth, Karlsruhe, Germany), 1000 U/ml recombinant mouse LIF (Millipore), and 2i (1 µM PD0325901 and 3 µM CHIR99021 (Cat. Nos.: 1408 and 1386, respectively, Axon Medchem, Netherlands)) on gelatin-coated culture dishes (0.2% gelatin; Cat. No.: Sigma-Aldrich Chemie GmbH, Steinheim, Germany). The culture medium was changed every day, and cells were split every 2 days.

Table 3: List of all the model cell lines used and their properties
Name | Species | Type | Ploidy | Gender | Reference
hESC H1 | Homo sapiens | Embryonic | Diploid | Male | 136
hiPSC A4 | Homo sapiens | iPSC from human neonatal foreskin fibroblast (HFF1) | Diploid | Male | 137
hiPSC B4 | Homo sapiens | iPSC from human neonatal foreskin fibroblast (HFF1) | Diploid | Male | 137
hTERT RPE1 | Homo sapiens | hTERT immortalized retinal pigment epithelial cell | Diploid | Female | 138
BJ-5ta | Homo sapiens | hTERT immortalized foreskin fibroblasts | Diploid | Male | 138
HeLa Kyoto | Homo sapiens | Cervical cancer cell derivative | Quasi tetraploid | Female | 139
HeLa Kyoto GFP mPCNA | Homo sapiens | HeLa Kyoto cell line stably expressing GFP PCNA | Quasi tetraploid | Female | 140
HeLa Kyoto mCherry PCNA | Homo sapiens | HeLa Kyoto cell line stably expressing mCherry PCNA | Quasi tetraploid | Female | 140
C2C12 | Mus musculus | Immortalized mouse myoblast | Quasi tetraploid | Female | 141
C2C12 mRFP PCNA | Mus musculus | C2C12 stably expressing PCNA tagged with mRFP | Quasi tetraploid | Female | 107
MEF w8 | Mus musculus | Mouse embryonic fibroblasts | Diploid | Male | 142
mESC J1 | Mus musculus | Embryonic stem cells isolated from the inner cell mass of a mouse | Diploid | Male | 143
mESC J1 msTALE | Mus musculus | mESC J1 stably expressing a dTALE protein targeting the major satellite (msTALE) DNA fused with GFP | Diploid | Male | 144

To target the chromocenters to the lamina, mESC J1 msTALE GFP cells were transfected with GBP-Lamin B1 (pc1467), an expression vector encoding the sequence of the GFP-binding VHH domain fused to the human Lamin B1 coding sequence 145. As a control for the targeting assay, the GFP-binding VHH domain was removed to establish an expression vector with human Lamin B1 alone (pc2809) 107. To capture the dynamics of the earliest origins, mitotic cells were collected by shake-off. The cells were cultured for at least 24 hrs before starting the experiment. Multiple 100 mm plates of each cell line were washed with PBS 1×, 6 ml of media was added to each plate, and the plates were shaken for 2 minutes on a shaker. The media was pooled from all plates and centrifuged for 5 minutes at 1500 rpm. The excess medium was discarded, and the mitotic cells were resuspended in the remaining media before being seeded. For live cell imaging, mitotic cells of HeLa Kyoto GFP PCNA and C2C12 mRFP PCNA were collected and seeded on a high-precision glass bottom plate (self-made).

2.2 Doubling time and (sub)S phase duration

For doubling time/cell cycle length quantification, two time points falling within the logarithmic phase of cell proliferation (cell confluency between 30 and 70%) were used. First, 1 × 10^5 hTERT RPE1, hiPSC A4, hiPSC B4, and hESC H1 cells were seeded as technical triplicates. The counting started once the cells became adherent and started forming colonies.
Cells were trypsinized and resuspended in PBS 1×. Cell numbers were counted with a Neubauer hemocytometer for multiple time points within a 24-hour interval. The doubling time (dt) of the cell culture was then calculated as dt = (log 2 × Δt) ÷ (log N2 − log N1), where N1 and N2 are the numbers of cells counted at time points 1 and 2, respectively, and Δt is the duration between these two time points. To determine the percentage of cells in the S phase, asynchronously growing cell populations were pulse-labeled with 10 µM of the nucleoside analog 5-ethynyl-2’-deoxyuridine (EdU) (Cat.No.: 7845.1, ClickIT-EdU cell proliferation assay, Carl Roth, Karlsruhe, Germany) for 15 min and fixed, and EdU was detected along with 4’,6-diamidino-2-phenylindole (DAPI) as described below. High-throughput images were acquired and analyzed as described below. Based on EdU and DAPI intensity, the cell cycle profile was plotted, and the fraction of cells in each cell cycle phase was determined. To determine the duration of the S phase, the fraction of cells in the S phase was multiplied by the cell cycle duration. To determine the percentage of cells in each sub-S phase, the number of cells in each S phase stage was manually counted by scoring the EdU spatial patterns in images acquired using high-throughput microscopy with a 40x objective. The fraction of cells in each S phase stage was multiplied by the doubling time to calculate the duration of each S phase stage.

2.3 Genome replication labeling, visualization, and immunostaining

A list of all the nucleotide/nucleoside analogs used is given in Table 4.

Pulse labeling: For the replication labeling and visualization experiments, the cells were seeded on sterilized coverslips with the respective media. The cells were pulse-labeled with 10 µM of EdU for 15 min before washing with PBS 1× and fixing with 3.7% formaldehyde in PBS 1× for 10 min.
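The doubling-time formula and the phase-duration estimate from Section 2.2 can be written out as a short calculation. This is a minimal sketch of the arithmetic only: the function names and the example cell counts are hypothetical and are not data from this thesis.

```python
import math


def doubling_time_h(n1, n2, delta_t_h):
    """Doubling time dt = (log 2 x delta_t) / (log N2 - log N1), in hours."""
    if n1 <= 0 or n2 <= n1 or delta_t_h <= 0:
        raise ValueError("need 0 < N1 < N2 and a positive time interval")
    return math.log(2) * delta_t_h / (math.log(n2) - math.log(n1))


def phase_duration_h(fraction_in_phase, cycle_duration_h):
    """Duration of a phase = fraction of cells in that phase x cycle duration."""
    if not 0.0 <= fraction_in_phase <= 1.0:
        raise ValueError("fraction_in_phase must lie between 0 and 1")
    return fraction_in_phase * cycle_duration_h


# Hypothetical counts: 1e5 cells growing to 4e5 cells within 24 h.
dt = doubling_time_h(1e5, 4e5, 24.0)
# Hypothetical S phase fraction of 40% from the EdU/DAPI profile.
s_phase = phase_duration_h(0.40, dt)
print(f"doubling time: {dt:.1f} h, S phase: {s_phase:.1f} h")
```

Because only the ratio of the two logarithms enters the formula, the base of the logarithm does not matter as long as the same base is used in the numerator and the denominator.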
Table 4: The list of nucleotide and nucleoside analogs
Name | Application | Detection | Catalog | Company
5-ethynyl-2′-deoxyuridine (EdU) | Labeling of nascent DNA in pulse-chase experiments | ClickIT chemistry | 7845.1 | Carl Roth, Karlsruhe, Germany
5-TAMRA-Azide | Detection of EdU | - | CLK-FA008-1 | Jena Bioscience, Jena, Germany
5-bromo-2′-deoxyuridine (BrdU) | Labeling of nascent DNA in pulse-chase experiments | Antibody detection | B5002 | Sigma-Aldrich Chemie GmbH, Taufkirchen, Germany
Biotin-16-dUTP | Labeling of FISH probes | Streptavidin | 11093070910 | Roche Diagnostics Deutschland GmbH, Mannheim, Germany
Cy3-dUTP | Labeling of FISH probes | - | ENZ-42501 | Enzo Life Sciences, Lörrach, Germany
Thymidine | Labeling of nascent DNA in pulse-chase experiments (added only in the chase period) | - | T1895 | Sigma-Aldrich Chemie GmbH, Taufkirchen, Germany

Pulse-chase–pulse-chase: Cells were seeded on sterilized coverslips with the respective media. First, cells were incubated with 10 µM of EdU for 15 min (first pulse). The cells were washed twice with the respective warm media supplemented with 50 µM of thymidine (Cat.No.: T1895, Sigma-Aldrich Chemie GmbH, Taufkirchen, Germany) to stop the incorporation of EdU before incubating with fresh media for another three hours. Cells were then incubated with 10 µM of 5-bromo-2′-deoxyuridine (BrdU) (Cat.No.: B5002, Sigma-Aldrich Chemie GmbH, Taufkirchen, Germany) for 15 min (second pulse). The cells were washed twice with warm media supplemented with 50 µM of thymidine and incubated in fresh media for another three hours. The cells were washed with PBS 1× before fixing with 3.7% formaldehyde in PBS 1× at room temperature for 10 min. After fixation, the cells were washed thrice with PBS 1×. For a simpler pulse-chase, cells were fixed as above after the first pulse and 2.5/3 hrs of chase instead of adding BrdU.

Pulse-chase/pulse-pulse for earliest origins: Multiple 100 mm plates in which cells had been grown for at least 24 hrs and were around 60% confluent were used for mitotic shake-off.
The old media was removed, and the cells were first washed with warm PBS 1× before adding 6 ml of media. The plates were shaken on a shaker for 2 min. The media were pooled from all the plates and centrifuged for 5 min at 1500 rpm. The excess medium was discarded, and the mitotic cells were resuspended in the remaining media before being seeded for 8-9 hrs. The pulse was performed by adding EdU (10 µM) for 10 min. Meanwhile, media supplemented with 100 µM thymidine was prepared and placed in a warm water bath. After the first pulse, the media containing EdU was removed, and the cells were washed thrice with media supplemented with 100 µM thymidine. The cells were further incubated for 12 min with media supplemented with 100 µM thymidine. Care was taken to perform the washing as fast as possible. After the chase of 12 min, the cells were washed once with ice-cold PBS 1× before fixing with 3.7% formaldehyde in PBS 1× at room temperature for 10 min. For a pulse-pulse, the thymidine was replaced with BrdU. To capture single cells with the earliest origins for further sequencing, mitotic cells were collected and seeded for 8-9 hrs in the respective media. The pulse-chase was performed as above. Immediately after the chase period of 12 min, the cells were washed with prewarmed PBS/EDTA solution, and 2 ml of trypsin was added to the plate and incubated at 37 °C for 2 min. Then 2 ml of warm media was added to the plate to inactivate the trypsin, and the cells were resuspended to obtain a single-cell suspension. The 4 ml of solution was transferred to a 50 ml tube, and 40 ml of ice-cold PBS 1× was added and vortexed. The cells were centrifuged for 3 min at 1500 rpm at 4 °C. The cells were first resuspended in 19 ml of PBS 1×, and 1 ml of 37% formaldehyde was added immediately. The tube was transferred to a rotator, and the cells were fixed at room temperature for 10 min. The tube was centrifuged, and the cells were resuspended in 40 ml of PBS 1×.
The cells were pelleted, resuspended in 4% BSA/PBS 1×, and stored at -20 °C.

Immunofluorescence staining: Unless otherwise mentioned, all the immunostaining was performed inside a dark, humidified chamber at room temperature. Table 5 lists all the antibodies used.

Table 5: List of the antibodies used for immunostaining
Reactivity | Host | Clonality | Dilution | Catalog | Company
Anti-PCNA | Mouse | Monoclonal | 1:200 | ab29 | Abcam, Cambridge, UK
Anti-RPA 194 | Mouse | Monoclonal | 1:200 | sc-48385 | Santa Cruz Biotechnology, Dallas, TX, USA
Anti-BrdU | Rabbit | Polyclonal | 1:400 | 600-401-C29 | Rockland Immunochemicals, Pottstown, PA, USA
Anti-H3K9me3 | Mouse | Monoclonal | 1:200 | 39285 | Active Motif, Waterloo, Belgium
Anti-H3K36me3 | Rabbit | Polyclonal | 1:2000 | ab9050 | Abcam, Cambridge, UK
Anti-H3K27me3 | Mouse | Monoclonal | 1:200 | 61017 | Thermo Fisher Scientific, Waltham, MA, USA
Anti-H3K9ac | Rabbit | Polyclonal | 1:200 | 39917 | Active Motif, Waterloo, Belgium
Anti-H3K4me3 | Rabbit | Polyclonal | 1:200 | 39159 | Active Motif, Waterloo, Belgium
Anti-H4K20me3 | Rabbit | Polyclonal | 1:500 | ab9053 | Abcam, Cambridge, UK
Anti-H3K9me2 | Mouse | Monoclonal | 1:500 | 39683 | Active Motif, Waterloo, Belgium
Anti-H4K5ac | Rabbit | Polyclonal | 1:500 | ab51997 | Abcam, Cambridge, UK
Anti-H4K8ac | Rabbit | Polyclonal | 1:200 | ab15823 | Abcam, Cambridge, UK
Anti-H4K12ac | Mouse | Monoclonal | 1:200 | 61527 | Active Motif, Waterloo, Belgium
Anti-H4K16ac | Rabbit | Polyclonal | 1:200 | 39168 | Active Motif, Waterloo, Belgium
Anti-H3K56ac | Rabbit | Monoclonal | 1:250 | 2134-1 | Epitomics, Inc., Burlingame, California (now Abcam)
Anti-NSD1 Antibody, clone 1NW-1A10 | Mouse | Ascites | 1:250 | 04-1565 | Sigma-Aldrich Chemie GmbH, Steinheim, Germany
Anti-SETD2 | Rabbit | Monoclonal | 1:250 | E4W8Q | Cell Signaling Technology, Inc., MA, USA
Anti-RNA polymerase II RPB1 phospho S5 | Mouse | Monoclonal | 1:200 | ab5408 | Abcam, Cambridge, UK
Anti-RNA polymerase II RPB1 phospho S2 | Rat | Monoclonal | 1:500 | 61084 | Active Motif, Waterloo, Belgium
Anti-mouse IgG Alexa Fluor 488 | Goat | Polyclonal | 1:500 | A11029 | Thermo Fisher Scientific, Waltham, MA, USA
Anti-mouse IgG Cy5 | Donkey | Polyclonal | 1:200 | JIM-715-175-150 | Jackson ImmunoResearch Europe Ltd., Cambridge, UK
Anti-rabbit IgG Alexa Fluor 488 | Goat | Polyclonal | 1:500 | A-11034 | Thermo Fisher Scientific, Waltham, MA, USA
Streptavidin Alexa Fluor 488 | Conjugated | - | 1:500 | S11223 | Thermo Fisher Scientific, Waltham, MA, USA
Streptavidin Cy5 | Conjugated | - | 1:500 | PA45001 | Amersham Biosciences, Amersham, UK

After fixation, the cells were permeabilized with 0.5% Triton X-100 (Carl Roth, Karlsruhe, Germany) in PBS 1× for 10 min, followed by three washes with 0.05% Tween in PBS 1×. To give access to the PCNA epitope, the cells were incubated with ice-cold methanol for 10 min. The cells were again washed thrice with the washing buffer (0.05% Tween in PBS 1×) and blocked with 4% BSA in PBS 1× for 30 min. For the detection of EdU, cells were incubated for 30 min in a Click-IT cocktail mix of 100 mM Tris-HCl pH 8.5, 10 mM CuSO4, 1 µM 647 Azide (Cat.No.: 259P.1, Carl Roth, Karlsruhe, Germany), and 100 mM ascorbic acid diluted in water 146. Cells were washed thrice with 0.05% Tween in PBS 1×. To detect BrdU, cells were incubated for one hour at 37 °C in anti-BrdU primary antibody diluted in 2% BSA, 1× DNase I buffer (60 mM Tris/HCl pH 8.1, 0.66 mM MgCl2, 1 mM β-mercaptoethanol), and 0.1 U/mL DNase I (Cat.No.: D5025, Sigma-Aldrich Chemie GmbH, Steinheim, Germany). To inactivate DNase I, cells were washed twice with EDTA PBS 1× for 10 min each. For PCNA detection, methanol-treated cells were incubated in the primary antibody for two hours and washed thrice with 0.05% Tween in PBS 1× before suitable secondary antibodies were added for one hour, followed by washing. To detect the histone modifications, cells were blocked in the blocking buffer (4% BSA/1% fish skin gelatin/PBS 1×) and incubated in the respective primary antibodies overnight at 4 °C. The cells were washed five times and incubated in suitable secondary antibodies for one hour before washing five times with the washing buffer.
EdU was detected after the histone modification detection using Click-IT chemistry with TAMRA-Azide. All the cells were stained with 10 mg/mL DAPI (4′,6-diamidino-2-phenylindole, Cat.No.: D9542, Sigma-Aldrich Chemie GmbH, Steinheim, Germany) for 10 min and mounted on Vectashield (Cat.No.: VEC-H-1000, Vector Laboratories Inc., Burlingame, CA, USA). All the coverslips were sealed with transparent nail polish and air-dried.

Immunostaining in suspension for earliest origins: To capture individual cells in suspension, the frozen cells were thawed and pelleted. After resuspension in 2% BSA/PBS 1×, the cells were transferred to a 1.5 ml tube. The EdU Click-IT staining was performed with the Click-IT reaction cocktail as above but with 1% Saponin (Cat.No.: 47036, Sigma-Aldrich Chemie GmbH, Steinheim, Germany). The cells were washed twice in suspension with 2% BSA/PBS 1×. After the final wash, the cells were pelleted, the supernatant was removed, and the cells were resuspended in the leftover solution. To this, 100% MeOH (-20 °C) was added slowly up to 1 ml, with tapping between drops. The cells were vortexed briefly and placed on ice for 10 min, with the tube being inverted in between. The cells were pelleted at 2000 rpm for 5 min at 4 °C, the supernatant was removed, and the cells were blocked in 2% BSA/PBS 1× for 30 min on a rotator. The tube was wrapped with aluminum foil. After 30 min, an appropriate amount of primary antibody against PCNA was added to the tube, which was kept on the rotator for 2 hrs. The cells were washed thrice in 2% BSA/PBS 1×, resuspended in the appropriate secondary antibody, and placed on the rotator for 1 hr. The cells were washed thrice and resuspended in 2% BSA/PBS 1×.

2.4 Probe generation, metaphase spread, repli-FISH, and immuno repli-FISH

Probe generation: The probe generation, fluorescence in situ hybridization, and co-detection of replication foci (RFi) and FISH probes were performed as described before 147. All the plasmids and primers used are summarized in Table 6.
For the genomic DNA (gDNA) preparation, hTERT RPE1 cells were pelleted and incubated overnight in TNES buffer (10 mM Tris; pH 7.5, 400 mM NaCl, 10 mM EDTA, 0.6% SDS) supplemented with 1 mg/mL Proteinase K (Cat. No.: BS202505, Bio&sell GmbH, Feucht, Germany) at 50 °C. RNA was removed by the addition of 0.5 mg/mL RNase A (Cat.No.: 10109169001, Sigma-Aldrich Chemie GmbH, Steinheim, Germany) for 30 min at 37 °C. The gDNA was extracted by the addition of 6 M NaCl to a final concentration of 1.25 M and vigorous shaking. After centrifugation (15 min, 11,000× g, RT), gDNA was precipitated from the supernatant by the addition of 100% ice-cold ethanol followed by incubation at –20 °C for 1 h and subsequent centrifugation (10 min, 11,000× g, 4 °C). The pellet was washed with 70% ethanol, air-dried, and dissolved in double distilled water. The plasmids containing rDNA and LINE1 probes were labeled with Cy3-dUTP (Cat.No.: ENZ-42501, Enzo Life Sciences, Lörrach, Germany) using nick translation. To prepare the Alu and centromere probes, the purified gDNA from hTERT RPE1 was used as a template to amplify and label with biotin-16-dUTP (Cat.No.: 11093070910, Roche Diagnostics Deutschland GmbH, Mannheim, Germany) via PCR using specific Alu primers (5′-GGATTACAGGYRTGAGCCA-3′; 3′-RCCAYTGCACTCCAGCCTG-5′) as well as specific centromere primers (α27: 5′-CATCACAAAGAAGTTTCTGAGAATGCTTC-3′; α30: 5′-TGCATTCAACTCACAGAGTTGAACCTTCC-3′) (refer to Figure 6A).

Table 6: A list of the probes and preparation methods
Target | Labeling method | Primers/Plasmids | Reference
Alu | PCR | AluF: 5′-GGATTACAGGYRTGAGCCA-3′; AluR: 3′-RCCAYTGCACTCCAGCCTG-5′ | 148
Centromere | PCR | α27: 5′-CATCACAAAGAAGTTTCTGAGAATGCTTC-3′; α30: 5′-TGCATTCAACTCACAGAGTTGAACCTTCC-3′ | 21
LINE1 | Nick translation | Plasmid pLRE3-eGFP | 149
rDNA (human) | Nick translation | Plasmid pUC-hrDNA-12.0 | 150
rDNA (mouse) | Nick translation | pMr974 | 151
MaSat (mouse) | PCR | 5′-AAAATGAGAAACATCCACTTG-3′; 5′-CCATGATTTTCAGTTTTCTT-3′ |
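The Alu primers listed above contain the degenerate IUPAC bases Y (C or T) and R (A or G), so each primer stands for a small pool of concrete sequences. The following minimal sketch enumerates that pool. The function name `expand_degenerate` is hypothetical, only the two degenerate codes that occur in these primers are covered, and the primer is treated as a plain string in the orientation in which it is written.

```python
from itertools import product

# IUPAC codes needed for the primers above; other degenerate codes are omitted.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer string."""
    try:
        choices = [IUPAC_BASES[base] for base in primer.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported base code: {err.args[0]}") from err
    return ["".join(bases) for bases in product(*choices)]


for sequence in expand_degenerate("GGATTACAGGYRTGAGCCA"):
    print(sequence)
```

With one Y and one R position, each Alu primer expands to four sequences.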
Optionally, probes were sheared with a Covaris S220 (Covaris Inc., Woburn, MA, USA) in microTUBEs (50 µL aliquots; 520045, Covaris Inc.) to a final size of ~500 bp when the size distribution of the labeled probes was above 2 kb. All probes (~100 ng) except rDNA were precipitated with 1 µg of fish sperm DNA (Cat.No.: 10223638103, Roche Diagnostics Deutschland GmbH, Mannheim, Germany), 0.13× NaAc, and 2.5× ethanol, before being washed with 70% ethanol, air-dried, and dissolved in the hybridization solution (50% Formamide/SSC 2×). To reduce non-specific signals, around 100 ng of rDNA probe was co-precipitated with 1 µg of human Cot-1 DNA (Cat.No.: 5190-3393, Agilent, Santa Clara, CA, USA), 1 µg of fish sperm DNA, 0.13× NaAc, and 2.5× ethanol. Metaphase spreads were used to validate the probes. The hTERT RPE1 / BJ-5ta cells were seeded for at least 24 h before being treated with 0.1 µg/mL colcemid (N-deacetyl-N-methylcolchicine, Cat.No.: 10295892001, Roche Diagnostics Deutschland GmbH, Mannheim, Germany) for three to four hours. Cells were then harvested by trypsinization and incubated for 30 min with 75 mM KCl at 37 °C with tapping in between. They were then fixed by dropwise addition of ice-cold methanol/acetic acid (3:1) for 30 min on ice, and this was repeated twice. For chromosome spreads, the cell suspension was dropped onto an ice-cold wet microscopy slide from a height of approximately 20 cm. The slide was then air-dried overnight. For metaphase FISH, the slides were rehydrated in ddH2O for 10 min, digested with 0.005% pepsin (165 U/mL, Cat.No.: P6887, Sigma-Aldrich Chemie GmbH, Steinheim, Germany) in 0.01 M HCl for 10 min at 37 °C, fixed with 2% formaldehyde for 5 min, washed twice with SSC 2×, dehydrated in 70%, 80%, and 100% ethanol for 3 min each, and air-dried.
After equilibrating with 10 µL of hybridization solution containing the respective probes for 30 min at 37 °C inside a sealed hybridization box, the metaphase spreads were co-denatured at 80 °C for 5 min and immediately placed on ice for another 5 min. The box was transferred to a humidified chamber (37 °C) and left overnight. Post-hybridization washes were performed with SSC 2×, and the slides were blocked with 2% BSA/SSC 2× for 30 min. Biotin-labeled probes were detected with a suitable streptavidin-conjugated fluorophore, counterstained with DAPI, and mounted on Vectashield. The FISH signals were validated on metaphase spreads (Figure 6B).

Repli-FISH and immuno-FISH: Cells were treated with 10 µM of EdU for 15 min, washed twice with PBS 1×, and fixed with 3.7% formaldehyde in PBS 1×. The cells were permeabilized with 0.5% Triton X-100 in PBS 1×, washed, and incubated in 20% glycerol in PBS 1× overnight at 4 °C. The cells were snap-frozen in liquid nitrogen and then incubated for 2 min in ice-cold 20% glycerol in PBS 1×, and this step was repeated two more times. This was followed by RNase treatment (0.1 mg/ml) for 1 hr at 37 °C and washing with the washing buffer. If EdU was detected, this was done before the FISH as described above, and the cells were then fixed with 2% formaldehyde in PBS 1× for 10 min. The DNA was depurinated in ice-cold 0.1 N HCl/0.5% Triton X-100 for 5 min, and the cells were washed with SSC 2× and fixed with 2% formaldehyde for 5 min before dehydration (70%, 80%, and 100% EtOH) and incubation with the probes. For Alu and LINE1 co-detection, equal volumes of each probe were pooled, mixed, added to the coverslip, and incubated for 15 min at 37 °C. In a water bath, the cells and probes were co-denatured at 80 °C for 5 min, immediately placed on ice for 5 min, transferred to the humidified hybridization chamber at 37 °C, and le