Delineating Epigenetic Regulatory Mechanisms of Cell Proliferation and Differentiation

Abul Islam

Doctoral Thesis UPF / Year 2012

Thesis Directors

Dr. Nuria Lopez-Bigas, Department of Experimental and Health Sciences, Pompeu Fabra University, Spain

Dr. Elizaveta V. Benevolenskaya, Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, USA

To My Family

Acknowledgements

Thanks to the Omnipotent for enabling me to carry out a project that seemed very grueling at times, especially towards the beginning. I would like to express my immense gratitude to all the people who assisted me in solving numerous problems during the course of this work. By doing so, they aided the completion of my endeavor.

I am much indebted to my venerable supervisors, Dr. Nuria Lopez-Bigas, Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Spain, and Dr. Elizaveta V. Benevolenskaya, Assistant Professor, Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago (UIC), USA. I express my profound gratitude, with deep pleasure, for their continuous encouragement, valuable suggestions, supervision and cooperation, which enabled me to complete the research and to adapt to new environments in new cities and countries.

My sincere appreciation is extended to all my lab-mates and coworkers, including Abel Gonzalez-Perez, Alba Jene-Sanz, Albert Mascarell, Alberto Santos, Andrea Grilli, Armand Gutierrez, Christian Perez Llamas, David Tamborero, Gunes Gundem, Jordi Deu-Pons, Michael Schroeder, Sonja Althammer, Sonja Haenzelmann, Sophia Derdak and Xavier Rafael-Palou, for their skilled technical support and valuable advice at different stages of my project. Abel Gonzalez-Perez also helped me translate the abstract into Spanish.

I also express my heartfelt gratitude to Dr. Maxim Frolov, Associate Professor, Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, USA, and his lab members Brandon N. Nicolay, Mary Truscott, and Battuya Bayarmagnai for their support, collaboration, research cooperation and valuable suggestions. Similarly, I extend my gratitude to my colleagues Michael L. Beshiri, William F. Richter, Dannielle C. DeWaal and Laura A. Jacobs in Dr. Benevolenskaya’s lab. My special thanks also go to William F. Richter for his generous help in proofreading the thesis.

I gratefully acknowledge the technical advice and encouragement of Dr. Robert Castelo, group leader, Functional Genomics group, PRBB, during the course of the investigation. My grateful thanks go to Dr. Thomas L. Volkert and Dr. Summit Gupta, Whitehead Institute, MIT, USA, for practical training on Illumina sequencing. I also thank Ms. Natalia Raj (UPF) and Ms. Rekha Pandya (UIC) for their generous administrative support, and the computer system administrators Alfons Gonzalez Pauner and Miguel A. Sanchez Gomez for their helpful technical support.

I am thankful to AGAUR, the Catalan Government and the Human Frontier Science Program for the fellowships they provided to continue my doctoral research. I am also thankful to the University of Dhaka, Bangladesh, for granting educational leave for higher studies.

Finally, I am greatly indebted to my family members, friends and well-wishers for their help, encouragement and cooperation throughout my study period.

Abstract

Recent advances in high-throughput technology have opened the door to systematic studies of epigenetic mechanisms. One of the key components in the regulation of the cell cycle and differentiation is the retinoblastoma protein (pRB), a component of the RB/E2F tumor suppressor pathway that is frequently deregulated in cancer.
The RBP2/KDM5A histone demethylase was shown to interact with pRB and regulate pRB function during differentiation. However, how precisely differentiation is coupled with halted cell cycle progression, and whether an epigenetic mechanism is involved, remain unknown. In the present study, I analyzed gene expression levels of human histone methyltransferases (HMT) and demethylases (HDM), as well as their targets in human cancers, and focused on the RB/KDM5A connection in the control of the cell cycle and differentiation. In particular, I used Drosophila as a model to describe a novel mechanism where the RB/E2F pathway interacts with the Hippo tumor suppressor pathway to synergistically control cell cycle exit upon differentiation. Studying the role of miR-11, I found that the inhibition of dE2F1-induced cell death is its highly specialized function. Furthermore, I studied the induction of differentiation and apoptosis as the consequences of KDM5A deletion in cells derived from Rb knockout mice. I concluded that during differentiation, KDM5A plays a critical role at the enhancers of cell type-specific genes and at the promoters of E2F targets; in cooperation with other repressor complexes, it silences cell cycle genes. I found that KDM5A binds to transcription start sites of the majority of genes with H3K4 methylation. These are highly expressed genes, involved in certain biological processes, and occupied by KDM5A in an isoform-specific manner. KDM5A plays a unique and non-redundant role in histone demethylation, and its promoter binding pattern highly overlaps with that of the opposing enzyme, MLL1. Finally, I found that HMT and HDM enzymes exhibit a distinct co-expression pattern in different cancer types, and this determines the level of expression of their target genes.

Resumen

Recent advances in high-throughput technologies have opened the way to systematic studies of epigenetic mechanisms. The retinoblastoma protein (pRB), one of the elements of the RB/E2F tumor suppressor pathway that is frequently deregulated in cancer, is one of the essential components in the regulation of the cell cycle and differentiation. However, it is still not known precisely how differentiation is coupled to the arrest of cell cycle progression, or whether an epigenetic mechanism is linked to this process. In this study, I analyzed the expression levels of human histone methyltransferases (HMT) and demethylases (HDM), as well as their targets in human cancers, and focused on the RB/KDM5A connection in the control of the cell cycle and differentiation. Specifically, I used Drosophila as a model to describe a new mechanism by which RB/E2F interacts with the Hippo tumor suppressor pathway to synergistically control the cell cycle arrest associated with differentiation. By investigating the role of miR-11, I determined that its highly specialized function is the inhibition of dE2F1-induced cell death. In addition, I studied the induction of differentiation and apoptosis as a consequence of the loss of KDM5A in cells obtained from mice lacking Rb. I concluded that, during differentiation, KDM5A plays an essential role at the enhancers of cell type-specific genes as well as at the promoters of E2F targets; in cooperation with other repressor complexes, it silences cell cycle genes. I investigated the mechanism of KDM5A recruitment and found that it binds to the transcription start site of the majority of genes carrying H3K4 methylation. These genes have high expression levels, are involved in particular biological processes, and are occupied by different isoforms of KDM5A. KDM5A plays a unique and non-redundant role in histone demethylation and largely overlaps with the enzyme of opposing function, MLL1. Finally, I found that HMT and HDM enzymes show distinct co-expression patterns in different cancer types, and that this determines the expression levels of their target genes.

Preface

The ‘Human Genome Project’ was carried out with the hope that revealing our genetic makeup might provide explanations for disease-causing mechanisms and solutions to complex diseases like cancer. Recently, we have learned that DNA is not everything. “Genes are absolutely not our fate. They can give us useful information about the increased risk of a disease, but in most cases they will not determine the actual cause of the disease, or the actual incidence of somebody getting it” (according to the pioneer researcher Craig Venter after the human genome was sequenced). Genetic processes are controlled by another coding system called the “epigenome”. Unlike DNA, its code is flexible and reversible. It can respond to signals from the environment, change its pattern accordingly, and subsequently control gene expression. “The difference between genetics and epigenetics can probably be compared to the difference between writing and reading a book. Once a book is written, the text (the genes or DNA: stored information) will be the same in all the copies distributed to the interested audience. However, each individual reader of a given book may interpret the story slightly differently, with varying emotions and projections as they continue to unfold the chapters.
In a very similar manner, epigenetics would allow different interpretations of a fixed template (the book or genetic code) and result in different read-outs, dependent upon the variable conditions under which this template is interrogated” (Dr. Thomas Jenuwein, director of the Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany).

Conrad Waddington is often given credit for coining the term epigenetics in 1942 as “the branch of biology which studies the causal interactions between genes and their products, which bring the phenotype into being”. However, the term appears in the literature as far back as the mid-19th century. A textbook definition of epigenetics is the study of heritable changes in gene activity or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence. Examples of such changes are DNA methylation and histone modification, mediated by a set of enzymes that leave various ‘marks’ on various positions of histones. The particular landscape of these ‘marks’ is involved in specific cellular processes, including the cell cycle and differentiation, but essentially it switches genes on or off, or leaves them in a poised state until another signal arrives.

RBP2/KDM5A is one such histone-modifying enzyme, whose activity can cause gene suppression by removing methyl groups from lysine 4 of histone H3 (H3K4). KDM5A acts in the retinoblastoma (RB) pathway, a critical pathway for cell cycle regulation. The retinoblastoma tumor susceptibility gene product (pRB) was discovered 25 years ago. It functions as a master regulator of the cell cycle and acts as a tumor suppressor by governing the activity of the E2F transcription factors; it is deregulated in most cancer types. It is now known that pRB directly interacts with KDM5A, promoting differentiation by inhibiting the activity of KDM5A.
Deregulation of this interaction leads to cancer by hampering normal differentiation and accelerating proliferation through activation of other transcription factors. However, in a proliferative state, the differentiation program remains inactive. These observations lead to two obvious questions: how does KDM5A participate in the repression of developmental genes during cell proliferation? And, with the onset of developmental signalling, how is KDM5A repression relieved from these genes?

In Drosophila, the pRB homolog, RBF1, had previously been shown to be important for terminal cell cycle exit. The Hippo tumor suppressor pathway in humans is relatively newly described and is thought to play a role in development. The Hippo pathway was originally identified in flies, and is similar to the pRB/E2F network. Like the pRB pathway, the Hippo pathway has been found to be inactivated in human tumors.

However, how the pRB/E2F-dependent genes are epigenetically regulated, especially by KDM5A, during the cell cycle and differentiation is not well understood. In addition, the molecular mechanisms underlying recruitment of the histone modifying enzymes, and the identity of the genome-wide epigenetically regulated gene modules, are largely unknown. Moreover, since cellular processes work in a complex circuit with one another, a possible interaction between the pRB tumor suppressor pathway and other equivalent tumor suppressor pathways has yet to be explored in detail.

Conventional anti-tumor chemotherapeutic agents targeting genetic factors remain largely unsuccessful in curing patients with most cancer types, and the involvement of epigenetic factors, both as causative agents and as mediators of drug resistance, is increasingly appreciated. As a result, the focus in combating cancer is gradually shifting towards epigenetic drug discovery, especially the search for inhibitors of histone modifying enzymes.
Before finding suitable epigenetic targets, it is important to know the detailed mechanisms of recruitment, the target gene groups and their expression patterns, as well as the functional consequences of inhibiting these enzymes.

Epigenetic research uses a wide range of molecular biology techniques, including genome-wide studies that require detailed bioinformatics analysis in order to extract biologically meaningful information. The field of computational epigenetics is therefore becoming increasingly popular. The important role of epigenetic defects in cancer opens up new opportunities for improved diagnosis and therapy. These active areas of research reveal several opportunities for bioinformatic analysis. Firstly, given a list of genomic locations exhibiting epigenetic differences between tumor cells and normal cells, can we detect co-regulated patterns or find evidence of a functional relationship of these regions to cancer? Secondly, is it possible to functionally classify the aforementioned sets of genes and, by relating this to other experimental data such as expression data, identify active gene modules that would be affected by drugs targeted to the epigenetic effectors? Thirdly, can we integrate several sets of epigenetic modification data to better explain the mechanisms of cell growth regulation in cancer? Fourthly, can we use bioinformatic methods to find suitable drug targets and improve diagnosis and therapy by detecting and clustering important disease subtypes?

In this dissertation, I use both bioinformatic and experimental biological techniques, together with high-throughput data both publicly available and generated by our laboratory, to address these questions and to accumulate evidence in support of my hypotheses.
The current research was conducted through a collaboration between a bioinformatics unit in the Department of Experimental and Health Sciences of Pompeu Fabra University in Spain and a molecular biology laboratory in the Department of Biochemistry and Molecular Genetics at the University of Illinois at Chicago in the United States. For this study I used a variety of human cell lines and primary cell data, as well as mouse and fly models.

In the first chapter, I review recent literature to provide a detailed overview of the biology of histone modifying enzymes, particularly the enzymes responsible for adding or removing methyl groups from lysine residues of histones, with special attention to KDM5A. I also discuss the RB/E2F pathway and another tumor suppressor pathway, named Hippo, in human and fly. Later sections focus on the experimental and bioinformatics approaches, algorithms, and tools for genome-wide high-throughput data analysis that are necessary for studying molecular genetic and epigenetic mechanisms. Through this study I aim to unravel some novel cellular mechanisms. I also conducted, for the first time, an integrated analysis of large sets of data to elucidate epigenetic patterns. The Results section consists of eight chapters, followed by a discussion of my findings and a chapter of concluding remarks. The findings of three other collaborative projects carried out during the course of my doctoral study, which have already been published, are presented in the appendix.

Contents

List of figures
List of tables

I Introduction

1 Introduction
  1.1 Histone Modification
  1.2 Structural Properties of Histone Methylating and Demethylating Enzymes
    1.2.1 SET domain containing Histone Methyltransferases
    1.2.2 Amine Oxidase Family HDMs
    1.2.3 Jumonji domain protein family
  1.3 Epigenetic Regulation of Transcription by HMTs and HDMs
  1.4 Epigenetic regulator KDM5A
    1.4.1 Transcriptional regulation of KDM5A through protein-protein interactions
    1.4.2 Biological Function of KDM5A and related proteins
  1.5 Epigenetic regulators in cancer
    1.5.1 Aberrant DNA methylation
    1.5.2 Histone onco-modification
    1.5.3 Epigenetic regulation of microRNA and cancer link
    1.5.4 Epigenetic drugs for cancer treatment
  1.6 Conserved pRB and Hippo tumor suppressor pathways in humans and flies: from cell cycle control, development to tumorigenesis
    1.6.1 pRB-E2F pathway in cell cycle control
  1.7 Bioinformatics in genetic and epigenetic research: Extracting biological information from data
    1.7.1 DNA microarray
    1.7.2 ChIP-on-chip
    1.7.3 ChIP-seq
    1.7.4 RNA-seq
    1.7.5 Discovery and search of regulatory motifs
    1.7.6 Enrichment analysis
    1.7.7 Databases and resources
    1.7.8 Translating computational research to biology: Integrative analysis of genetic and epigenetic data

II Objectives

III Results

2 Selective targeting of histone methylation
3 Coregulation of histone modifying enzymes in cancer
4 Genome-wide Analysis using ChIP to Identify Isoform-specific Gene Targets
5 Overlapping and distinct recruitment of enzymes involved in histone methylation determines epigenetic pattern in human genome
6 Cooperation between dE2F1 and Yki/Sd defines a distinct transcriptional program necessary to bypass cell cycle exit
7 mir-11 limits the proapoptotic function of its host gene, dE2f1
8 Demethylation by KDM5A Contributes to Repression of Cell Cycle Genes during Differentiation
9 KDM5A regulates differentiation through two distinct mechanisms on cell cycle and tissue-specific genes

IV Discussion

10 Discussion
  10.1 Background
  10.2 Epigenetically regulated gene module
    10.2.1 Mode of HMT/HDM recruitment elucidates a broader picture
    10.2.2 A distinct co-expression and coregulation pattern of HMTs/HDMs exists in multiple cancer types
    10.2.3 Utilization of the ChIP-seq technique in combination with other bioinformatic approaches yields insightful knowledge from genome-wide data
    10.2.4 Relative requirement of KDM5A and other enzymes for H3K4 methylation
  10.3 Functions of the pRB/E2F tumor suppressor pathway and its link to epigenetic regulation
    10.3.1 The RB/E2F pathway cooperates with another critical tumor suppressor pathway in growth control
    10.3.2 RB/E2F pathway function is controlled by miRNA
  10.4 Epigenetic factors in the regulation of pRB/E2F-dependent genes
    10.4.1 KDM5A regulates RB/E2F targets during differentiation
    10.4.2 During differentiation KDM5A binds TSS of cell cycle genes and enhancer of developmental genes
  10.5 Implication of our studies for general knowledge and human health

V Conclusion

VI Appendix

11 IntOGen: integration and data mining of multidimensional oncogenomic data
12 Whole genome analysis of p38 SAPK-mediated gene expression upon stress
13 Biological reprogramming in acquired resistance to endocrine therapy of breast cancer

VII Bibliography

List of Figures

1.1 Chromatin Structure
1.2 Readers Determine the Effects of Lysine Methylation
1.3 Illustration of the histone trimethyl-lysine interactome proteins
1.4 Domain structure of SET family HMTs
1.5 Structure and phylogenetic tree of human SET domain-containing HMTs
1.6 Schematic representation of MLL1 domains
1.7 Model of methylation by MLL1
1.8 Schematic architecture of the Polycomb complexes PRC1 and PRC2, and associated proteins
1.9 Schematic Diagrams of LSD1 Domains, 3D Structures, and Mechanism of Action
1.10 Phylogenetic tree of the JmjC family of demethylases
1.11 Chemical mechanism of the demethylation reaction mediated by JmjC family HDMs
1.12 Schematic presentation of the JARID1 domain structure in humans and its orthologs
1.13 Histone modifications at bivalent genes
1.14 Model of Lid demethylase mediated gene activation and repression in connection to dMyc
1.15 Model of KDM5A mediated gene suppression in Notch targets
1.16 Model of RBP2/KDM5A, Sin3 interaction in differentiated muscle cells
1.17 Coordinate control of differentiation and proliferation in the pRB pathway
1.18 DNA methylation patterns in normal and cancer cells
1.19 Histone modification patterns in normal and cancer cells
1.20 Deregulation of Histone-modifying enzymes in cancer
1.21 Aberrant chromatin modifications at leukemia-inducing genes
1.22 Mammalian and Drosophila melanogaster retinoblastoma and E2F family proteins
1.23 Hippo pathway in Drosophila melanogaster and mammals
1.24 Model of methods used in cDNA microarray technology
1.25 Normalization strategies for Affymetrix GeneChip data
1.26 Classification of DNA microarray normalization methods
1.27 An enrichment signal around a binding site
1.28 Overview of ChIP-seq experiment
1.29 Typical ChIP-seq data analysis pipeline
1.30 Algorithmic approaches for aligning short sequencing reads
1.31 Effect of various mapping options on the alignment of short reads
1.32 Steps in peak calling
1.33 Strand-specific profiles at enriched sites
1.34 Determination of fragment length by the cross-correlation method
1.35 Determination of the depth of sequencing
1.36 Overview of RNA-seq experimental steps
1.37 Overview of the RNA-seq analysis pipeline for detecting differential gene expression
1.38 Mapping strategy of RNA-seq spliced reads
1.39 The TopHat pipeline
1.40 Enrichment analysis steps in Gitools
10.1 Target genes regulation in the RB/E2F pathway

List of Tables

1.1 “Reader” domains
1.2 Developmental expression pattern and mutant/knockdown phenotype of the KDM5 family demethylases
1.3 Epigenetic microRNAs in cancer targeting chromatin modifying enzymes
1.4 Methods for Affymetrix GeneChip data analysis
1.5 Public databases of genome-wide experiments

List of abbreviations

ALL       Acute lymphoblastic leukemia
Alpha-KG  Alpha-Ketoglutarate
AML       Acute myeloid leukemia
ANLL      Acute non-lymphoblastic leukemia
AR        Androgen receptor
CLL       Chronic lymphocytic leukemia
DLBCL     Diffuse large B-cell lymphoma
DNMT      DNA methyl transferase
EA        Enrichment Analysis
HAT       Histone acetyl transferase
HATi      Histone acetyl transferase inhibitor
HDAC      Histone deacetylase
HDACi     Histone deacetylase inhibitor
HDM       Histone demethylase
HDMi      Histone demethylase inhibitor
HMT       Histone methyltransferase
HMTi      Histone methyltransferase inhibitor
Jmj       jumonji
JoVE      Journal of visualized experiments
KDM       Lysine demethylase
KMT       Lysine methyl transferases
MLL       Mixed lineage leukemia
MSD       Myelodysplastic syndrome
NR        Nuclear receptor
PcG       Polycomb group
PCR       Polymerase chain reaction
PRC       Polycomb repressive complex
TF        Transcription factor
TSS       Transcription start site
TTS       Transcription termination site

Part I
Introduction

Chapter 1
INTRODUCTION

1.1 Histone Modification

Eukaryotic DNA is wound around a core of proteins, forming a tightly packed complex called chromatin. The basic unit of chromatin is the nucleosome, which comprises ∼147 base pairs of DNA wrapped around a histone core that consists of two copies each of histones H2A, H2B, H3 and H4 (Figure 1.1).

Figure 1.1: Chromatin Structure: Chromatin is made of repeating units of nucleosomes, which consist of ∼147 base pairs of DNA wrapped around a histone octamer containing two copies each of the core histones H2A, H2B, H3 and H4. Linker histone H1 is positioned on top of the nucleosome core particles, stabilizing higher order chromatin structure. The histones are subject to a wide variety of post-translational modifications, primarily on their N-terminal tails, but also in their globular core region. (adapted from [1])

Each of the histones can be covalently modified in various ways, including lysine (K) methylation, arginine (R) methylation, lysine acetylation, serine (S) and threonine (T) phosphorylation, ubiquitination, ADP ribosylation, propionylation, butyrylation, formylation, sumoylation, citrullination, proline isomerization, and the very recently described lysine crotonylation (Kcr) [2]. Altogether, there are at least 12 different types of modifications on more than 60 different amino acid residues of histones. Specific enzymes are responsible for each modification. These include various classes of enzymes such as histone deacetylases (HDACs), histone acetyltransferases (HATs), kinases, deiminases, methyltransferases (HMTs), demethylases (HDMs), ubiquitinases, deubiquitinases, SUMO ligases, proteases and sirtuins [3]. Post-translational modification of histone tails regulates the interaction of the nucleosome with other nucleosomes and with linker DNA, and directs the folding of chromatin into a more complex structure. This epigenetic modification facilitates the recognition and binding of non-histone effector proteins (“readers”), which determine the final outcome of histone modification events (Figure 1.2). Binding of these “reader” proteins occurs through various protein domains [5], including 14-3-3, Ankyrin repeats (ANK), Baculovirus IAP repeat (BIR), BRCA1 C-terminal domain (BRCT), bromodomain (BRD), chromodomain (CHR), MBT domain, phosphotyrosine interaction domain (PID), PHD zinc finger, tudor (TRD) and WD40 repeats, summarized in Table 1.1.

Figure 1.2: Readers Determine the Effects of Lysine Methylation. The schematic depicts three different models for how methyllysine readers can sense and transduce histone methylation marks into biological outcomes. (A) A specific reader recognizes one mark and links it to one specific outcome. In this way, distinct readers link different marks to alternate functional outcomes. (B) Five distinct readers all specifically recognize the same mark, for example trimethylated histone H3 lysine 4 (H3K4me3), resulting in five alternate biological outputs being linked to the one mark. (C) One reader binds to multiple different marks (such as H3K9me3, H3K27me3, and H4K20me3), linking the three different marks to a single biological output. (adapted from [4])

Table 1.1: “Reader” domains. Specific domains of the “reader” proteins that recognize particular histone modifications. K: lysine, R: arginine, S: serine, T: threonine, P: phosphate, me: methyl, ac: acetyl. (adapted from www.activemotif.com)

Genome-wide determination of “readers” of trimethyl-lysine (me3) modifications on histones was made possible only recently by the development of sophisticated technology [6]. These experiments identified a large number of readers that mediate activation, repression or elongation functions (Figure 1.3).

Figure 1.3: Illustration of the histone trimethyl-lysine interactome proteins. Proteins interacting with the five trimethyl lysine marks are indicated. Using pull-down assays followed by ChIP-seq, genomic locations of histone lysine trimethylation were determined. Chromatin “readers” are assigned to complexes by interaction proteomics of full-length BAC-GFP-tagged proteins. Quantitative mass spectrometry was used to identify the readers. Encircled are proteins that were additionally identified by GFP pulldown experiments; baits in these pulldowns are underlined. Proteins in the circle marked with an asterisk were not identified as interactors in the peptide pulldowns. Proteins clustered in rectangles were identified in the peptide pulldowns. (adapted from [6])

As genome-wide data on multiple histone modifications have become available, statistical analysis of these data has shown that in all studied species the number of chromatin ‘states’ is very limited. More than a hundred chromatin ‘marks’ or modifications have been identified to date, which allows a large number of theoretical combinations, but in reality only a few chromatin states occur. This makes it more complicated to understand the fine tuning of regulatory mechanisms and to work out which combinations of marks encode biologically distinct chromatin states. For example, one study [7] in Arabidopsis thaliana surveyed 11 histone marks and DNA methylation and found only four major chromatin states by heat-map and hierarchical clustering. Although theoretically some 4000 combinations are possible, only 38 actually occur frequently, and they correspond to the four main chromatin states with distinct biological functional properties. Another study [8], in human lymphocytes, identified 51 states using 38 histone marks, which were grouped into 5 functional classes: active intergenic, large-scale repressed, promoter-associated, repetitive and transcription-associated states. Similarly, many other studies have confirmed that with the advent of new technology it is possible to identify a large number of modifications and combinatorial patterns genome-wide, but it is more important to delineate their biological significance.
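The figure of "some 4000" theoretical combinations can be checked with a short calculation. The sketch below is illustrative only and rests on an assumption that is not stated in the text: each of the 12 features surveyed in A. thaliana (11 histone marks plus DNA methylation) is treated as simply present or absent at a given locus. The function name `theoretical_combinations` is hypothetical.

```python
def theoretical_combinations(n_marks: int) -> int:
    """Number of presence/absence patterns for n_marks binary marks."""
    return 2 ** n_marks


# Assumption: every mark is scored as a binary (present/absent) feature.
n_features = 11 + 1  # 11 histone marks + DNA methylation, as in study [7]
possible = theoretical_combinations(n_features)

# Number of frequently occurring combinations quoted in the text.
observed_frequent = 38
fraction = observed_frequent / possible

print(possible)           # 4096
print(f"{fraction:.2%}")  # 0.93%
```

Under this binary assumption, fewer than 1% of the theoretically possible combinations occur frequently, which is consistent with the observation that the number of chromatin states is very limited.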
1.2 Structural Properties of Histone Methylating and Demethylating Enzymes
The N-terminal tails of histone subunits are located outside of the relatively compact octameric histone core (Figure 1.1), which makes them readily accessible to a variety of histone modifying enzymes. Enzymes that post-translationally modify histones through the attachment of one or more methyl groups are known as methyltransferases. These specific modifications are then recognized by other specific proteins. Depending on the type of modification and the ‘reader’ protein, specific regulatory and functional changes occur. In turn, the mono-, di- or trimethyl groups can be erased by another set of enzymes, producing a distinct signal that leads to other regulatory and functional consequences. As multiple modifications can be combined, each combination results in a heterogeneous identity for a single nucleosome. Each combination is believed to dictate a particular biological output, thus providing a code. This concept is commonly referred to as the “histone code” [9]. Histone methylation can cause both transcriptional activation and repression, and hence aberrant methylation patterns are implicated in various diseases including cancer. Before the discovery of the first demethylase, LSD1, in 2004, it was believed that methylation was irreversible due to the high thermodynamic stability of the N-CH3 bond [10]. About 50 different methyltransferases have been discovered so far. A comparable number of demethylases have also been found and more are likely to be discovered [11]. Depending on the enzyme, mono-, di-, or tri-methyl groups can be placed and removed by histone methyltransferases (HMTs) and histone demethylases (HDMs), respectively.
1.2.1 SET domain containing Histone Methyltransferases
Histone lysine methyltransferases and histone-arginine N-methyltransferases are enzymes that transfer methyl groups to lysine and arginine residues in histone tails.
Except for Dot1 [12, 13], all HMTs possess a conserved SET (Su(var)3-9, Enhancer of Zeste, Trithorax) domain. The domain is so named because it was first identified as a sequence shared by three Drosophila proteins: suppressor of variegation [Su(var)3-9], enhancer of zeste [E(z)], and the homeobox gene regulator trithorax [Trx] [14]. SET domain proteins can act as transcriptional activators or repressors [15, 16, 17]. The SET domain is ∼130 amino acids long and comprises two conserved non-contiguous portions, SET-N and SET-C, separated by SET-I, an insert region of variable length. Of SET-N and SET-C, the latter contains the majority of the invariant residues [18]. The variability in SET-I length and the lack of a consistent pattern of sequence homology across the different members of the SET family may confer substrate specificity [18]. However, since SET-I subdomains of different lengths have been reported to share identical substrate specificity, SET-I alone does not determine substrate specificity; adjacent cysteine-rich regions (pre-SET and post-SET) are also required for enzymatic activity [19] (Figure 1.4). However, they are not as conserved as the SET domain. The crystal structure of the SET domain reveals an unusual knot-like fold composed of a series of variable-length β-strands together with a short α-helix and several loops [19, 20]. S-adenosyl-L-methionine (AdoMet) acts as the methyl-donating cofactor; the cofactor and the peptide substrate have been reported to bind on opposite faces of the SET domain. SET family HMTs can add one, two or three methyl groups to their target lysine. Several studies have proposed that the geometry and shape of the lysine access channel of the ternary complex is mainly responsible for determining the number of methyl groups transferred to the lysine substrate [21, 22]. However, for the methyl group to be transferred, the lysine side chain of the histone tail must be deprotonated.
Figure 1.4: Domain structure of SET family HMTs.
The conserved SET motif and its flanking domains. Based on conservation in SET protein sequence alignments, four strongly conserved motifs have been found. (adapted from [22])
There are at least 50 SET domain-containing proteins in the human genome that have, or are predicted to have, methyltransferase function. A phylogenetic tree and the domain structure of 48 human SET domain-containing HMTs are shown in Figure 1.5 [23]. Based on SET domain sequence homology and the cysteine-rich regions in the vicinity, SET family HMTs can be grouped into six subfamilies, although not all SET domain proteins are included in this classification. These subfamilies are: SET1, SET2, SUV39, SMYD, EZH2 and PRDM.
Figure 1.5: Structure and phylogenetic tree of human SET domain-containing HMTs. Phylogenetic tree of SET domain proteins based on sequence similarities and distances of SET domains. SET: Suppressor of variegation, Enhancer of Zeste, Trithorax domain; pre/post-SET: cysteine-rich motifs found adjacent to a subset of SET domains; PHD: plant homeodomain zinc finger; ANK: ankyrin repeats; AT hook: A/T DNA binding motif; C2H2 Znf: C2H2-type zinc finger; HMG: high mobility group; SANT: SWI3, ADA2, N-CoR and TFIIIB DNA-binding domain; CxxC: CxxC zinc finger; MBD: methyl CpG binding domain. (adapted from [23])
KMT2A/MLL1
The Mixed Lineage Leukemia (MLL) group of proteins are members of the SET1 subfamily. MLL1, a multi-modular protein of 3969 amino acid residues, contains, in addition to the conserved SET domain: several PHD fingers and a bromodomain (BD) that binds histones; a cysteine-rich CXXC domain and protein-protein interaction domains; DNA-binding AT hooks; a transactivation domain (TAD); a nuclear receptor-interaction motif (NR box); and a WDR5-interacting Win motif (Figure 1.6(a)). MLL1 is often found in complex with ASH2L, WDR5, DPY-30, and RbBP5 (Figure 1.6(b)) [24]. It has been reported that enzymatic activity increases ∼20-fold when these additional components are bound. A recent study also showed that the interaction between WDR5 and RbBP5 is important for the activation of MLL1 methyltransferase activity [25]. The CXXC domain, which is specific for unmethylated CpG, might play a role in the selective targeting of active genes [26, 27]. Other domains, such as the TAD, also contribute to the recognition of target genes; one such mechanism is through binding of the transcriptional co-activator CBP [28, 29]. In the classical model of methylation, interaction of the Win motif with WDR5 facilitates assembly of the MLL1 core complex with the WDR5-RbBP5-Ash2L sub-complex; WDR5 then binds Arg2 of histone H3 and facilitates H3 methylation by presenting the Lys 4 side chain to the SET domain of MLL1. However, it has recently been shown that either MLL1 in the absence of a functional sub-complex, or the WDR5 sub-complex together with a catalytically inactive SET domain, is able to monomethylate the substrate, which indicates the use of distinct catalytic sites for methyl group transfer [30, 31]. In the presence of both functional components, catalytic activity increases ∼600-fold and together they can dimethylate H3. This observation led to the proposal of a new model of MLL1 methyltransferase activity, in which lysine multi-methylation is a sequential process and the two methyl groups are added at two different catalytic sites within the same complex (Figure 1.7) [30].
Figure 1.6: Schematic representation of MLL1 domains. (a) Diagram showing the full-length MLL1 protein (upper) with a detailed view of the C-terminal SET domain (bottom) (adapted from [32]). (b) Interaction of the MLL1 complex with other proteins. Schematic diagram showing MLL1 interacting with other proteins through the Win motif near the C-terminal SET domain. (adapted from [24])
Figure 1.7: Model of methylation by MLL1. (A) The MLL1 core complex is composed of two distinct H3K4 methyltransferases, each of which possesses its own H3K4 monomethylation activity. The sub-complex WDR5-RbBP5-Ash2L-DPY30 (WRAD) has catalytic activity involving a motif, indicated by a dashed line, which may be shared between subunits. (B) The MLL1 Win motif is recognized by WDR5, causing assembly of the MLL1 core complex, which has H3K4 dimethyltransferase activity. The proposed model suggests that the MLL1 SET domain catalyzes monomethylation of histone H3 at Lys 4, followed by transfer of the monomethylated histone H3 to a second active site on the WRAD sub-complex, where H3K4 dimethylation occurs. (adapted from [32])
KMT6/EZH2 and Polycomb Group
Silencing by H3K27me3 is mediated through the highly conserved Polycomb group (PcG) complexes. Trimethylation of H3K27 usually results in gene repression and chromatin silencing. In particular, silencing through PcG has been linked to repression of homeotic genes and of the X chromosome. The enzyme responsible for this catalysis is KMT6 or EZH2 (E(Z) in Drosophila), a member of the SET domain HMT family and a component of PcG. PcG consists of three complexes: Polycomb repressive complex 1 (PRC1), Polycomb repressive complex 2 (PRC2) and PhoRC. PRC2 methylates unmethylated H3K27 and PRC1 reads out these changes. PRC1 can be recruited to the H3K27me3 mark placed by PRC2 through its CBX subunit. The two complexes can also work synergistically [33]. PRC2 is composed of four core components: EZH2, SUZ12, and the two WD40 domain proteins EED and RbAp46/48 [34].
In addition to the core subunits, several other proteins and polypeptides are known to form a complex with PRC2 (Figure 1.8), including: (1) AEBP2, a zinc finger protein that interacts with mammalian PRC2 and augments EZH2 enzymatic activity; (2) the Polycomb-like proteins PCL1, PCL2 and PCL3, which all have PHD fingers, a tudor domain, a carboxy-terminal domain and a PCL extended domain, and which interact with PRC2 mainly through EZH2, SUZ12 and RbAp46/48 [35]; and (3) JARID2, a JmjC domain protein that can interact with PRC2 [34, 36]. Although JARID2 has no catalytic activity of its own, it has been shown to either augment or reduce the enzymatic activity of PRC2. It has also been reported that EZH2 prefers dinucleosomes over mononucleosomes as substrate. Linker histone H1, when present in the nucleosome, enhances EZH2 enzymatic activity, probably through an interaction between H1 and EZH2 [37]. In Drosophila, two further complexes, the PHO-repressive complex (PhoRC) and Polycomb repressive deubiquitinase (PR-DUB), have been characterized as PcG complexes [38, 39]. It is likely that PhoRC and other DNA binding proteins in Drosophila recruit PRC1 and PRC2 upon binding to DNA [40].
1.2.2 Amine Oxidase Family HDMs
LSD1/KDM1
The amine oxidase family demethylase LSD1 is a lysine-specific, flavin-containing amine oxidase with methylated H3K4 as its substrate. It mainly functions as a transcriptional corepressor [10]; however, it has also been identified as part of an activating complex [42]. Structurally it has three characteristic domains: an N-terminal SWIRM domain, a central Tower domain and a C-terminal amine oxidase-like (AOL) domain, also called the amine oxidase domain (AOD). The AOD contains the active site of the enzyme, comprising 50 invariant residues, and has two parts: one binds the FAD cofactor and the other binds the peptide substrate. The SWIRM domain and the AOD are packed closely against each other.
The N-methylamine moiety of the histone substrate H3K4 is wrapped with the FAD azine ring to form the enzyme-substrate complex, while H3 interacts with the anionic pocket of the active site. The Tower domain is able to bind the corepressor CoREST [34, 43, 44, 45]. The interaction between CoREST and LSD1 allows access to the nucleosomal substrate and prevents proteasomal degradation. Of the two SANT domains of CoREST, SANT2 alone can confer enzymatic activity, and this domain wraps around the protruding Tower domain of LSD1. Along with CoREST, LSD1 activity is also mediated through other partners such as BHC80 and the androgen receptor (AR) [46] (Figure 1.9); interaction with AR enables LSD1 to demethylate H3K9.
Figure 1.8: Schematic architecture of the Polycomb complexes PRC1 and PRC2, and associated proteins. (a) Components of PRC2 and PRC1 are shown. In the case of PRC1, the left panel corresponds to the classical PRC1 complexes, whereas the right panel shows the so-called PRC1-like complexes. Due to sequence homology with the Drosophila PSC protein, it is thought that the BMI1-, MEL18- and NSPC1-containing PRC1 complexes might be involved in chromatin compaction. The ‘pocket’ shape of the CBX proteins denotes the chromodomain that specifically recognizes H3K9/27me3. HPH1, 2 and 3 represent human polyhomeotic homologues 1, 2 and 3. X, Y and Z denote various proteins, including SCMH1/2, FBXL10, E2F6 and JARID1D, which might contribute to the formation of PRC1-like complexes; however, their precise composition is currently unknown. (b) Characterized domains are indicated for each PRC2 component, together with their predicted function. CXC, cysteine-rich domain; ncRBD, non-coding-RNA-binding domain; SANT, SWI3, ADA2, N-CoR and TFIIIB DNA-binding domain; SET, Su(var)3-9, enhancer of zeste, trithorax domain; VEFS, conserved among VRN2-EMF2-FIS2-SU(Z)12; WD40, short ∼40 amino acid motifs. (adapted from [41])
1.2.3 Jumonji domain protein family
Soon after the discovery of LSD1, another class of histone demethylases, containing a Jumonji C (JmjC) domain with Fe(II)-dioxygenase activity, was predicted by Trewick and co-workers [47] and finally discovered by Zhang and colleagues. The first JmjC-domain-containing histone demethylase discovered was JHDM1A/KDM2A. This enzyme erases di- and monomethyl groups from H3K9 and H3K36, resulting in the non-methylated form [48, 3]. However, its paralogs JMJ2B and JMJ2D show specificity only for H3K9me2/3. Immediately afterwards, another JmjC family protein, JHDM2A/JMJD1A/KDM3A, was reported to have H3K9me2 demethylase activity [49]. Additionally, the JHDM3A/JMJD2A/KDM4A protein was shown to demethylate H3K9me3 and H3K36me3 to the dimethylated form [50]. These discoveries were followed by the identification of a large number of JmjC-domain-containing HDMs. To date, over 30 JmjC domain-containing proteins have been identified in mammals. A phylogenetic tree of the JmjC family, comprising seven subfamilies, (1) JHDM1, (2) JHDM2, (3) JHDM3/JMJD2, (4) JARID, (5) PHF2/PHF8, (6) UTX/UTY, and (7) JmjC-domain only, is shown in Figure 1.10 [51, 11]. It is believed that additional domains, other than JmjC, participate in substrate recognition, specificity and protein-protein interaction [53, 54]. These auxiliary functional domains include PHD, ARID or Bright, Tudor, F-box, TPR and zinc finger domains [51, 55]. In KDM5A, the ARID domain binds a specific DNA motif; PHD and Tudor domains have been shown to recognize unmodified or methylated residues in histone tails [56]. Demethylation by this family of enzymes proceeds through an oxidative reaction that requires Fe(II) and α-ketoglutarate (α-KG) as co-factors. In the first step, molecular O2 converts α-KG to succinate and CO2 and concomitantly hydroxylates the methyl group of the substrate, producing an unstable carbinolamine intermediate through a free radical reaction (Figure 1.11).
The intermediate spontaneously degrades, releasing formaldehyde and leaving the demethylated residue [47, 50, 57]. Unlike amine oxidase family HDMs (e.g. LSD1), the JmjC family HDMs do not require a protonated nitrogen in the substrate; they can demethylate any methylated state, including trimethylated residues.
Figure 1.9: Schematic diagrams of LSD1 domains, 3D structures, and mechanism of action. LSD1 domains are indicated by different colors. AOD stands for amine oxidase domain. LSD1 alone demethylates H3K4me1/me2. CoREST interacts with the Tower/insert region of LSD1 (upper interaction). This interaction results in nucleosomal demethylation. The human androgen receptor (AR) has also been shown to interact with LSD1, resulting in H3K9me1 and me2 demethylation. (adapted from [11])
Figure 1.10: Phylogenetic tree of the JmjC family of demethylases. The names, synonyms, substrate specificities, and domain structures of the proteins are provided. Putative oncoproteins are in red and putative tumor suppressors are in green. (JmjC) Jumonji C domain; (JmjN) Jumonji N domain; (PHD) plant homeodomain; (Tdr) Tudor domain; (Arid) AT-rich interacting domain; (Fbox) F-box domain; (C5HC2) C5HC2 zinc-finger domain; (CXXC) CXXC zinc-finger domain; (TPR) tetratricopeptide domain; (LRR) leucine-rich repeat domain; (TCZ) treble-clef zinc-finger domain; (PLAc) cytoplasmic phospholipase A2 catalytic subunit. (adapted from [52])
Figure 1.11: Chemical mechanism of the demethylation reaction mediated by JmjC family HDMs. Using Fe(II) and α-ketoglutarate, the enzyme (a JmjC-domain demethylase) converts the methyl group of the peptide substrate to a hydroxymethyl group. The reaction yields an unstable carbinolamine intermediate, which is then degraded to the unmethylated peptide with the release of formaldehyde. (adapted from [58])
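The two steps of the JmjC-catalysed reaction described above can be written out as a schematic reaction scheme. The version below is only a sketch for a monomethylated lysine: the notation Lys-NH-CH3 is shorthand introduced here, and charges, protons and the iron-bound intermediates are deliberately omitted.

```latex
% Schematic only: monomethyl-lysine shown; charges, protons and
% Fe-bound intermediates are omitted. Requires the amsmath package.
\begin{align*}
\text{Lys-NH-CH}_3 + \alpha\text{-KG} + \mathrm{O_2}
  &\xrightarrow{\ \text{JmjC},\ \mathrm{Fe(II)}\ }
  \text{Lys-NH-CH}_2\text{OH} + \text{succinate} + \mathrm{CO_2} \\
\text{Lys-NH-CH}_2\text{OH}
  &\longrightarrow \text{Lys-NH}_2 + \mathrm{HCHO}
\end{align*}
```

The first line is the enzymatic hydroxylation that produces the carbinolamine intermediate; the second is its spontaneous breakdown with release of formaldehyde. For di- and trimethylated lysine the same two steps would remove one methyl group per cycle.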
JARID1/KDM5 subfamily demethylases
JARID (Jumonji/AT-rich interactive domain-containing protein) family HDMs have been found in various organisms including human (JARID1A/RBP2/KDM5A, JARID1B/PLU1/KDM5B, JARID1C/SMCX/KDM5C, JARID1D/SMCY/KDM5D), Drosophila (Lid), yeast (Jhd2p) and the worm C. elegans (RBR-2). In mammals, besides the JmjC domain, they contain four other conserved domains: JmjN, ARID/BRIGHT, PHD fingers and a C5HC2 zinc finger (Figure 1.12). Both JmjC and JmjN are required for enzymatic activity. The ARID domain binds DNA [59], and the C-terminal PHD finger mediates protein-protein interaction by binding certain methylated residues and allowing the recruitment of other effector proteins, such as HDACs; the protein thus acts as both “reader” and “eraser” [5, 60, 61]. The function of the C5HC2 domain is unclear; however, the C5HC2 domain of Lid strongly binds dMyc. Yeast Jhd2p is almost half the size of other JARID1 family members and contains only the JmjN/JmjC and first PHD domains, although its substrate preference remains similar to that of other members. The KDM5 family can demethylate H3K4me3/2 and is often found close to the transcription start site (TSS) of transcribed genes [62].
Figure 1.12: Schematic presentation of the JARID1 domain structure in humans and its orthologs. The first four proteins are human. Lid is found in Drosophila, RBR-2 is present in C. elegans and Jhd2p in yeast. (modified from [63])
1.3 Epigenetic Regulation of Transcription by HMTs and HDMs
Regulation at the level of transcription requires either activation of silenced genes or suppression of actively transcribed genes. Distinct sets of epigenetic modifications are present in silent heterochromatin and active euchromatin. The functional consequences differ with the specific type and location of the modification. In general, acetylation, methylation, phosphorylation and ubiquitination are linked to transcriptional activation, whereas ubiquitination, sumoylation, deamination and methylation are linked to repression. However, exceptions to this scheme exist depending on the context of the modification. For example, trimethylation of H3K4 is generally implicated in transcriptional activation, whereas trimethylation of H3K27 generates a repressive signal. Acetylation of H3K27 marks activation. H3K9 or H3K36 methylation in the gene body favours transcription, but acts negatively when it occurs at the promoter [64]. The degree of methylation also determines the outcome: for example, monomethylation of H3K9, H3K27, H4K20, H3K79 and H2BK5 is implicated in gene activation, whereas trimethylation of H3K9, H3K27 and H3K79 is linked to repression [65].
One important phenomenon in gene regulation is bivalent marking [66], defined as chromatin carrying both activating and repressing modifications within the same region. This can occur, for example, through the opposing effects of distinct histone modifications placed by two members of the SET family HMTs. Genes can be marked for activation by MLL through transfer of a methyl group to H3K4, while the repressive mark H3K27me3 can be conferred by EZH2 as part of the PRC2 complex. These opposing signals can poise the gene for expression (Figure 1.13) [67]. Classically, H3K4me3 is associated with active euchromatin and H3K27me3 with silent heterochromatin. However, enrichment of opposing modifications in bivalent domains in ES cells results in a poised state, with low expression of developmental transcription factors, and thus maintains pluripotency. It has been observed that depletion of the H3K4me3 demethylase RBR-2 in C. elegans or Lid in Drosophila results in a global increase in H3K4me3. Additionally, inhibition of UTX expression increased the H3K27me3 level in HeLa cells, indicating that the cell maintains a constant, dynamic turnover of methylation. Therefore, the activation signal of H3K4 HMTs can be opposed by H3K4 HDMs, and similarly, H3K27 HMTs can be reversed by H3K27 HDMs.
Figure 1.13: Histone modifications at bivalent genes. Bivalent genes are marked by both active and repressive epigenetic marks. MLL complexes mark chromatin with an activating H3K4me3 mark. The consecutive action of the PcG complexes PRC2 and PRC1 places repressive marks: PRC2 trimethylates H3K27, which is thought to recruit PRC1 via a CBX family protein. PRC1 complexes subsequently transfer a monoubiquitin residue to histone H2A (H2AK119). Both PRC1 and monoubiquitinated histone H2A (H2AUb) lock chromatin in a silenced state and force RNA PolII to remain in a paused state. (adapted from [68])
1.4 Epigenetic regulator KDM5A
1.4.1 Transcriptional regulation through protein-protein interactions of KDM5A
Retinoblastoma binding protein 2 (RBP2), also known as JARID1A and, according to the new nomenclature system, KDM5A, is the founding member of the KDM5 family of demethylases. The human gene for KDM5A is located on the short arm of chromosome 12 (12p13.33). It is the fourth RefSeq gene from the telomere, and was recently identified in a chromosomal translocation in an infant with acute myeloid leukemia. PLU1/JARID1B/KDM5B (1q32.1) is highly homologous to KDM5A (96% identity, 76% similarity). Two other KDM5 family members, KDM5C and KDM5D, are more homologous to each other than to KDM5A or KDM5B, and are located on chromosomes X and Y, respectively. KDM5A was originally isolated as an interactor of the retinoblastoma protein (pRB) [69] and was implicated in the regulation of pRB targets [70].
While KDM5A has a prominent role in pRB function, it can also interact with several other proteins including TATA-binding protein (TBP), p107 [71], Rhombotin-2 [72], nuclear receptors [73], Myc [74, 75, 76], Mad1 [76], RBP-J [77], Sin3/HDACs [78, 79] and CLOCK-BMAL1 [80]. Here we discuss these proteins first, and focus on pRB in the next section.
RB family. KDM5A binding to p107 occurs exclusively at the large T/E1A ‘pocket’ of p107; however, it can interact with RB in a pocket-independent manner, which indicates the presence of multiple interaction domains in KDM5A [71].
Rhombotin-2. Using a yeast two-hybrid system, KDM5A was found to interact with the LIM domain protein Rhombotin-2 (RBTN2/LMO2) and to modulate its function. The interaction between KDM5A and LMO2 increases transcriptional activity, providing evidence of KDM5A’s function as a transcriptional coactivator. Using a reporter gene system, it was shown that the transcriptional activity of LMO2 can increase three- to six-fold in different cell lines.
Nuclear receptors. KDM5A binds to nuclear receptors (NRs) and enhances the transcription of their targets [73]. The interaction of KDM5A with NRs occurs through DNA binding of the NRs and enhances NR-mediated transcription in an AF-2-dependent manner. Although both the C- and N-terminal regions of KDM5A are necessary for the interaction, the C-terminal end is the main interacting region. The interaction of KDM5A with NRs suggests that it could recruit additional factors to facilitate cancer progression.
Myc. At least three studies [74, 75, 76] have shown a connection between KDM5A and the Myc protein, which is often activated in cancer. Transcriptional regulation by a KDM5A/Myc/Max network was first demonstrated by Secombe et al. [74, 75] in Drosophila, providing clues that this network might also exist in mammalian systems. This study illustrated the regulatory mechanism of the Nop60B gene by the KDM5A homolog Lid in association with dMyc (Figure 1.14).
dMyc is a suppressor of the demethylase activity of Lid; this suppression is required to maintain the trimethylated status of dMyc target gene promoters. The trimethylation mark is most likely recognized by other chromatin remodelling complexes, such as NURF, and promotes transcriptional activation. In humans, MAD1 can recruit KDM5A to repress the telomerase reverse transcriptase (hTERT) promoter by virtue of its demethylating activity [76]. hTERT is the catalytic subunit of telomerase, a ribonucleoprotein that maintains telomere ends by addition of the telomere repeat sequence TTAGGG. In the absence of telomerase and TERT, DNA replication is hampered. MYC and MAX can form a dimer that binds to the E-box in the promoter and recruits histone acetyltransferases (HATs) to activate its targets [81]. However, during differentiation of HL60 cells, Mad1 expression was shown to increase, and Mad1 can bind to the same E-box. This antagonizes Myc function and recruits KDM5A, consequently repressing hTERT transcription. However, during oncogenesis, when Myc is activated, the KDM5A demethylase activity is suppressed.
Figure 1.14: Model of Lid demethylase mediated gene activation and repression in connection to dMyc. (Upper) Trimethylation of H3K4 at the dMyc target Nop60B marks activation of this gene. The demethylase activity of Lid can oppose this. However, when Lid is complexed with dMyc, its demethylase activity is suppressed and the trimethylation status is maintained, which in turn could allow chromatin remodeling factors such as NURF to activate transcription. Alternatively, Lid may directly recruit or interact with the transcriptional initiation or elongation machinery. (Bottom) In the absence of dMyc, possibly through recruitment by a transcription factor or due to the DNA-binding ability of the ARID domain, Lid may localize to this trimethylated region, where its demethylase activity suppresses transcription. (adapted from [74])
Notch signaling. It is known that, when activated through ligand binding, RBP-J activates transcription of Notch target genes. In the absence of this ligand signal, RBP-J switches off the target genes. Recently, Liefke et al. [77] showed the involvement of KDM5A in this process. This study showed that KDM5A physically interacts with the nuclear effector RBP-J and that this interaction is an integral, conserved component of the Notch/RBP-J gene silencing mechanism, required in Notch-induced growth response and tumorigenesis. H3K4 trimethylation was shown to be erased at Notch target genes, e.g. Deltex-1, Hes-1 and CD25, at the RBP-J-binding sites, and for preTα in part at the promoter (Figure 1.15).
Figure 1.15: Model of KDM5A mediated gene suppression in Notch targets. Upon activation of Notch signaling, NotchICD migrates into the nucleus, associates with the transcription factor RBP-J, and recruits Mastermind (MAML) and other coactivators (CoA), including a histone acetyltransferase (HAT), which activates transcription and maintains the trimethylation status of H3K4. In the absence of the Notch activation signal, NotchICD is degraded and RBP-J recruits KDM5A, which demethylates H3K4me3. (adapted from [77])
HDACs. Investigation of the functional association of HDACs and KDM5A suggests that KDM5A interacts with MRG15-containing [79] and MRGX-containing complexes. MRG15, a conserved chromodomain protein associated with HDACs, is involved in both activation and repression of transcription. It can also interact with pRB in a nucleoprotein complex. In the HeLa cell line, it was shown that MRG15 is required for KDM5A-induced H3K4 demethylation [79]. Recently, interaction of KDM5A with the Sin3/Rpd3 corepressor complex and HDACs was demonstrated by Oevelen et al. [78].
The complex is recruited to the promoter region by a sequence-specific repressor, such as Ume6 in yeast and Mad in mammals, and silences Sin3 targets through deacetylation of histones. A link between KDM5A demethylase activity and the Sin3/HDAC repressor complex was reported earlier [79, 82]. Interestingly, a significant overlap of KDM5A, Sin3 and E2F4 targets was observed at certain genes during differentiation. In addition, HDAC was also present at some E2F4 targets. As a result of these studies, it was proposed that during S phase of the cell cycle Sin3 and E2F4 dissociate from target genes, leading to an increase in H4 acetylation and H3K4 methylation. Upon initiation of differentiation, Sin3 as well as KDM5A are recruited to the same targets, causing demethylation of H3K4me3 and an increase in H3K27me3 (Figure 1.16). However, the exact mechanism is still unknown. In E2F4 target regions this coordinated activity leads to deacetylation, demethylation and repositioning of nucleosomes, resulting in permanent silencing in terminally differentiated cells.
Figure 1.16: Model of RBP2/KDM5A and Sin3 interaction in differentiated muscle cells. In the undifferentiated cell, E2F4 target genes remain acetylated, and trimethylated at H3K4. With the differentiation signal, these targets are silenced by demethylation at H3K4, deacetylation and acquisition of the repressive H3K27 trimethylation mark. During this time Sin3 binds and forms a complex with KDM5A and HDAC. It is unclear whether this complex interacts directly with E2F4 and how it is recruited to chromatin. Me, methylation; methylation of H3K4 (green) and H3K27 (red); Ac, acetylation. (adapted from [78])
CLOCK-BMAL1. Another link between KDM5A and histone acetylation, an increase in acetylation through inhibition of deacetylase activity, was recently reported in the circadian clock [80]. KDM5A can form a complex with the CLOCK-BMAL1 transcription factor and is recruited to the PER2 promoter, augmenting transcription by inhibiting histone deacetylase function in a demethylase-independent manner. Depletion of KDM5A reduces the expression of the canonical circadian genes, which suggests that the influence of KDM5A on the circadian clock is a non-redundant function.
1.4.2 Biological Function of KDM5A and related proteins
KDM5A acts as a suppressor by removing methyl groups from H3K4me3/2. H3K4me3 is observed in active euchromatin and is hence implicated in transcriptional activation. Trimethylation of H3K4 starts during assembly of the preinitiation complex, and is involved in transcriptional elongation by recruiting chromatin remodeling complexes and histone modifying enzymes or by preventing binding of transcriptional repressors. Association of KDM5A with H3K4me3/2, especially in close proximity to the promoter regions of transcriptionally active genes, was reported earlier [62, 83]. KDM5A is specific for H3K4me3/2 and is able to demethylate this residue down to its unmethylated form, acting as a transcriptional repressor. Consistent with this function, when KDM5A is recruited to Hox gene promoters (Hoxa1, a5, a7) in mouse cells, their transcription is repressed. Also, when mouse ES cells were stimulated to differentiate, KDM5A binding at some developmental genes decreased along with an increase of the H3K4 trimethylation mark [83]. Kdm5a−/− mice showed increased transcription of certain cytokine genes and an increase in H3K4me3 [84]. Overexpression studies demonstrated a reduction of H3K4me3 levels by KDM5A, KDM5B and KDM5C [83, 84, 85]. Interestingly, in contrast to these observations, Kdm5a−/− mice were observed to have a normal level of H3K4 methylation, leading to a claim of functional redundancy for KDM5A.
For example, deletion of jhd2 results in only minor changes in H3K4 methylation compared with its overexpression [86, 87], suggesting that a different HDM might play the same role in the absence of KDM5A. Therefore, it would be interesting to know the genome-wide location and functions of KDM5A targets that undergo changes in H3K4 methylation in the presence or absence of KDM5A, versus sites of H3K4me3 that are not bound by KDM5A. The histone demethylation activity of the KDM5 family is conserved between humans and model organisms [63, 75, 83, 84, 85, 88, 89, 90]. These studies revealed that the members of the KDM5 family seem to play critical roles in cell fate determination and differentiation, which can partially be ascribed to their histone demethylation function [91]. H3K4 HMTs are an evolutionarily conserved group of proteins originally called the trithorax group (trxG). trxG and PcG proteins assemble into chromatin-modifying transcriptional regulatory supercomplexes that regulate H3K4 and H3K27 methylation, respectively. There is evidence of genetic interaction between KDM5A orthologs and trxG genes in yeast and fly [85, 92], and of protein-protein interaction between the KDM5D protein and the Polycomb-like protein Ring6a/MBLR [90]. KDM5A is required to maintain the full repression of developmental genes in mouse ES cells through co-recruitment with the PRC2 complex [83, 93]. Besides gene repression through H3K4 demethylation, repression by KDM5A may proceed through recruitment of co-repressor complexes or through competition with co-activators [79, 90, 94].

KDM5A Role in Development

KDM5A has been shown to play a major role in cell proliferation, survival, differentiation and senescence by regulating the transcription of specific genes. Although Kdm5a−/− is not lethal in mouse and the progeny remain fertile, mutation of Lid is homozygous lethal in Drosophila [92]. Klose et al.
[84] generated mice with deleted JmjC, C5HC2 and PHD domains of KDM5A, which showed defects in the hematopoietic system, including overproduction of hematopoietic stem cells (HSC) and myeloid progenitor cells. Knockout mice showed decreased apoptosis and an increase in cells in the G1 phase of the cell cycle. Knockdown of KDM5A in the SAOS2 cell line resulted in upregulation of cell cycle regulating genes such as p21, p27 and p130 [84]. Also, mouse embryonic fibroblasts homozygous null for Kdm5a showed increased expression of chemokines, such as SDF1, suggesting that KDM5A binds to the SDF1 promoter. Similarly, knockdown of rbr-2 in C. elegans caused defects in vulval development [83], implicating KDM5A function in development. Zebrafish lacking KDM5D showed defects in neuronal development [88]. Rum1 mutants in the corn fungus and jhd2 mutants in yeast showed impaired sporulation [85, 95]. Altogether this suggests a profound role of KDM5 demethylases in development. Lan et al. summarized the developmental expression patterns of the KDM5 family of demethylases and their mutant or knockdown phenotypes in a recent review (Table 1.2) [96].

Table 1.2: Developmental expression pattern and mutant/knockdown phenotype of KDM5 family demethylases. Top row: reported developmental expression patterns; middle row: adult/tissue-specific expression patterns; bottom row: mutant/knockdown phenotypes. (adapted from [96])

It has long been known that pRB plays a central role in cell cycle control by suppressing the function of the cell cycle-activating E2F protein family, which is necessary for the G1 to S-phase transition. However, pRB can also interact with transcription factors that regulate differentiation. In addition to E2F, pRB can cooperate with a variety of proteins and regulate the activity of histone modifying enzymes. pRB mutants lacking the ability to bind E2F were capable of initiating differentiation [97].
To explain the role of KDM5A in development, Benevolenskaya et al. [70] proposed a model in which pRB neutralizes KDM5A repression, promoting senescence and differentiation (Figure 1.17). First, pRB/KDM5A complexes can be detected in specific chromatin fractions derived from differentiating cells. Second, both pRB and KDM5A function in several in vitro systems, including osteogenic, myogenic and adipocyte models of differentiation. In particular, KDM5A depletion can induce non-myogenic cells to express muscle-specific genes and acquire phenotypic traits of muscle cells, or can induce 3T3 cells to undergo adipocyte differentiation in vitro. Cessation of cell growth following KDM5A siRNA treatment correlated with increased accumulation of p130, p21 and p27, and diminished levels of pRB, p107 and cyclin E. Third, KDM5A knockdown had an effect on transcriptional activation mediated by GRα and CBFA1 in cell culture assays that might be considered surrogates of differentiation [97]. In contrast to the model of KDM5A functioning largely as a transcriptional repressor, maximal activation of certain other promoters (e.g., homeotic genes BRD2 and BRD8) requires the formation of a pRB/KDM5A complex [70]. The effect of KDM5A was shown to be direct, because a single mutation in the KDM5A DNA-binding domain or mutations in one of the KDM5A consensus sequences in the BRD2 promoter abrogate KDM5A-mediated transcriptional activation [56]. This suggests that transcriptional regulation by KDM5A can proceed through different mechanisms. It was shown that KDM5A is a critical regulator of cellular differentiation [70]. Knockdown of KDM5A alone in SAOS2 cells could block the cell cycle and induce bone differentiation markers. pRB can elicit differentiation by co-activating Runx2-mediated differentiation markers [98]. In one system, knockdown of KDM5A induced differentiation through induction of RUNX2, suggesting that KDM5A suppresses transcription of this gene (Figure 1.17).
Later work on KDM5A found a highly significant correlation between changes in gene expression in KDM5A-depleted cells and in cells overexpressing pRB [62]. The expression changes were observed in genes implicated in development and differentiation, suggesting that KDM5A is likely to be downstream of pRB in the differentiation pathway. The function of the RB pathway in cell cycle progression and differentiation is conserved in plants and mammals [99]. Strikingly, high conservation of the KDM5A protein sequence can be traced down to lower eukaryotes. While the KDM5 proteins show highly conserved sequences, they appear to have different expression profiles and different functions. Moreover, each Kdm5 gene encodes several alternatively spliced isoforms [91]. However, the genome-wide targets and functions specific to each isoform are still not well known.

Figure 1.17: Coordinate control of differentiation and proliferation in the pRB pathway. Model showing control of proliferation or differentiation by pRB in conjunction with RBP2/KDM5A, E2F and other proteins. The tumor suppressor pRB may act as a central decision maker in this process. pRB can block proliferation by binding and inactivating the E2F family of transcription factors. pRB binds RBP2, neutralizing its repression at cell type-specific genes. pRB may also play a role in differentiation through different sets of proteins and transcription factors, including targets of KDM5A, which is downstream of pRB. In the absence of fully functional pRB, RBP2 silencing can block proliferation and induce a differentiation program. Moreover, the pRB/RBP2 complex enhances the expression of homeotic genes (BRD2 and BRD8). (adapted from [100])

1.5 Epigenetic regulators in cancer

Cancer is defined as uncontrolled growth of abnormal cells in the body. It is a complex disease that results from a variety of alterations, and is classically viewed as an accumulation of genetic mutations or other types of genetic aberrations. Eventually, it was discovered that cancer involves both genetic and epigenetic abnormalities. Epigenetic abnormalities play a crucial role in tumor development as well as in several other diseases [101, 102]. Many types of epigenetic deregulation, including DNA methylation, histone modification, microRNA expression, chromosome positioning/chromatin remodelling and the mechanisms involved in these complex processes, might contribute to neoplastic changes. Epigenetic events can even lead to the alterations traditionally regarded as initiating events, such as mutations in tumor suppressors and proto-oncogenes, genomic instability and chromosomal translocations. Deregulation and disruption of the epigenetic machinery through mutation, deletion, overexpression, global loss and local gain of DNA methylation, and alteration in histone methylation and acetylation [103, 104, 105, 106] have been characterized in a number of cancers, providing evidence for epigenetic effectors in the process of oncogenesis. These “epimutations” might be the second ‘hit’ in the ‘two-hit’ model of cancer initiation, owing to their ability to silence the remaining active allele of a previously mutated tumor suppressor [107]. Thus, the “cancer epigenome” is defined as the global changes in DNA methylation and histone modification patterns, as well as altered expression profiles of chromatin-modifying enzymes. These epigenetic changes cause dysregulation of gene expression, leading to the development and progression of a disease state.

1.5.1 Aberrant DNA methylation

DNA methylation has a pivotal role in transcriptional regulation, tissue-specific gene expression and chromatin architecture. DNA methylation by DNA methyltransferases (DNMTs) occurs at the C-5 position of cytosine in CpG sites, regions of DNA where a cytosine nucleotide occurs next to a guanine nucleotide. In the genome, CpG-rich regions are known as CpG islands and are commonly found in the 5′ regulatory region of many genes.
In normal cells CpG islands are not usually methylated, although some specific regions are methylated [108, 109, 110], conferring long-term gene silencing and promoting genomic stability by suppressing repetitive DNA elements and transposons [111]. Repetitive DNA sequences are scattered throughout the genome and are usually highly methylated. In contrast to normal cells, cancer cells show global hypomethylation together with site-specific CpG island hypermethylation in the promoter regions of tumor suppressors (Figure 1.18) [112], which could be the result of DNMT overexpression or aberrant recruitment, affecting oncogene expression and genomic stability [113, 114, 115, 116]. Such loss of methylation in coding exons, introns and repeated DNA regions might result in the expression of alternative versions of mRNA. Aberrant hypomethylation in repetitive elements can promote chromosomal rearrangement, activation of retrotransposable elements, mitotic recombination and increased translocation. Also, demethylation, especially at promoters, can lead to expression of proto-oncogenes such as Hox11 in leukemia [117]. On the other hand, CpG island promoter hypermethylation might suppress the activity of tumor suppressor genes, such as cyclin-dependent kinase inhibitor 2A (CDKN2A), mutL homolog-1 (MLH1), breast cancer-associated-1 (BRCA1) and von Hippel-Lindau (VHL), which play important roles in regulating the cell cycle, differentiation, DNA repair, apoptosis and cell adhesion [112, 113, 118, 119].

1.5.2 Histone onco-modification

Post-translational modification of histone tails provides an epigenetic mechanism of control for a variety of cellular processes. Accordingly, disruption of normal covalent histone modifications is implicated in several cancers (Figure 1.19). To date, more than 60 histone residues have been linked to cancer [120].
Acetylation and methylation are most often involved; in particular, loss of acetylation and methylation at specific residues in H3 and H4 has been implicated as a marker of tumor cells [105, 121]. Global alterations in acetylation of H4K16, trimethylation of H4K20, acetylation/trimethylation of H3K9, trimethylation of H3K27, acetylation of H3K56, acetylation of H3K18, acetylation of H4K12 and, importantly, di- and trimethylation of H3K4 are most often found in various human cancers. Also, histone-modifying enzymes that add, remove or recognize specific modifications can be altered in specific types of cancer (Figure 1.20).

Figure 1.18: DNA methylation patterns in normal and cancer cells. (a) CpG islands and CpG island shores usually remain unmethylated in normal cells, permitting gene transcription (top). DNA methylation within gene bodies avoids spurious transcription initiation. Unlike in normal cells, in cancer cells (bottom) both CpG islands and CpG island shores may be strongly methylated, while gene bodies lack this modification. As a result, transcription of many genes is suppressed, and aberrant transcription may occur from incorrect transcription start sites (TSSs). (b) In normal cells (top), methylation of repetitive elements precludes genomic instability and, again, spurious transcription initiation. Also, transposable elements cannot be activated in a methylated environment. In cancer cells (bottom), global hypomethylation provokes genomic instability and aberrant transcription initiation. Concomitant activation of transposons may lead to gene disruption. (adapted from [112])

Figure 1.19: Histone modification patterns in normal and cancer cells. The figure depicts the main modifications of the four core histones in normal cells. Histones can undergo several post-translational modifications, mainly in their N-terminal tails and their C-terminal regions. In the appropriate combination, these modifications contribute to establishing the global and local condensed/decondensed chromatin states that determine gene expression. Disruption of their normal patterns is related to cancer; histone modifications are typically associated with the disease. Ac, acetylation; Me, methylation; P, phosphorylation; Ub, ubiquitination. (adapted from [112])

Figure 1.20: Deregulation of histone-modifying enzymes in cancer. Several histone-modifying enzymes have been shown to be deregulated in cancer cells. Enzymes for the respective histone onco-modifications are represented in green when reported to be upregulated or in red when found to be downregulated in cancer cells. Fusion proteins found in cancer cells are delineated in black. (adapted from [1])

Histone methyltransferases in cancer

Misregulation of tumor suppressor genes and oncogenes by HMTs is implicated in a variety of cancer types. Abnormal expression or activity of HMTs owing to chromosomal translocation, amplification, deletion, overexpression or silencing has been linked to cancer initiation, progression and tumor aggressiveness. Aside from transcriptional regulation, HMTs also have a role in maintaining genomic integrity and cellular identity. Alteration of their normal activity would lead to inappropriate gene expression, which could ultimately result in a change of cell fate and disease. The Polycomb group transcriptional repressor complex PRC2, whose core component EZH2 methylates H3K27 to H3K27me2/3, has been found overexpressed in a variety of tumors [122], including colon, skin, lung [123] and ovarian cancer [124]. This overexpression has also been linked to breast cancer aggressiveness [125, 126] and bladder tumor progression [127], and has been proposed as a marker for prostate cancer progression and metastasis [128, 129]. Recently, an inactivating mutation of EZH2 in follicular and diffuse large B-cell lymphoma (DLBCL) has been reported [60, 130].
Intragenic mutations in MLL3, PRDM1 and PRDM2 have been linked to colorectal cancer, lymphoma and breast cancer, respectively [23]. The H3K4 HMT SMYD3 was found to be highly expressed in colorectal, breast and hepatocellular carcinoma (HCC) [131, 132]. The mixed lineage leukemia (MLL) gene frequently undergoes chromosomal translocation, which is found in acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL) and mixed lineage leukemia (MLL) [133]. So far more than 50 functionally diverse MLL fusion proteins have been identified in human leukemia. Many MLL fusions seem to activate leukemia-promoting genes by abnormal recruitment of DOT1L, a non-SET domain HMT for H3K79. Common MLL fusion partners are AF4, AF6, AF9, AF10 and ENL. MLL fusion proteins in hematopoietic cells participate in leukemogenesis by misregulating genes with profound roles in proliferation and cellular differentiation. For example, MLL fusion-mediated overexpression of HoxA7, HoxA9 and HoxA10, together with Meis1, another homeobox gene and HOX co-factor, leads to recruitment of other histone modifying enzymes, such as DOT1L, PRMT1 and CBP, which ultimately results in aberrant methylation and demethylation patterns of histones (Figure 1.21) [134].

Figure 1.21: Aberrant chromatin modifications at leukemia-inducing genes. In normal hematopoietic stem and progenitor cells, genes involved in proliferation and self-renewal, such as HoxA5, HoxA9 and Meis1, are marked by H3K4me3, which is associated with moderate levels of gene transcription. During normal differentiation, these genes are downregulated, correlating with loss of H3K4me3 and gain of Polycomb-mediated repressive H3K27me3. In various MLL or NUP98 fusion-mediated leukemias, the HoxA9 gene is highly overexpressed along with the aberrant acquisition of active chromatin modifications, such as H3K4, H3K36, H3K79 and H4R3 hypermethylation and H3/H4 hyperacetylation. (adapted from [23])

Histone demethylases in cancer

HDMs act in opposition to HMTs by removing methyl groups from histone tails. Aberrant expression, amplification and mutation of several HDMs have been implicated in a large number of tumor types.

LSD1, an H3K4 demethylase, takes part in diverse cellular processes including cell proliferation, growth, apoptosis and cell fate determination. LSD1 controls p53 tumor suppressive activity by demethylating p53 at lysine 370. Therefore, a possible oncogenic function of LSD1 could be exerted through suppression of tumor suppressor genes [135], although the mechanism of oncogenesis is not yet fully delineated. Overexpression of LSD1 has been reported in prostate cancer [136], poorly differentiated neuroblastoma [137] and estrogen receptor (ER)-negative breast cancer [138], and all of these studies have implicated LSD1 as a marker for aggressive tumors and poor clinical outcome. The H3K4me3 demethylase KDM5A is broadly expressed in a variety of tissues and is involved in differentiation. Although the precise mechanism by which it functions in cancer is not fully understood, several alterations of the KDM5A gene locus in tumors have been studied. There is accumulating evidence that the KDM5A locus is frequently lost in hematopoietic malignancies, and KDM5A has recently been shown to be involved in a translocation in an infant with acute myeloid leukemia (AML) [139]. Functional in vitro studies have implicated KDM5A copy loss in suppression of the neoplastic phenotype; however, there have been no descriptions of hemi- or homozygosity in the KDM5A gene in primary human tumors, including lymphoid malignancies [70]. A single translocation, t(11;12)(p15;p13), involving the KDM5A locus (5′-NUP98-JARID1A-3′, comprising the first 13 exons of NUP98 and exons 28-31 of KDM5A, which encode the third PHD domain) has been described in an infant with acute megakaryoblastic leukemia (FAB type M7) [139].
However, cytogenetic studies have shown that many patients with hematological malignancies have aberrations in the short arm of chromosome 12 (12p), which includes the KDM5A locus, due to partial deletion or translocation. These aberrations are especially frequent in ALL [140], and are also found in ∼5% of acute nonlymphoblastic leukemias (ANLL) and myelodysplastic syndromes (MDS) [141, 142, 143], appearing as the sole chromosomal abnormality in ∼30% and 22% of these patients, respectively. Moreover, a recent study involving thousands of patients with ALL showed that leukemic cells in approximately 6-7% of patients contained 45 or fewer chromosomes [144]. These hypodiploid leukemic cells displayed dicentric chromosomes formed from chromosomes 9p (55%), 12p (18%) or both (21%). The results of these studies have thus shown that an abnormality of the short arm of chromosome 12 is a significant adverse risk factor and that patients with 12p abnormalities are reasonable candidates for allogeneic hematopoietic stem-cell transplantation. The breakpoints in 12p, as reported in studies using both cytogenetic and comparative genomic hybridization analysis, involve the ETV6/TEL locus, which is centromeric to the KDM5A locus, and occur in 12-27% of ALL and 100% of hypodiploid ALL, as well as in ANLL and MDS [145, 146, 147]. Importantly, patients with a small deletion (del(12)(p11.2p13)) involving ETV6 are readily distinguished by a milder clinical course from patients with a large deletion (del(12)(p11.2)) involving both ETV6 and KDM5A [143]. Among the KDM5 family members, the involvement of KDM5B in tumors is the best documented. Like LSD1, KDM5B has been implicated in the silencing of several tumor suppressor genes linked to breast cancer, including SFN (14-3-3-σ), BRCA1, CAV1 and HOXA5 [148]. KDM5B can also act as a transcriptional coactivator by interacting with other proteins, such as c-Myc.
Significant expression of KDM5B is limited to the testis; however, low-level expression is also evident in other normal adult tissues, and its expression is markedly increased in breast cancer, metastatic prostate cancer and testis [149, 150]. Recent studies indicate that KDM5B function might specifically contribute to breast epithelial transformation [151]. In addition to H3K4 demethylases, H3K9 demethylases have also been found to be associated with cancer [105]. The H3K9 methylation mark usually reflects transcriptional repression and the formation of heterochromatin. Global reductions in H3K9 methylation due to the loss of methyltransferase activity are likely to be associated with the tumorigenic process. JMJD1 and JMJD2 family demethylases are specific for H3K9 demethylation. Little is known about JMJD1; however, frequent deletions of the JMJD1B locus at 5q31 or reduced expression of JMJD1C have been observed in various malignancies [152, 153]. The JMJD2 family can reverse H3K36 methylation as well, and its members are mainly considered to be oncogenes; consistent with this, their overexpression was observed in prostate cancer [154]. The antagonistic activity of JMJD2 against H3K9me3 heterochromatin might cause the genomic instability often observed in cancer cells. H3K27 demethylases are regarded as tumor suppressors owing to their potential to act against the oncogenic PcG proteins, and are found significantly decreased in several cancers. UTX (KDM6A) and JMJD3 (KDM6B) can demethylate H3K27me2/3. Somatic mutations in both alleles of UTX have been found in various malignancies, and decreased UTX activity is linked with poor outcome [155]. In normal cells, UTX demethylates and silences its target HMG-box protein 1 (HBP1), ensuring a cell cycle halt by pRB [156]. Consequently, one would assume that its depletion results in proliferation. The JMJD3 gene is located in the vicinity of the p53 locus. Allelic loss at 17p13.1, which includes both p53 and JMJD3, is associated with more aggressive tumor types [157].
1.5.3 Epigenetic regulation of microRNA and cancer link

MicroRNAs (miRNAs) are small (∼22 nt) endogenous non-coding RNAs expressed in a tissue-specific manner. They are involved in gene regulation through post-transcriptional silencing of their targets. They base-pair with the 3′-UTR or 5′-UTR of the target mRNA in a sequence-specific manner, resulting in degradation and/or inhibition of translation [158, 159]. They have been implicated in a wide variety of biological processes such as cell proliferation, apoptosis, stem cell self-renewal and differentiation. Interestingly, one subset of these miRNAs, called “epi-miRNAs”, can directly or indirectly modulate the activity of the epigenetic machinery, including DNMTs, HDACs, PcG proteins, HMTs and HDMs [160]. Therefore, it is conceivable that their aberrant expression would lead to certain diseases, including cancer. Anomalous expression of miRNAs in cancer can arise through a variety of mechanisms, including defects in the miRNA biogenesis machinery, transcriptional deregulation, mutation and copy number variation. It has also been proposed that transcription factors can recruit epigenetic modulators to miRNA promoters and influence their regulation [161]. A role for miRNAs in cancer was first reported in chronic lymphocytic leukemia (CLL), where miR-15 and miR-16 are deleted or downregulated [162]. To date, miRNA expression and involvement in tumor initiation and progression have been studied in many tumor types, showing that, depending on the target gene, some miRNAs act mainly as tumor suppressors while others function as oncogenes [160, 163]. Certain epi-miRNAs regulate epigenetic effectors in a variety of cancers. Some miRNAs that modulate chromatin modifying enzymes are presented in Table 1.3 [163]. Interestingly, 13 miRNAs embedded in CpG islands with heterochromatin marks were reported in ALL [164]. Therefore, changes in methylation patterns in CpG islands can modulate miRNA expression patterns and lead to abnormal outcomes.
In human, mouse, fly and other organisms, many miRNAs are embedded in coding regions of genes and can probably also undergo upregulation or repression when the expression pattern of their host genes changes in response to modified epigenetic signals, and hence affect the cell favorably or adversely. Such complex interaction among the various components of the epigenetic machinery and genetic factors suggests the necessity of an integrated and systematic approach to understanding the cellular mechanism.

1.5.4 Epigenetic drugs for cancer treatment

The ultimate goal of studying cancer mechanisms is to find a better way to manage this disease by identifying suitable drug targets. In contrast to genetic changes such as mutations, deletions and copy number variations, epigenetic modifications are reversible and preventable. Since much of the epigenetic machinery, especially the chromatin modifying enzymes, has been linked to various tumors and undergoes aberrant expression, amplification and mutation, it may be possible to use these components as diagnostic tools, to identify suitable epigenetic targets for each cancer type and to design effective drugs against these targets. The aim of such treatment is to restore the normal “epigenome”. Based on this concept, some epigenetic drugs targeting DNA methyltransferases, histone deacetylases, histone methyltransferases and, to a lesser extent, histone demethylases have been tested [165, 166, 167]. Since CpGs in promoter regions undergo hypermethylation in cancer, a number of DNMT inhibitors (DNMTi) are under investigation and some have received approval from the US Food and Drug Administration (FDA) [112]. These include pyrimidine nucleoside analogs and nonnucleoside inhibitors, all of which aim to inhibit DNA methylation through targeted degradation, reactivating tumor suppressor genes while inhibiting clonal expansion and tumor growth and inducing cell differentiation and cancer cell apoptosis.
However, nucleoside analogs require incorporation into DNA, raising the possibility of toxic effects on normal cells [168]. Moreover, the mechanism of DNMTi action is not fully understood, and it is important to know whether reactivation of the silenced gene is the only causative molecular consequence. Currently a large number of histone deacetylase inhibitors (HDACi), either purified from natural sources or synthetically produced, have been examined. A few have received FDA approval and have been prescribed to patients with tumors [112, 168]. HDACi act on the catalytic pocket of HDACs and can induce differentiation, cell cycle arrest and apoptosis in vitro [169]. Some studies showed that combinations of DNMTi and HDACi work synergistically in some cancer types [170].

Table 1.3: Epigenetic microRNAs in cancer targeting chromatin modifying enzymes. ALL: acute lymphoblastic leukemia, AML: acute myeloid leukemia. (modified from [163])

Histone methyltransferase inhibitors (HMTi) are another class of epigenetic drugs that can reactivate silenced genes. There are only three small molecules that have been tested so far for the inhibition of HMTs. Chaetocin, which acts against the SUV39 class of HMTs [171], is active in human multiple myeloma. Another inhibitor, DZNep, has been shown to induce apoptosis in cancer cells by targeting PRC2 [172] in the breast cancer cell line MCF7 and the colorectal cell line HCT116. However, their anti-tumor effects have yet to be confirmed. Histone demethylases are another class of chromatin-modifying enzymes that can be explored as epigenetic targets. Only a few molecules have been studied so far as HDM inhibitors (HDMi). Although it is known that monoamine oxidase inhibitors (MAOis), for example pargyline, phenelzine and tranylcypromine, can suppress KDM1A, their anticancer potential has yet to be determined [112].
Besides identifying new targets and developing new drugs, one of the major challenges is to manage the rapid acquisition of drug resistance and to sensitize tumor cells to drugs for successful therapy. Cancer cell populations are heterogeneous, and a small proportion of cells, hypothesized to be “cancer stem cells”, can remain drug resistant. Recent work by Sharma et al. has shown a link between KDM5A and drug resistance [173]. They consistently detected a small subpopulation of cells that were reversibly ‘drug tolerant’ even under lethal conditions in which most other cells were killed. KDM5A was later found to be overexpressed in this drug-tolerant population. Moreover, knockdown of KDM5A significantly reduced the number of drug-tolerant cells, indicating that a distinct chromatin state acquired through the activity of a chromatin modifying enzyme was responsible for the maintenance of the drug-tolerant subpopulation, and that a rationally targeted agent might be able to preclude drug resistance in cancer cells. However, the regulation of KDM5A demethylase activity is not well understood, the underlying mechanism remains unexplained, and it is likely that more chromatin modifying enzymes contribute to this drug-tolerance process in various cancers in a context-dependent manner. Therefore, studying the coregulation and co-expression patterns of histone modifying enzymes and their targets in various cancers would pave the way to a better explanation of drug-tolerance mechanisms. Very recent findings have also highlighted KDM5A as a drug target of choice [174]. This study showed that depletion of Kdm5a in Rb1+/− and multiple endocrine neoplasia type 1 (Men1) defective mice reduced tumor formation and prolonged survival. Interestingly, while KDM5A is a demethylase for H3K4me3, the MEN1 tumor suppressor promotes methylation at H3K4, indicating that disruption of this relationship would cause serious disease.
Multiple modes of cooperation or coregulation might operate in cells, since depletion of Kdm5a alone was insufficient to rescue the embryonic developmental defect arising from Rb1 loss. Similarly, in cancers such as prostate and breast cancer, where the KDM5A paralog KDM5B is overexpressed, its down-regulation might benefit patients. This notion is supported by a mouse cancer model in which shRNA against KDM5B suppresses breast cancer growth [151]. In this modern epigenetic era, with the advent of new technologies, especially high-throughput technologies, it is now possible to study genetic and epigenetic processes genome-wide. Chromatin immunoprecipitation coupled with massively parallel sequencing on next-generation sequencers has opened the door to accurately determining the location of various histone modifications and the binding sites of histone modifying enzymes. Integration of this epigenetic information with genetic and transcriptomic data will pave the way to a better understanding of cancer mechanisms, ultimately helping to identify better drug targets and to combat drug resistance.

1.6 Conserved pRB and Hippo tumor suppressor pathways in humans and flies: from cell cycle control, development to tumorigenesis

It has been more than 25 years since the retinoblastoma tumor-susceptibility gene product (pRB) was first identified in humans. Since then, owing to its central role in cell cycle control, its inactivation has been viewed as an early and almost obligatory event in oncogenesis. To successfully become malignant, cells must acquire two properties: insensitivity to anti-proliferative signals and self-sufficiency in growth and proliferation. Interestingly, through inactivation of pRB alone, cells can probably meet both of these requirements.
Besides its role in cell cycle control through the E2F transcription factors, pRB has been found to have cellular roles outside of E2F regulation, including but not limited to apoptosis, cell differentiation and promotion of senescence. This functional diversity is achieved in part through interactions with various proteins. The importance of these functions is emphasized by the fact that pRB is evolutionarily conserved from simple eukaryotes, such as single-cell algae, to humans. In Drosophila, the pRB homolog RBF has a similar importance in terminal cell cycle exit. The Hippo tumor suppressor pathway in humans was described relatively recently and is thought to play a role in organ size control by preventing further cell division once an organ reaches its proper size. In addition to defining organ size, it plays a role in tissue regeneration, stem cell self-renewal and apoptosis [175, 176, 177]. This pathway is also highly conserved and limits organ size by phosphorylating and inhibiting the transcriptional co-activators YAP and TAZ in mammals and Yki in Drosophila [178, 179]. Many of the genes involved in the Hippo signaling pathway are recognized as tumor suppressors (e.g. warts (wts), hippo (hpo), salvador (sav)), while YAP, TAZ and Yki are regarded as oncogenes. As with the RB pathway, inactivation of the Hippo pathway has been found in human tumors; for example, YAP has been found to be elevated in some human cancers, such as breast, colorectal and liver cancer [180, 181, 182, 183]. Since both pathways are conserved, have roles in the cell cycle and have been found to be altered in many tumors, the possible cross-talk between them warrants study, in order to elucidate how their combined loss shifts normal transcriptional control toward tumor development.

1.6.1 pRB-E2F pathway in cell cycle control

pRB is one of a family of three related proteins called ‘pocket proteins’, the other two members being p107 and p130.
In a typical model of cell cycle regulation, pRB exerts its effects through the E2F transcription factors. In the absence of mitogenic stimuli, pRB is in a hypophosphorylated state and remains able to bind E2F and chromatin remodeling proteins, resulting in inhibition of E2F. Ultimately this inhibits the E2F-dependent transactivation of targets required for G1 to S-phase entry. Upon mitogenic stimulation, cyclin-dependent kinase (CDK)-cyclin complexes become active and phosphorylate pRB, which in turn results in dissociation of the pRB-E2F complex [184]. Therefore, the cell cycle is under tight regulation by pRB-E2F, and inactivation of pRB would lead to inappropriate re-entry into S-phase. However, pocket proteins are able to compensate functionally for one another in various capacities, making in vivo genetic analysis more challenging in a mammalian system. Because many pRB functions are highly conserved in lower eukaryotes like Drosophila, the fly offers a simpler model system in which to study cell cycle regulation. Compared to mammals, Drosophila contains a streamlined catalog of the E2F and pRB families (Figure 1.22). The fly genome encodes only two pocket protein orthologs, retinoblastoma family-1 (RBF1, also written RBF; the pRB ortholog) and RBF2 (the p107/p130 ortholog) [178]. Importantly, the functional properties of these two proteins appear to be clearly distinguished, unlike those of their mammalian counterparts [185, 186]. Moreover, the E2F family is far less complex in the fly, whereas in mammals there are eight family members divided into transcriptional activators (E2F1-E2F3a) and transcriptional repressors (E2F3b-E2F8) [187]. Among these eight transcription factors, six (E2F1-E2F6) require a DNA-binding partner called the ‘Dimerization Partner’ protein (DP). Like the pocket proteins, E2Fs in mammals are also capable of compensating for one another in several functions. In Drosophila there are only two E2Fs: the activator dE2F1 and the repressor dE2F2.
Both utilize only a single DP, dDP, to generate the sequence-specific DNA-binding complex. In Drosophila, RBF1 is the major pocket protein, expressed at a relatively high level in most cell types throughout development. It is capable of interacting with both dE2F1 and dE2F2, and suppresses the effect of the activator complex dE2F1-dDP. In contrast, RBF2 expression is far more limited, and RBF2 can only interact with dE2F2 (Figure 1.22). It was reported that overexpression of dE2F1 and dDP acts synergistically to drive cell proliferation and apoptosis.

The Hippo pathway in Drosophila and mammals

The Hippo pathway was originally identified in flies using genetic mosaic screens and, as with the pRB/E2F interaction network, the Drosophila pathway is more streamlined than its mammalian counterpart [176, 175, 188] (Figure 1.23). The Hippo pathway consists of a core kinase cascade. When activated, Hippo binds to Salvador, forming a complex that in turn phosphorylates the Warts kinase. Interaction of Warts with another protein, Mob-as-tumor-suppressor (Mats), confers conformational changes that facilitate the phosphorylation process. Yorkie (Yki) is considered a potent effector of the Hippo pathway, and the primary function of the Hippo kinase cascade is to inhibit this transcriptional coactivator. Wts can phosphorylate Yki at three distinct sites, where each phosphorylation elicits a different level of inactivation [175, 189]. For Yki to become reactivated, it requires the presence of DNA-binding partners, such as Scalloped (Sd) or Homothorax (Hth). Recent research has identified upstream activators of the Hippo pathway, such as Expanded (Ex) and Merlin (Mer), and their interacting partner Kibra (Kbr) [176]. Under normal conditions, Yki is subject to a feedback loop.
Activation of Yki can enhance Ex activity, which in turn activates Hpo, and activation of Hpo can in turn lead to repression of Yki. Further research reported that Ex can directly bind to Yki and sequester it near the cell membrane, preventing nuclear localization [176, 190]. The downstream Hippo core kinase cascade is highly conserved in mammals and includes the following proteins: Mst1/2 (homologs of Hpo), Sav1 (Sav homolog), Lats1/2 (Wts homologs), MOBKL1A and MOBKL1B (collectively referred to as Mob1; homologs of Mats), and YAP and its paralog TAZ (Yki homologs). Upon activation, Mst1/2 can form a complex with Lats1/2 and activate it through phosphorylation, which in turn inhibits YAP/TAZ-dependent gene activation [191]. Confocal microscopy data from the developing fruit fly showed that in asynchronously dividing cells Yki is located in both the cytoplasm and the nucleus. However, upon cell cycle exit, Yki is eliminated from the nucleus by the mechanism mentioned above. When the Hippo signalling pathway is suppressed, Yki-dependent gene activation is capable of bypassing the signal to terminally exit the cell cycle, and proliferation continues. However, this process should not last long, as the cell will eventually terminally differentiate and exit the cell cycle. Therefore, although the Hippo pathway monitors Yki-dependent activity in dividing cells, the most critical point of Yki regulation is upon cell cycle exit.

Figure 1.22: Mammalian and Drosophila melanogaster retinoblastoma and E2F family proteins. (top) Model of E2F transcriptional regulation by pocket proteins in mammals. Activation is depicted in green, repression in red. Eight E2F transcription factor genes (E2F1-E2F8) have been identified in mouse cells. E2F1-E2F6 contain conserved dimerization domains and form DNA-binding heterodimers with proteins of the differentiation-regulated transcription factor-1 polypeptide (DP) family.
E2F6 seems to function exclusively as a repressor, and has been purified in complexes with polycomb-group (PcG) proteins. E2F7 and E2F8 lack a DP-binding domain. Homodimers and heterodimers of E2F7 and E2F8 suppress the transcription of a subset of E2F-regulated targets. E2F1-E2F5 can activate or repress transcription. E2F1, E2F2 and E2F3a seem to function primarily as activators. E2F1-E2F5 contain C-terminal domains that mediate interactions with the pocket domains of retinoblastoma susceptibility gene product (pRB)-family proteins. The three pRB-family members (pRB, p107 and p130) associate with different subsets of E2F proteins, and these interactions have two roles. The binding of pRB to the C-terminal activation domain of E2F1, E2F2 and E2F3a inhibits transcriptional activation. pRB proteins contain distinct binding sites for E2F proteins and for transcriptional repressor complexes. Therefore, in association with E2F4, E2F5 or E2F3b, the pRB proteins can also recruit chromatin-modifying enzymes to E2F-regulated promoters, thereby enabling silencing of gene expression. (bottom) The expression of dE2F1, or its release from RBF1, strongly induces transcription. RBF1 binds to the activation domain of dE2F1 and inactivates dE2F1-induced transcription. RBF1 can also interact with dE2F2, but this interaction is not required for RBF1 to repress E2F-dependent promoters. RBF2, however, does not associate with dE2F1. RBF2 forms complexes with dE2F2 that antagonize dE2F1-induced transcription indirectly and cause a shift towards repression. (adapted from [178])

Figure 1.23: Hippo pathway in Drosophila melanogaster and mammals. Corresponding proteins in (a) Drosophila and (b) mammals are indicated by matching colours. Lines ending in arrowheads or blunt ends indicate activation or inhibition, respectively. Dashed lines indicate unknown mechanisms of action.
(adapted from [176])

1.7 Bioinformatics in genetic and epigenetic research: Extracting biological information from data

Bioinformatics can be defined as the use of computational tools and techniques, drawing on mathematical, statistical and biophysical approaches, to address complex biological and biomedical questions. It is an interdisciplinary science that combines knowledge of biology and medicine with computer science and other fields such as mathematics, statistics, physics and chemistry to study complex biological processes and generate new knowledge. In 1958 Francis Crick first enunciated the ‘Central dogma of molecular biology’, which states that RNA is transcribed from DNA and protein is translated from RNA; since then this simple relationship has become considerably more complicated. The more genes, RNAs and proteins are discovered, the more complex the known interaction networks become. Owing to this complexity, the study of individual genes and proteins, although necessary, cannot explain the systematic operation of a biological system. Therefore, genome-wide studies are becoming an integral part of biological research. With the advancement of technology, genome-wide studies of genetic and epigenetic landscapes and molecular networks are emerging as a new way to study biological function. There are several genome-wide high-throughput experiments, such as sequencing, ChIP-seq, RNA-seq, Methyl-seq, ChIP-on-chip and microarrays, producing enormous quantities of data that need careful bioinformatic analysis to extract the biological information. Since most of these techniques are relatively new, careful selection of data analysis tools and algorithms and of statistical methods, together with data quality control, is necessary.

1.7.1 DNA microarray

Biological processes depend on complex interactions between many genes.
These interactions are followed by activation or suppression of a series of many other genes downstream in the pathway. Therefore, detecting changes in the global expression of genes helps delineate the mechanisms prevailing in a given cellular condition. The microarray is a powerful technology that allows simultaneous detection of the expression of many genes and thus helps in understanding gene regulation and gene networks. The basic principle of the microarray is hybridization between two DNA strands. Microarray chips may contain a large number of DNA probes. mRNA extracted from cells is converted to cDNA and applied to the microarray, where only complementary DNA species remain attached to the microarray probes after washing. Fluorescently labelled target sequences that bind to a probe generate a signal that measures mRNA abundance (Figure 1.24). A microarray can be single color (single channel) or dual color (two channel). In a single-channel experiment, one RNA sample is hybridized to one array. Two-channel methods are so named because two mRNA samples are labelled with two fluorescent dyes of different colors, typically Cy3 (green, G) and Cy5 (red, R), and the two labelled samples are competitively hybridized to a single array to obtain the ratio of mRNA levels between the two samples for each probe. Some commonly used microarray platforms and technologies are briefly discussed here:

1. Affymetrix GeneChip: This platform commonly uses multiple short (25 mer) probes (at least 11) per gene, called a probe set, on a single-channel chip. Affymetrix introduced the concept of mismatch probes (MM). A mismatched nucleotide in the middle of a short oligonucleotide probe strongly reduces the hybridization affinity, which can be used to calculate the background signal.

2. NimbleGen: This array mainly uses the single-channel method with in situ synthesized, relatively high-density oligonucleotide probes.

3. Agilent: Agilent arrays use long oligonucleotide probes (60 mer) synthesized in situ using inkjet technology. Although the two-channel method is standard for this platform, a single-channel method can be used with the current arrays.

4. Illumina: Illumina uses microscopic beads instead of a large solid support. The single-channel method is standard, utilizing 50 mer probes.

5. Spotted DNA microarrays: Rather than synthesizing oligonucleotides directly on a substrate, they can be printed onto a glass slide. However, the probe density is fairly limited compared with commercial arrays. Spotted arrays are less uniform than in situ synthesized arrays, and consequently a two-color method is usually used.

Figure 1.24: Model of methods used in cDNA microarray technology (adapted from [192])

Microarray techniques have been widely used for tumor diagnosis and classification and for prediction of prognosis and treatment response. Comparisons of gene expression in normal and tumor tissues can identify genes related to tumors and indicate targets for therapeutic drugs. However, the large volume of data generated by a microarray may contain systematic biases and, depending on the type of experiment and samples, only a small fraction of genes may be significantly differentially expressed. Therefore, proper statistical analysis is vital to the successful use of arrays. Sources of systematic non-biological variance include (i) dye bias: differences in heat and light sensitivity and in the efficiency of dye incorporation, (ii) differences in the amount of labelled cDNA hybridized to each channel in a microarray experiment, (iii) variation across replicate slides, (iv) variation across hybridization conditions, (v) variation in scanning conditions, and (vi) variation among technicians doing the lab work. Small variations in these conditions can induce notable changes in measured gene expression, resulting in both false-positive and false-negative detection.
Since the quality of microarray data is paramount for downstream statistical analysis, quality control analysis of array data is necessary. These variations and biases can be attenuated by proper controls, adequate replication and normalization. Normalization describes the process of removing (or minimizing) non-biological variation in measured signal intensity levels so that biological differences in gene expression can be appropriately detected. Normalization methods also vary depending on whether the data are single-channel or two-channel.

Single-channel data processing

Among the single-channel technologies, the Affymetrix GeneChip is the most widely used. Data processing includes background correction, between-array normalization and summarization.

Background adjustment

Background correction is often the very first step in microarray analysis. The spot signal, or simply signal, is the fluorescence intensity due to target molecules hybridized to probe sequences contained in a spot, plus background fluorescence. Background is fluorescence that may contribute to spot pixel intensities but is not due to fluorescence from target molecules hybridized to spot probe sequences. Background may be caused by optical noise, non-specific hybridization, probe-specific effects and measurement error [193]. Initially in Affymetrix array data analysis, the mismatch probe (MM) value was subtracted from the perfect match probe (PM) value, on the notion that the MM value reflects the background (AvDiff by Affymetrix). However, many MM values may contain actual signal, and in many cases the MM value was found to be larger than the corresponding PM value [194].
Therefore, the Robust Multi-array Average (RMA) algorithm was proposed [194]. It ignores MM values and conducts a global background correction using a signal-and-noise convolution model, in which the PM intensity distribution is modeled by an exponentially distributed signal component ‘S’ with parameter λ and a normally distributed background component ‘B’ with mean µ and standard deviation σ. However, RMA sometimes underestimates the background; therefore “GeneChip RMA” (GC-RMA), an improved form of RMA, was developed, which uses the sequence-specific probe affinities of the GeneChip probes to attain more accurate gene expression values. Subsequently, it has been reported that GC-RMA yields more accurate values for genes with low expression levels [195]. The Affymetrix Microarray Suite 5.0 (MAS5) algorithm offers another approach to background correction, based on neighboring regions of the array. The entire array area is divided into 16 rectangular zones; the lowest 2% of the probe values in each zone are taken to represent the background value of that zone, and the standard deviation of these values represents the zone variability. The background value for a given position is then computed as a weighted sum of the background values of the neighboring zones, with each weight inversely proportional to the square of the distance to the zone. Negative values arising from subtraction of the position-specific background are avoided by using a small threshold value [196]. This method may be suitable for large datasets in which the aim is to report the absolute expression of genes that are significantly detected, rather than to compare them against a control.

Between array normalization

Quantile normalization is the method most commonly used for normalization of Affymetrix data. It can be used for two-color data as well. Quantile normalization forces each channel to have the same quantiles.
Basic steps are: (1) find the smallest log signal on each channel, (2) average the values from step 1, (3) replace each value in step 1 with the average computed in step 2, (4) repeat steps 1 through 3 for the second smallest values, the third smallest values and so on up to the largest values. RMA adopted the quantile method for probe-level data normalization (Figure 1.25). MAS5 uses a simple linear scaling of the summarized gene-level intensities for normalization among datasets (Figure 1.25). This approach may not work properly with datasets that have large chip-to-chip differences. Schadt et al. proposed an algorithm that fits non-linear smooth curves to a “rank-invariant set” for normalization [197]. The dChip software [198] applies this normalization technique to the summarized gene-level intensities, forcing the selected non-differentially expressed genes to have equal values and thus keeping the expression ratios between the two datasets under investigation unchanged. Many other algorithms have been suggested for one-color microarray data; for example, Edwards [199] proposed a non-linear LOWESS normalization for one-channel cDNA microarrays, mainly to correct spatial heterogeneity. Also, Bolstad et al. suggested cyclic LOWESS and contrast-based methods [200]. Huber et al. devised a variance stabilization normalization method for microarray data calibration and for the quantification of differential expression [201, 202]. A variance-stabilizing transformation (VST) algorithm was proposed by Lin et al. for Illumina array normalization [203].

Figure 1.25: Normalization strategies for Affymetrix GeneChip data. A. MAS5 normalizes the value of the probe-set summary by linear scaling based on a reference array. B. RMA (robust multi-array average) normalizes the value of each probe by quantile normalization across multiple arrays. (adapted from [196])

Summarization

Since the Affymetrix GeneChip is designed to have multiple probes per gene, the fluorescence intensity values from a probe set need to be ‘summarized’ to define a single expression value for each gene or transcript on the array. Several model-based algorithms have been proposed to address this problem. RMA fits a log-scale linear additive model by median polish, in which each log2 probe intensity is the sum of an expression value, a probe affinity effect and an error term [194]. MAS5 uses the Tukey biweight (TBW) algorithm, which is a robust averaging procedure. Li and Wong [204] proposed a model-based expression index (MBEI), based on their observation that the variation of a specific probe across multiple arrays is considerably smaller than the variance across probes within a probe set. Various methods for normalization and summarization of Affymetrix data are listed in Table 1.4.

Table 1.4: Methods for Affymetrix GeneChip data analysis. (modified from [196])

Two-channel data processing

Background adjustment

This type of array also requires background correction, using the fluorescence intensity values from the local area outside the spots. The fluorescence intensity values from inside and outside the probe area can also be used for array quality control. Simple subtraction of the median background intensity from the median foreground intensity may result in a missing or negative log-intensity when the background intensity is larger than that of the foreground. This can happen when expression levels are low. More sophisticated background adjustment methods have been suggested; for example, Goryachev et al. proposed estimating the background intensity over a larger neighborhood rather than a local region [205]. Also, Edwards [199] used a smoothing function for background correction that is linear with respect to background intensity on the log scale when the result of subtracting the background intensity from the foreground intensity is small or negative.
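The problem with simple subtraction can be made concrete with a minimal sketch. The function name and the optional `floor` argument below are hypothetical and introduced only for illustration; the floor is a crude stand-in for a correction strategy and is not the smoothing function of Edwards [199] or the neighborhood estimate of Goryachev et al. [205].

```python
import math


def log2_background_corrected(foreground, background, floor=None):
    """Return log2(foreground - background) for one spot.

    When the background is at least as large as the foreground the log is
    undefined; this sketch then returns None (a missing value) unless a
    small positive `floor` (an illustrative assumption) is supplied.
    """
    difference = foreground - background
    if difference <= 0:
        if floor is None:
            return None  # missing log-intensity, as described in the text
        difference = floor
    return math.log2(difference)


# A well-expressed spot: 1200 - 176 = 1024, so the log2 intensity is defined.
strong_spot = log2_background_corrected(1200.0, 176.0)

# A weakly expressed spot whose background exceeds its foreground.
weak_spot = log2_background_corrected(150.0, 180.0)

# The same weak spot with a hypothetical floor of 1.0 intensity unit.
weak_spot_floored = log2_background_corrected(150.0, 180.0, floor=1.0)
```

Spots like the second one are lost from downstream analysis under plain subtraction, which is the motivation for the more sophisticated adjustments cited above.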
The two-channel method is often used with spotted DNA microarrays and custom arrays. Reference DNA is labelled with Cy3 while the experimental DNA sample is labelled with Cy5. Typically, the median of the fluorescence intensity values within a probe area is used to represent the probe.

Normalization of dual channel array

The central idea of normalization for dual-color arrays is to adjust for artifactual differences in the intensity of the two labels. Such differences usually result from differences in the affinity of the two labels for DNA, differences in the amounts of sample and label used, differences in photomultiplier tube and laser voltage settings, and differences in the photon emission response to laser excitation [206]. As a first step of normalization, one needs to decide which set of genes to use. There are several options, which include (i) all genes on the array, (ii) constantly expressed genes, (iii) controls and (iv) rank-invariant genes [207, 208]. A large number of algorithms can be used to normalize the data, based on different rationales for solving different problems in microarray data analysis. The relationships between these normalization methods are not straightforward, but the methods can be classified into three main groups, as illustrated in Figure 1.26 [207] [http://webdocs.cs.ualberta.ca/~ys3/microarray/survey_normalization.htm].

Figure 1.26: Classification of DNA microarray normalization methods (modified from: http://webdocs.cs.ualberta.ca/~ys3/microarray/survey_normalization.htm)

1. Within array normalizations

On a microarray slide, variance among different probe spots is quite common, and the intensities of the different fluorescent dyes may not always be the same, which contradicts the assumption that they are equal. This branch of normalization is performed in order to eliminate or reduce the variance among gene spots within a single microarray slide.
There are at least three basic normalization methods: global, local (intensity-based and spatial) and adaptive.

Global normalization: This normalization algorithm is based on the following assumption: since there are millions of RNA molecules in a sample and the average mass of the molecules is about the same, if the genes on the chip are randomly selected, one would expect a similar amount of material from each sample to hybridize to the chip, that is, the total intensities of all the genes on the chip should be the same for both samples. Using this rationale, the intensities of the hybridization with each sample are summed, and the ratio between the sums is calculated [209]. Global normalization using the global median was suggested by Yang et al. [207]. Global normalization is used when comparing two very similar populations (e.g. normal versus cancer). The expression level, and hence the Cy3 and Cy5 signals, should remain the same for most of the genes, which can therefore be used to calculate the normalization ratio for the two dyes.

Local normalization: Apart from global variance, intensity can also vary among different probe spots. This variation is mostly intensity-based or spatial, and intensity-based normalization and spatial normalization are used to remove these two types of variance. It has been reported that the log2 ratio of the red and green channels for any gene is systematically intensity-dependent, an effect that is most pronounced for low-intensity spots [209]. Methods that correct these kinds of biases are called intensity-based normalizations. Within a microarray slide, the locations of gene probes will also affect the gene expression values. Spatial normalization methods are proposed to remove these kinds of biases. Intensity-dependent linear normalization and intensity-dependent non-linear normalization using a locally weighted scatter-plot smoothing (LOWESS) curve can be used for this purpose [207].
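As a minimal sketch of global median normalization, the following computes the log ratio M = log2(R/G) for each spot and subtracts the array-wide median so that the typical spot has a log ratio of zero. The average log intensity A = ½ log2(R·G) is the conventional companion quantity and is included here as an assumption of this sketch; the function names and the intensity values are made up for illustration.

```python
import math
from statistics import median


def ma_values(red, green):
    """Per-spot log ratio M = log2(R/G) and mean log intensity A = 0.5*log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def global_median_normalize(m):
    """Shift all log ratios by one constant so that their median becomes zero."""
    shift = median(m)
    return [value - shift for value in m]


# Made-up intensities: the red channel is uniformly twice as bright for four
# spots (a dye or labelling artifact), while the fifth spot is truly lower.
red = [200.0, 800.0, 1600.0, 3200.0, 400.0]
green = [100.0, 400.0, 800.0, 1600.0, 800.0]

m, a = ma_values(red, green)
m_normalized = global_median_normalize(m)
```

Global normalization applies the same shift to every spot. Intensity-dependent methods such as LOWESS instead let the correction vary with A, which this sketch does not attempt.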
There are several other extensions of global and intensity-dependent normalization. For example, Kepler et al. [210] proposed a local regression to estimate normalized intensities as well as intensity-dependent error variance. Workman et al. [211] suggested a robust non-linear method for normalization using array signal distribution analysis and cubic splines. Wang et al. [212] considered an iterative normalization of cDNA microarray data for estimating normalization coefficients and identifying control genes. Chen et al. [213] proposed a subset normalization to adjust for location biases, combined with global normalization for intensity biases.

Adaptive normalization: Most normalization methods assume that most genes on the microarray slide are not differentially expressed between the two hybridized samples and that, for the differentially expressed genes, the direction of the difference is symmetric between the two samples [214]. However, this assumption does not hold in all circumstances, for example if more than half of the genes on the array are differentially expressed, if the numbers of over- and under-expressed genes on the array are unequal, or if only genes of specific biological interest, which are highly variable across the samples, are selected to make a customized array. Therefore, Zhao et al. [214] proposed an adaptive method based on a three-component mixture model for normalization of two-channel microarray data, which showed much better performance than the global and LOWESS methods.

2. Multiple array normalization

When expression analyses performed on multiple slides are to be compared, normalization is needed beforehand. In this case, values are scaled to the same level for further analysis, and this normalization is called ‘scale-normalization’. Scale-normalization is a simple way to scale the log ratios of R and G (M-values) from a series of arrays so that each array has the same median absolute deviation [215].

3. Replicate normalization

Commonly, the mean or median expression of each dataset is used for this kind of normalization. However, to utilize replicate experiment data, two different normalization methods have been proposed: dye-swap normalization and quantile normalization.

Dye-swap normalization: As different fluorescent dyes may have different effects on a single slide, the bias can be minimized by swapping the fluorescent dyes and replicating the experiment on the same sample. After replication is performed, values from the replicated experiments are averaged.

Quantile normalization: This normalization method was proposed by Bolstad et al. [200]. Quantile normalization forces each channel to have the same quantiles.

Quality control and batch effect removal

Before analyzing data for biological variation, QC steps must be performed to determine whether the data are fit for statistical testing. Each microarray should be tested for its quality. Arrays can be affected during the hybridization procedure, especially by uneven hybridization. Also, after normalization, several test statistics can be employed to determine the extent and quality of the normalization method used. Quality checks are also done to detect outliers and possible remnants of systematic variance. It is common to assume that the average signal intensity across the data remains the same for most genome-wide analyses. If this is the case, the distribution of fluorescence intensity values should be similar across the arrays. There are several Bioconductor [216] packages that offer statistical analysis of array data quality, including ABarray, AffyExpress, affyQCReport and yaqcaffy for Affymetrix data; beadarray for Illumina platform data; and arrayQualityMetrics and mdqc for platform-independent array data quality analysis.
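The expectation that intensity distributions should be similar across arrays can be checked numerically. The sketch below, in plain Python with made-up values, quantile-normalizes a small matrix by averaging the values that share a rank across arrays, in the spirit of Bolstad et al. [200], and then confirms that every array ends up with the same sorted values. The function names are illustrative, and ties are handled naively (by sort order), which is a simplification of this sketch rather than a feature of any published implementation.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists of log intensities (one per array)."""
    n_probes = len(arrays[0])
    # For each array, the probe indices ordered from smallest to largest value.
    orders = [sorted(range(n_probes), key=lambda i: arr[i]) for arr in arrays]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [
        sum(arr[order[k]] for arr, order in zip(arrays, orders)) / len(arrays)
        for k in range(n_probes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for k, probe in enumerate(order):
            out[probe] = rank_means[k]  # the k-th smallest probe gets the k-th mean
        normalized.append(out)
    return normalized


def same_distribution(arrays, tol=1e-9):
    """Check whether all arrays share the same sorted values (same quantiles)."""
    reference = sorted(arrays[0])
    return all(
        all(abs(x - y) <= tol for x, y in zip(sorted(arr), reference))
        for arr in arrays[1:]
    )


# Three made-up arrays measuring the same four probes.
raw = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(raw)
distributions_match_before = same_distribution(raw)
distributions_match_after = same_distribution(normalized)
```

Within each array the ordering of probes is preserved; only the values attached to each rank change. A check like `same_distribution` is far cruder than the diagnostics in the Bioconductor packages listed above and is meant only to illustrate the idea.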
Non-biological experimental variation, or “batch effects”, is often observed across multiple batches of microarray experiments, making it difficult to combine data from these batches. Batch effects denote the systematic error introduced when samples are processed in multiple batches. There are several normalization methods for adjusting the technical variance of multiple arrays. However, normalization procedures do not necessarily adjust the data for batch effects, so when combining batches of data, especially batches with large batch-to-batch variation, normalization is insufficient and other procedures must be applied. A number of algorithms have been proposed to tackle the batch effect. Alter et al. described the use of “singular value decomposition” (SVD) for processing and modelling genome-wide expression data [217]. Benito et al. utilized the statistical discrimination method “Distance Weighted Discrimination” (DWD), based on the Support Vector Machine (SVM) algorithm, which is a two-class discrimination analysis for high-dimension, low-sample-size data and has been shown to be very effective in removing systematic biases present in microarray datasets [218]. Johnson et al. proposed parametric and non-parametric empirical Bayes methods to filter batch effects from data [219]. Leek et al. used “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies; it combines singular value decomposition (SVD) and a linear model analysis to estimate the eigenvalues of a residual expression matrix from which biological variation has already been removed [220]. Walker et al. suggested a generalized empirical Bayes approach to correct for cross-experimental batch effects [221]. In their experimental design, they suggested using a pooled RNA sample from the same tissue for each batch of experiments.
This design with tissue-matched reference samples allows a gene-by-gene correction to be performed using fewer arrays. Sims et al. showed that a simple batch mean-centering technique (PAMR), which is essentially a gene-wise one-way analysis of variance (ANOVA), could significantly reduce the level of inter-experimental variation [222]. Recently, Lau et al. developed the “geometric ratio-based method” (RatioG), which scales sample measurements by the geometric mean of a group of reference measurements, for batch effect removal [223]. Chen et al. evaluated six batch adjustment methods; their results suggested that ComBat, an empirical Bayes method that estimates parameters for location and scale adjustment of each batch for each gene independently, outperformed the other five programs by most of the metrics tested [224].

Differential expression analysis

After proper normalization, the next important statistical question is which genes are differentially expressed between experimental samples and references; this must be settled before further downstream analysis. Several test statistics can be selected for this purpose. These include t-test statistics, a modified one-sample or two-sample t-test, an F-statistic [225], the empirical Bayes procedure [226], and non-parametric tests: the Wilcoxon signed-rank/rank-sum tests and the rank product [227]. The popular Bioconductor [216] package “Limma” uses linear models and empirical Bayes methods for assessing differential expression in microarray experiments. It adopts moderated t-statistics [226]. The moderated t-statistic has been shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The linear model and differential expression functions are applicable to data from any microarray platform, including single-channel and two-color microarray platforms.
Limma is based on the assumption of a normal distribution, and uses linear models fitted to normalized intensities (one-color) or log-ratios (two-color). Linear models can be used to compare two or more groups and can be used for multifactorial designs, e.g. genotype and treatment. Limma borrows information across genes and uses empirical Bayes analysis to improve power in small sample sizes. The basic analysis steps are: (i) make the design matrix/contrast matrix, (ii) fit a linear model to estimate all the fold changes, (iii) apply Bayesian smoothing to the standard errors and (iv) use moderated t-statistics to calculate the p-values of significant changes.

The Bioconductor package RankProd [228] uses an extended rank product (RP) statistic, originally proposed by Breitling et al. [227], which is based on calculating rank products from replicate experiments. RankProd offers several advantages over t-test based methods and can be applied to pre-processed expression datasets from any platform. The significance of the detection of a differentially expressed gene is assessed by a non-parametric permutation test and the associated p-value. The non-parametric statistic is derived from biological reasoning: it detects items that are consistently highly ranked in a number of lists, for example genes that are consistently found among the most strongly up-regulated (or down-regulated) genes in a number of replicate experiments. It offers several advantages over linear modelling, including a biologically intuitive fold-change (FC) criterion, fewer assumptions under the model, and increased performance with noisy data and/or low numbers of replicates [228].

After a test statistic has been selected, the next step is to compute a p-value and adjust it for multiple testing. A p-value can be calculated by a resampling technique, such as permutation or bootstrapping.
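The rank product statistic and a permutation-based p-value of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RankProd implementation: the function names, the number of permutations and the pooling of null values across genes are choices made for this example, and only up-regulation is tested.

```python
import numpy as np

def rank_product(log_fc):
    """Geometric mean rank per gene (rows = genes, columns = replicates).

    Rank 1 is given to the most strongly up-regulated gene in each replicate.
    """
    log_fc = np.asarray(log_fc, dtype=float)
    ranks = np.argsort(np.argsort(-log_fc, axis=0), axis=0) + 1
    return np.exp(np.log(ranks).mean(axis=1))

def rank_product_pvalues(log_fc, n_perm=1000, seed=0):
    """Estimate p-values by permuting values independently within each replicate."""
    rng = np.random.default_rng(seed)
    log_fc = np.asarray(log_fc, dtype=float)
    observed = rank_product(log_fc)
    n_genes = log_fc.shape[0]
    counts = np.zeros(n_genes)
    for _ in range(n_perm):
        permuted = np.column_stack([rng.permutation(col) for col in log_fc.T])
        null_rp = np.sort(rank_product(permuted))
        # number of null rank products that are <= each observed rank product
        counts += np.searchsorted(null_rp, observed, side="right")
    return observed, counts / (n_perm * n_genes)
```

A gene that is ranked first in every replicate has a rank product of exactly 1, and its estimated p-value is the fraction of permuted rank products that are equally extreme. The resulting p-values would still need to be adjusted for multiple testing.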
With a small number of arrays, marginal aggregation is more suitable than permutation [225, 229]. There are several ways to correct for multiple testing; the most commonly used are the Bonferroni adjustment and the Benjamini-Hochberg procedure [230].

1.7.2 ChIP-on-chip

ChIP-on-chip, also known as ChIP-chip, combines chromatin immunoprecipitation with a DNA microarray to identify the locations of protein-DNA binding sites as well as chromatin plasticity, and is a powerful tool for understanding the regulation of gene transcription. ChIP-chip studies on tiling arrays make it possible to detect genome-wide protein-DNA interactions such as transcription factor binding, as well as epigenetic events such as histone modifications [231, 232].

Basic experimental procedures

The first step in the ChIP-chip procedure is to fix protein-DNA interactions in living cells by chemical crosslinking, which is commonly done with formaldehyde. Next, cells are lysed in buffer and the DNA is fragmented by sonication. Using a specific antibody against the protein of interest or histone modification, protein-DNA complexes are immunoprecipitated (IP) from this extract. DNA bound by the protein is co-precipitated and enriched compared to DNA not bound by the respective protein (input). After reversal of the crosslinking, the DNA is purified. Two different fluorescent labels are used to label the IP DNA and a hybridization control DNA, respectively. Usually total DNA before IP (input DNA) is used as the hybridization control. The two differentially labeled DNAs are hybridized to the same microarray and the difference in fluorescence intensity gives a measure of the enrichment. The key difference from transcriptomics microarrays is that in regulation microarrays the channels consist of an experimentally enriched sample (IP) and a reference sample (input).
The cyanine 3 (Cy3), or green (G), channel of regulation microarrays usually contains the total genomic input DNA that gives the reference baseline signal, and the cyanine 5 (Cy5), or red (R), channel contains IP DNA, whereas in transcriptomics microarrays both channels contain PCR-amplified transcript samples, usually corresponding to two different experimental conditions. Therefore, while the log-ratio between the channel signals represents the differential expression between two conditions in transcriptomics analysis, for regulation microarrays it is used as a measure of enrichment: the higher the log-ratio of a probe or set of tiling probes, the higher the likelihood that the corresponding region in the genome has a high level of DNA-protein interaction or chromatin modification.

Analysis of ChIP-chip data

Analysis of ChIP-chip data poses two computational challenges. The first is the primary statistical analysis, which includes quality assessment, data normalization and transformation, and the calling of enriched locations. The second is the integrative bioinformatic analysis, which interprets the data in the context of existing genome annotations and of other related experimental results, for example from other location or expression analyses. The normalization strategy is similar to that for DNA microarrays and is applied at multiple levels, addressing spatial, probe, and dye or intensity-dependent biases [233]. However, intensity-dependent bias correction does not rest on the same assumptions for transcriptomic and regulatory microarrays. In the former case it is assumed that the majority of the probe signals remain unchanged; in the latter case this assumption does not hold, as the samples in the two channels differ largely. Commonly used normalization approaches are VSN, LOWESS, quantile, T-quantile, Tukey’s biweight scaling and Peng’s method [234].
In ChIP-on-chip analysis, the assumption is generally made that a DNA-interacting protein is either bound or not bound and that chromatin is either modified or unmodified. Therefore, in contrast to transcriptomics microarray data, where low log-ratio values are meaningful as long as the differences between conditions are statistically significant, when interpreting ChIP-on-chip data the upper quantile is of greatest interest, as it generally comprises mostly enriched probes. Therefore, aside from data normalization, enrichment analysis steps that detect significantly bound/modified regions are particularly important. Commonly, significantly higher log-ratio intensities compared to neighbouring probes in a window define an enriched location (Figure 1.27).

Figure 1.27: An enrichment signal around a binding site. Since DNA fragmentation through sonication can be modelled as a Poisson process, the DNA fragment length distribution follows a Poisson distribution and adjacent probes on the genome have a correlated log-ratio, resulting in the hybridization pattern shown here. Each blue column represents a probe hybridization site. Black-outlined bars represent their log-ratio. Green lines are sonicated immunoprecipitated DNA fragments corresponding to the binding site. (adapted from [234])

Analysis tools

A large number of analysis tools have been reported so far for ChIP-chip data normalization and peak finding. Bioconductor [216], an open development software project for the analysis and comprehension of genomic data, offers versatile ChIP-chip data analysis and quality control packages, including ACME, iChIP, BAC, Ringo, rMAT and Starr. ACME calculates whether a set of tiling probes is significantly more likely to be a sampling of this upper quantile than of the rest of the data, assuming that, if this is the case, the set of tiling probes corresponds to a protein binding site or chromatin modification region.
Therefore, a better distinction between the enriched and un-enriched components would increase the power to identify significantly enriched regions. iChIP offers Bayesian modeling of ChIP-chip data through hidden Ising models to identify enriched genomic regions. It can be used to analyze data from multiple platforms (e.g., Affymetrix, Agilent, and NimbleGen) and data with single or multiple replicates. Besides Bioconductor, there are several other popular data analysis programs, including Chipper [235], cisGenome [236] and CoCAS [237]. Chipper [235] computes p-values without needing a separate control for developing a model of measurement error (which may introduce systematic error). This method combines multiple replicates (separate arrays) and duplicates (same array) to produce a single overall p-value for each intergenic region (IGR) containing binding sites. Rather than log-ratios, it uses the variance stabilization algorithms available through the Bioconductor “vsn” package [201]; this eliminates the need to threshold low-signal spots and yields an alternative measure, δh, which interpolates between a difference and a log-ratio and is monotonically related to significance. In addition, it averages the resulting z-score over replicates, which has the added benefit that an IGR that scores highly in a single replicate, but has no usable data in other replicates, may still score well in the overall rankings. The q-value generated from the p-value can be used to make a short list of top-ranking significant binding sites [235].

1.7.3 ChIP-seq

Chromatin immunoprecipitation (ChIP) coupled with massively parallel sequencing is a revolutionary method that has made it possible to identify the locations of protein-DNA interactions throughout the genome. It is used to determine the precise genome-wide locations of the binding sites of a protein of interest or of chromatin modification sites.
There are several advantages over the ChIP-chip technique, including higher resolution, fewer artefacts, greater coverage and a larger dynamic range. The basic steps are: (i) formaldehyde cross-linking of protein and DNA, (ii) fragmentation by sonication, (iii) immunoprecipitation, (iv) purification and amplification of DNA, (v) ligation of adapter sequences, (vi) sequencing and image processing to nucleotide sequence and (vii) bioinformatic analysis (Figure 1.28). By now, there are many variants that couple the basic ChIP concept, or related enrichment methods, with next-generation sequencing (NGS) to study different types of interaction in cells, such as Methyl-seq (methylation pattern through sequencing), Bisulfite-seq (bisulfite-treated genomic DNA to determine methylation status at CpG dinucleotides), CLIP-seq (microRNA-target interaction), FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements), DNase-seq (DNase I hypersensitive sites sequencing) and RNA-seq (whole transcriptome sequencing). All of them take advantage of high-throughput sequencing, and most, but not all, of them use antibodies to precipitate the desired fraction of interacting molecules.

Experimental procedures and challenges

The basic laboratory procedure for ChIP-seq is the same as described for ChIP-chip. However, several additional steps are needed to prepare samples for sequencing. Although the ChIP technique is well established, there are several issues in experimental design and procedure to take into account. Particular care is needed in the selection of the antibody, sonication, and PCR amplification. Minor errors can become a major pitfall for the entire ChIP experiment. On the other hand, the application of some helpful tips, even minor ones, can improve the quality of the experiments.
Careful selection or design of antibodies, whether from commercial sources or prepared in-house, is particularly important, as antibodies of inferior sensitivity or quality might result in poor data quality [238]. Antibody design is also crucial when isoform-specific ChIP-seq analysis is performed. The starting cell number, sample quantity, DNA concentration, number of PCR cycles etc. are also critical parameters in this procedure. The precise amount of ChIP DNA and the number of cells needed depend on the abundance of the chromatin-associated protein targets or histone modifications. During sonication it is critical neither to produce oversized DNA fragments nor to over-fragment or degrade the DNA. Depending on the sequencing platform, a minimum quantity of DNA is necessary, and very pure DNA is always required. Amplification of DNA by PCR is an important step as well, since only a few more cycles than the optimum can cause a serious bias in the data [238, 239].

Another important issue is the use of a control sample. Several ChIP-seq studies have been reported without any additional control sample, based on the assumption that reads are distributed uniformly and hence that a random distribution of reads can be used as a background control. However, shearing of the DNA can result in non-uniform fragmentation of the genome: open chromatin regions tend to be fragmented more easily than closed regions, which might create an uneven distribution of sequence tags across the genome. Additionally, repetitive sequences might seem to be enriched because of inaccuracies in the number of copies of the repeats in the assembled genome. A control sample helps to detect artefacts produced during the experimental procedure; hence a peak in the ChIP-seq profile should be compared with the same region in a matched control sample to determine its significance.
Figure 1.28: Overview of ChIP-seq experiment. Using the ChIP-seq method, the specific DNA sites that interact with transcription factors or other chromatin-associated proteins (non-histone ChIP) and sites that correspond to modified nucleosomes (histone ChIP) can be profiled. The ChIP process enriches the crosslinked proteins or modified nucleosomes of interest using an antibody specific to the protein or the histone modification. Purified DNA can be sequenced on any of the next-generation platforms. The basic concepts are similar for different platforms: common adaptors are ligated to the ChIP DNA and clonally clustered amplicons are generated. The sequencing step involves the enzyme-driven extension of all templates in parallel. After each extension, the fluorescent labels that have been incorporated are detected through high-resolution imaging. On the Illumina Solexa platform (bottom left), clusters of clonal sequences are generated by bridge PCR, and sequencing is performed by sequencing-by-synthesis. On the Roche 454 and Applied Biosystems (ABI) SOLiD platforms (bottom middle), clonal sequencing features are generated by emulsion PCR and amplicons are captured on the surface of micrometre-scale beads. Beads with amplicons are then recovered and immobilized to a planar substrate to be sequenced by pyrosequencing (for the 454 platform) or by DNA ligase-driven synthesis (for the SOLiD platform). On single-molecule sequencing platforms such as the HeliScope by Helicos (bottom right), fluorescent nucleotides incorporated into templates can be imaged at the level of single molecules. (adapted from [238])

Generally three types of control samples are used: (i) input DNA; (ii) mock IP DNA (DNA obtained from IP without antibodies); and (iii) DNA from nonspecific IP (IP performed using an antibody, such as immunoglobulin G, against a protein that is not known to be involved in DNA binding or chromatin modification).
Although there is no consensus on which kind of control is best, input DNA has been used frequently.

Bioinformatic analysis and challenges of ChIP-seq

Bioinformatic analysis of the ChIP-seq data is necessary to extract meaningful biological information and is one of the most important parts of any ChIP-seq project. A ‘typical’ ChIP-seq analysis pipeline involves several steps and several independent analyses (Figure 1.29) [238, 240]. The critical steps include: quality control of data; aligning the reads to the reference genome (mapping); and the identification of binding or enriched sites (peak calling). Downstream analysis depends on the specific information one wants to obtain from particular ChIP-seq data.

Figure 1.29: Typical ChIP-seq data analysis pipeline. The raw data for chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis are images from the next-generation sequencing platform (top left). A base caller converts the image data to sequence tags, which are then aligned to the genome. On some platforms, they are aligned with the aid of quality scores that indicate the reliability of each base call. Peak calling, using data from the ChIP profile and a control profile (which is usually created from input DNA), generates a list of enriched regions that are ordered by false discovery rate as a statistical measure. Subsequently, the profiles of enriched regions are viewed with a browser and various advanced analyses are performed. (adapted from [238])

Quality control: The first task of any analysis is to check the overall quality of the data, looking at sequencing bias, sequencing depth, reproducibility of replicates etc. as quality control measures. Some basic checks include:

1. Extra sequence: check whether barcode, adapter, primer or linker sequences still remain at the ends of the reads. This is particularly important for short reads.

2. Per base sequence quality: the readout from the sequencer includes a quality measure for each sequenced nucleotide. A plot of the range of quality values across all bases at each position gives an overview of the overall quality of the sequenced reads. Poor quality reads can be excluded before mapping to the reference genome.

3. Per sequence quality: the per sequence quality score plot checks whether a subset of the sequences have universally low quality values. A subset of sequences might have universally poor quality due to poor imaging; however, these should represent only a small percentage of the total sequences.

4. Per base sequence content: the proportion of each base at each position in the reads. In a random library it is expected that there would be little to no difference between the different bases of a sequence run. The relative amount of each base should reflect the overall amount of these bases in the reference genome, but in any case there should not be a huge imbalance.

5. Per base and per sequence GC content: in a normal random library, a roughly normal distribution of GC content is expected.

6. ‘N’ content: it is not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.

7. Duplicate sequence: a low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (e.g., PCR over-amplification).

8. Strong sequence preferences are typically detected right at the 5′ end of the tag, where they are introduced during the ligation of sequencing adapters. Variation within the first ∼30 bp is normally indicative of bias from the base calling of the sequencer.
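A few of the checks listed above (per base sequence quality, ‘N’ content, GC content and duplicate sequences) can be sketched directly on FASTQ records. The sketch below is illustrative only: the function names are hypothetical, quality characters are assumed to use the Phred+33 encoding, and each record is assumed to occupy exactly four lines.

```python
from collections import Counter

def read_fastq(lines):
    """Yield (sequence, quality string) from an iterable of FASTQ lines (4 lines per read)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue                      # tolerate blank lines between records
        seq = next(it).strip()
        next(it)                          # '+' separator line, ignored
        qual = next(it).strip()
        yield seq, qual

def basic_read_qc(lines, phred_offset=33):
    """Summarize per-position quality, N fraction, GC fraction and duplication."""
    qual_sum, qual_n = [], []             # per-position quality totals and read counts
    base_counts = Counter()
    read_counts = Counter()
    for seq, qual in read_fastq(lines):
        read_counts[seq] += 1
        base_counts.update(seq.upper())
        for i, ch in enumerate(qual):
            if i == len(qual_sum):        # first read that reaches this position
                qual_sum.append(0)
                qual_n.append(0)
            qual_sum[i] += ord(ch) - phred_offset
            qual_n[i] += 1
    n_reads = sum(read_counts.values())
    n_bases = sum(base_counts.values())
    return {
        "mean_quality_per_position": [s / n for s, n in zip(qual_sum, qual_n)],
        "n_fraction": base_counts["N"] / n_bases,
        "gc_fraction": (base_counts["G"] + base_counts["C"]) / n_bases,
        "duplicate_fraction": 1 - len(read_counts) / n_reads,
    }
```

In practice the lines would come from an open file handle, for example `basic_read_qc(open(path))`, where `path` stands for a FASTQ file of the user's choosing. Dedicated QC tools report many more metrics than this sketch does.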
Read alignment: With a next-generation sequencer, billions of short reads can be sequenced, which is what can make a ChIP-seq project a success. However, there are several challenges in mapping the reads correctly to the reference genome, for example: (i) reads have to be mapped accurately and quickly while consuming little memory; (ii) reads can map to multiple places (multimaps) due to repetitive regions of the genome; (iii) sample sequences may vary from the reference genome; and (iv) spliced mapping and mapping with intron-sized gaps. There are several ways to handle multimaps. Usually multimapped reads are either discarded, allocated randomly, or allocated on the basis of an estimate of local coverage, although a statistical method incorporating alignment scores has also been proposed [241]. Paired-end reads significantly reduce the problem of multi-mapping. Increasing the read length can partially address the problem of reads mapping to multiple repeat locations. Unique sequences that flank repeats are also helpful in aligning the reads to the genome. For example, although only 48% of the human genome is non-repetitive, 80% is mappable with 30 bp reads and 89% is mappable with 70 bp reads [238]. Alignment strategies and challenges in read alignment have recently been reviewed in detail in [242].

Several mapping algorithms are available specifically to align short reads. Popular publicly available tools include Bowtie, BWA, MAQ, Mosaik, Novoalign, SOAP2 and ZOOM [242]. Some pipelines map recursively to maximize the number of mapped reads [243]. A recent study has investigated the performance of some mapping algorithms [244]. However, almost all short read aligners are based on the same principle: a first-pass ‘heuristic’ match, which rapidly finds a short-list of possible locations, followed by a thorough evaluation of all candidate alignments by a more complex ‘local alignment’ algorithm.
To address the first challenge, mapping reads quickly, a computational strategy called ‘indexing’ is used by some aligners, such as MAQ and Bowtie [245]. In this approach, an index of a large DNA sequence is used to rapidly find shorter sequences embedded within it. MAQ is based on a straightforward but effective strategy called “spaced seed” indexing (Figure 1.30a). It builds a hash table of short oligomers present in reads. In this method, a read is divided into four segments of equal length, called “seeds”. If the entire read aligns perfectly to the reference genome, all of the seeds will align perfectly. When there is one mismatch, it will fall within one of the four seeds, but the other three will still match perfectly. Similarly, with two mismatches in two different seeds, the other two will still match perfectly. Thus, by aligning all possible pairs of seeds (six possible pairs) against the reference, it is possible to winnow the list of candidate locations within the reference where the full read may map, allowing at most two mismatches. Bowtie employs the Burrows-Wheeler transform (BWT) and the full-text minute-space (FM) index to index the entire human genome, requiring much less space than spaced-seed indexing. Bowtie aligns a read one character at a time to the BWT-indexed genome (Figure 1.30b). Each successively aligned character allows Bowtie to winnow the list of positions to which the read might map. When it does not find a perfect match, it employs a backtracking algorithm: it makes a substitution in the preceding character of the read and resumes the search. Thus the BWT enables Bowtie to first solve the simple problem of aligning one character, then proceed to the harder problem of aligning two characters, and so on until the entire read has been aligned. Although the Bowtie approach is more complicated, it is 30 times faster than spaced-seed indexing [245].
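The character-by-character narrowing of candidate positions described above can be illustrated with a toy BWT/FM-style index. The sketch below is a teaching example under clear assumptions: it builds the index by naive suffix sorting (only feasible for short strings), stores full occurrence tables, and supports exact matching only, without the backtracking over mismatches that Bowtie performs. The function names are hypothetical and this is not Bowtie's implementation.

```python
def bwt_index(text):
    """Build a toy BWT index of text; '$' is appended as a unique terminator."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # suffix array (naive)
    bwt = "".join(text[i - 1] for i in sa)                 # character preceding each sorted suffix
    alphabet = sorted(set(text))
    first, total = {}, 0                                   # first[c]: characters in text smaller than c
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}      # occ[c][i]: count of c in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ

def exact_match(read, index):
    """Return the sorted 0-based positions where read occurs exactly in the indexed text."""
    sa, first, occ = index
    lo, hi = 0, len(sa)                  # half-open interval of suffix-array rows
    for ch in reversed(read):            # the read is consumed from right to left
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:                     # interval is empty: no occurrence
            return []
    return sorted(sa[lo:hi])
```

Each loop iteration shrinks the interval of suffix-array rows whose suffixes start with the part of the read processed so far, which corresponds to the interval updates shown in Figure 1.30b. For example, `exact_match("TA", bwt_index("GATTACA"))` narrows the interval first to the suffixes starting with "A" and then to the single suffix starting with "TA".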
Although some tools have largely solved the problem of fast alignment, optimizing performance through the correct setting of parameters is not straightforward. The choice of the alignment parameters has a profound influence on the number of reads mapped and on the coverage of the reference genome (the number of positions with at least one read). Being too stringent, for example by keeping only uniquely mapped reads without any mismatch, might lead to a loss of informative reads, and hence some weakly enriched regions may remain unidentified. Conversely, being too relaxed would increase noise and allow spurious mapping. One such case has been exemplified by Leleu et al., who used the program Bowtie to map ChIP-seq reads for the Saccharomyces cerevisiae transcription factor Ste12, as illustrated in Figure 1.31. As shown, there is typically a large variability in the sequencing depth, and some regions can have a much lower coverage and thus represent false negatives [240]. In any case, the effect of the mapping parameters will greatly depend on the type and the quality of the data, and optimized parameters can be found by running the alignment with different parameters and assessing the outcome.

Figure 1.30: Algorithmic approaches for aligning short sequencing reads. (a) Algorithms based on spaced-seed indexing: each position in the reference is cut into equal-sized pieces, called “seeds”, and these seeds are paired and stored in a lookup table. Each read is also cut up according to this scheme, and pairs of seeds are used as keys to look up matching positions in the reference. Because seed indices can be very large, some algorithms index the reads in batches and treat substrings of the reference as queries. (b) Algorithms based on the Burrows-Wheeler transform: reads are aligned character by character from right to left against the transformed string. With each new character, the algorithm updates an interval (indicated by blue ‘beams’) in the transformed string. When all characters in the read have been processed, alignments are represented by any positions within the interval. (adapted from [242])

Figure 1.31: Effect of various mapping options on the alignment of short reads. Short reads (length 26) with a seed length of 10 bases were aligned against the reference genome using the Bowtie program. The number of mismatches (n) was tested from 0 to 3. The maximum number of hits per read (m) was set to 1, 5, 10, 15 and 20. The average depth genome-wide is compared to selected regions identified as true positive peaks and to a set of locally enriched regions eventually dismissed as false positives. The latter regions have lower depth than the genome average yet higher depth than their immediate neighborhood (and therefore were discarded as insignificant, as high fold change is not equivalent to significant enrichment). This illustrates the large variation in depth observed across the genome. (i) The average coverage increases with both n and m, for all regions considered. The strongest effect appears to be caused by increasing m, while the number of mismatches n has a weaker impact. (ii) Allowing up to m = 15 alignments per read leads to an increased peak height relative to the whole genome. This trend is reversed, however, when m becomes too large (m = 20). (adapted from [240])

Peak calling: After successful mapping, the next important step is ‘peak calling’: the identification of the locations where the protein of interest is bound, i.e. the regions that are enriched in a ChIP sample relative to the control with statistical significance. A review of the recent literature and bioinformatic resource databases shows that there are at least 40 open source ChIP-seq peak calling tools. Since the technology is relatively new and a wide variety of data is available, the number of algorithms and tools is increasing rapidly. Several recent publications [246, 247, 248] provide overviews of the main publicly available peak callers and list their principal characteristics to help users choose a tool based on their needs and data type. However, the large number of tools has complicated the selection of an appropriate algorithm. As with mapping, once a program has been selected, setting parameters for optimum performance is also critical. Altogether this makes ChIP-seq analysis a daunting task. An early study by Laajala et al. in 2009 evaluated the performance of nine widely used peak detection algorithms [246]. Later, at least two more studies [248, 249] used similar approaches to evaluate performance against ‘benchmark’ data. These showed that the biological conclusions can vary largely depending on the choice of algorithm [246]. The number and locations of binding sites change dramatically from one algorithm to another, with relatively low overlap among the enriched locations called by different algorithms. None of the algorithms clearly outperformed the others; rather, the performance of an algorithm depends on the type of data analyzed. One algorithm might perform better than others in terms of sensitivity, accuracy and usability for a particular type of dataset (e.g. different proteins, histones). Therefore, the choice of algorithm depends on the type of data in question. Additionally, balancing stringency and sensitivity is a critical choice that the user has to make [248]. Some technical forums (e.g. SEQanswers, BioStar [250]) as well as user forums for particular algorithms have emerged to provide assistance and improve the exchange of knowledge on this new technology. There are also published technical protocols that guide the analysis step by step [251].
Although a wide variety of peak callers are available and they work in various ways, the basic strategy is to find regions along the chromosomes where the read tag density is increased over the estimated background and then, after filtering artifactual regions, to report the remaining peaks with their statistical significance. Peak calling can thus be conceptually divided into the following five subtasks [240, 247]: (i) estimation of parameters and generation of a signal profile along each chromosome, (ii) building a background model, (iii) determination of significant candidate peaks, (iv) determination and filtering of artifactual peaks and (v) computation of posterior statistics to rank the significant called peaks (Figure 1.32).

Figure 1.32: Steps in peak calling. A signal profile of aligned reads that takes on a value at each base pair is formed via a census algorithm, for example, by counting the number of reads overlapping each base pair along the genome (upper left plot: ‘+’ strand reads in blue, ‘-’ strand reads in red, and in purple the combined distribution after shifting the ‘+’ and ‘-’ reads toward the center by the read shift value). If experimental control data are available (brown), the same processing steps are applied to form a background profile (top right); otherwise, a random genomic background may be assumed. The signal and background profiles are compared in order to define regions of enrichment. Finally, peaks are filtered to reduce false positives and ranked according to relative strength or statistical significance. Bottom left: P(s), the probability of observing a location with s reads covering it. The bars represent the control data distribution. A hypothetical Poisson distribution fit is shown, with s_thresh indicating a cutoff above which a ChIP-seq peak might be considered significant. Bottom right: schematic representation of two types of artifactual peaks, single-strand peaks and peaks formed by multiple occurrences of only one or a few reads. (adapted from [247])

(i) Estimation of parameters and generation of signal profile along each chromosome

Since most ChIP-seq experiments sequence single-ended reads, that is, from one of the two strands in the 5′ to 3′ direction, tags from each strand are expected to be present with equal frequency, thus producing two related distributions of stranded reads, one on the positive strand and another on the negative strand. This distribution often has a bimodal shape, especially for transcription factor binding. Moreover, since the sonicated DNA fragments are sequenced from the 5′ end, the read positions do not mark the binding site directly; the actual binding site has to be inferred, and its position relative to the reads depends on the fragment length. Consequently, the individual strand distributions occur upstream and downstream of the source point (‘summit’), each shifted by half the average sequenced fragment length, which is commonly referred to as the ‘shift’ (Figure 1.32, top left). Therefore, reads are either shifted towards the center of each stranded tag distribution or extended in the 5′ to 3′ direction to the estimated original fragment length, so that the combined tag density forms a single peak whose summit corresponds closely to the binding site (Figure 1.33). Fragment length can be determined in two ways: computationally or empirically. One way to determine the peak shift computationally is to compute the cross-correlation between the two strand-specific densities; its maximum should ideally coincide with the average fragment size (Figure 1.34). For this, the number of read starts at each position is computed for each chromosome, separately for the + strand and the - strand. These vectors are then shifted with respect to each other and the correlation is computed for each shift. Finally, the cross-correlation profile is plotted with the cross-correlation values on the Y-axis and the shift used to compute them on the X-axis.
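The cross-correlation procedure described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming read-start coordinates for a single chromosome are already available as lists of integers; the function names and the use of the Pearson correlation coefficient are choices made for this example and are not taken from any particular peak caller.

```python
def strand_cross_correlation(plus_starts, minus_starts, chrom_len, max_shift):
    """Return {shift: Pearson correlation} between + and - strand read-start counts."""
    # Per-base counts of read starts, one vector per strand.
    plus = [0] * chrom_len
    minus = [0] * chrom_len
    for p in plus_starts:
        plus[p] += 1
    for m in minus_starts:
        minus[m] += 1

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

    profile = {}
    for shift in range(max_shift + 1):
        # Compare plus[i] with minus[i + shift] for every position i.
        profile[shift] = pearson(plus[:chrom_len - shift], minus[shift:])
    return profile


def estimate_fragment_length(profile):
    """The shift with the highest correlation approximates the fragment length."""
    return max(profile, key=profile.get)
```

With reads clustered around two hypothetical binding sites, for example plus-strand starts near positions 450 and 1450 and minus-strand starts 100 bp downstream of each, `estimate_fragment_length(strand_cross_correlation(plus, minus, 2000, 300))` recovers the 100 bp offset. Real data would be processed chromosome by chromosome and with a vectorized implementation.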
Due to the ‘shift’ of reads on the + and - strands around true binding sites, the cross-correlation profile shows a peak at the predominant fragment length. It is important to estimate the fragment length, and hence the peak shift, accurately, because it strongly influences the construction of the signal profile and therefore peak finding: the number of identified peaks may vary depending on the fragment length used. Once these parameters are estimated, the next step is to build the signal profile, which consists of smoothing the tag counts to allow reliable region identification and better summit prediction. This can be achieved in various ways depending on the algorithm of choice, but the basic mechanism is to slide a window of estimated width across the genome and compute tag signal values. One of the popular programs, MACS [252], takes advantage of the bimodal shape of the tag distribution and empirically models the fragment length (d) of ChIP-seq reads. After shifting the tags in a strand-specific fashion (d/2 towards the 3′ end), it performs a window scan (window width 2d) and uses a Poisson distribution with a genome-wide background parameter (λ_BG) to model the tag distribution along the genome. To avoid PCR artifact bias, it removes duplicate tags in excess of the number expected (under a binomial distribution) given the sequencing depth. SICER [253], another popular algorithm, is specifically designed to address the broader peaks frequently observed for histone modifications: it computes probability scores in non-overlapping windows and then aggregates windows into ‘islands’, allowing gaps of subthreshold windows within them, in order to capture broad enrichment regions.

Aside from these established tools, there are newly proposed algorithms that handle mapped NGS reads more efficiently. Pyicos [254] is one such recently described toolkit that supports not only ChIP-seq but also other high-throughput sequencing (HTS) data types. Its accuracy is comparable, and sometimes superior, to that of methods specifically designed for each particular type of experiment. It uses strand cross-correlation to calculate the peak shift and Poisson analysis to call peaks.

Figure 1.33: Strand-specific profiles at enriched sites. DNA fragments from a chromatin immunoprecipitation experiment are sequenced from the 5′ end. Therefore, the alignment of these tags to the genome results in two peaks (one on each strand) that flank the binding location of the protein or nucleosome of interest. This strand-specific pattern can be used for the optimal detection of enriched regions. To create an approximate distribution of all fragments, each tag location can be extended by an estimated fragment size in the appropriate orientation and the number of fragments can be counted at each position. (adapted from [238])

Figure 1.34: Determination of fragment length by the cross-correlation method. Computing the cross-correlation between forward and reverse strand densities reveals the distribution of fragment length. Here the fragment length is centered at 116 bp. Comparing the results from three subsets of the data, (i) identified peak regions, (ii) the whole genome, and (iii) random genomic regions mostly devoid of peaks, shows that restricting the analysis to enriched regions before computing the cross-correlation improves the shape of the distribution. (adapted from [240])

(ii) Building background model

The background model is an estimate of the statistical noise distribution, or a set of criteria or assumptions that direct the use of control data, to filter out certain types of false positives in the experimental sample data. When control data are not available, the background tag distribution is usually modeled with a Poisson or negative binomial distribution. When control data are available, they may be used to determine the parameters of these distributions.
Alternatively, the control data may be subtracted from the signal along the genome, or the signal may be thresholded by its enrichment ratio relative to the control. Control data help to correct regional biases along the genome that arise from sequencing and mapping biases, variation in chromatin structure, and genome copy number. To capture local biases in the genome, a dynamic Poisson distribution can be used to model a local background (λ_local) instead of a background calculated from the whole genome.

(iii) Determination of significant peaks

Candidate peaks are called significant when they are overrepresented compared to the background signal, that is, when they satisfy a minimum threshold and pass the user-defined multiple-testing-corrected p-value or other statistical filters. MACS calculates the FDR empirically as the ratio of the number of peaks called in the control (using the ChIP sample as background) to the number of peaks called in the ChIP sample at the same p-value cutoff. Since the numbers of sequenced tags in the sample and in the control may differ, some normalization between sample and control is performed before assessing the enrichment of significant peaks. Variance stabilization methods have been proposed [255]; however, global scaling is commonly used by many peak calling algorithms. The number of reads and their lengths can also be taken into account, so that the normalization operates at nucleotide-level precision.

(iv) Determination and filtering of artifactual peaks

After the initial peak calling step, simple filters are sometimes applied to remove artifacts, for example peaks that appear to result from sequencing bias. Two popular filtering criteria are based on the distribution of tags between the DNA strands (directionality) and on single-site duplicates. Directionality criteria include the fraction of plus and minus tags, the fraction of plus (minus) tags occurring to the left (right) of the putative peak, and the presence of a partnered plus (minus) peak for each minus (plus) peak.
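The background modelling and significance testing of steps (ii) and (iii) can be illustrated with a small Poisson test against a local background. The sketch below is only in the spirit of the dynamic λ_local idea described above: the function names, the window sizes used in the usage note and the rule of taking the largest of several rate estimates are assumptions made for illustration, not the exact procedure of MACS or any other tool.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    Assumes a moderate lam (below roughly 700) so that exp() does not underflow.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        if i > lam and term <= total * 1e-15:
            break
    return min(total, 1.0)


def expected_background_reads(control_counts_by_window, peak_width,
                              genome_rate, scale=1.0):
    """Expected background reads in a candidate peak.

    control_counts_by_window: {window size in bp: control reads in that window
        centred on the candidate peak} (window sizes are hypothetical).
    genome_rate: control reads per base pair over the whole genome.
    scale: ChIP library size divided by control library size (global scaling).
    """
    rates = [genome_rate]
    rates += [count / size for size, count in control_counts_by_window.items()]
    # Use the most conservative (largest) per-base rate as the local background.
    return max(rates) * peak_width * scale


def peak_p_value(chip_reads, expected_reads):
    """Probability of seeing at least chip_reads under the background model."""
    return poisson_upper_tail(chip_reads, expected_reads)
```

For instance, `expected_background_reads({1000: 10, 5000: 20}, peak_width=200, genome_rate=0.002)` picks the 1 kb window rate as the local background, and `peak_p_value` then scores the observed ChIP read count against it. A real peak caller would additionally correct the resulting p-values for multiple testing.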
(v) Computation of posterior statistics to rank significant called peaks

After the final peaks are called, programs usually attach one or more scores to each peak so that the peaks can be ranked. These include the p-value, FDR, q-value and fold change.

Depth of sequencing

Any ChIP-seq experiment faces the basic question of whether the number of reads sequenced is sufficient. One important difference between ChIP-chip and ChIP-seq is that in ChIP-chip the number of probes is fixed, whereas in ChIP-seq the number of sequence reads is chosen by the researcher, and the number required depends on the protein in question.

Figure 1.35: Determination of the depth of sequencing. To determine whether enough tags have been sequenced, a simulation can be carried out to characterize the fraction of the peaks that would be recovered if a smaller number of tags had been sequenced. In many cases, new statistically significant peaks are discovered at a steady rate with an increasing number of tags (solid curve); that is, there is no saturation of binding sites. However, when a minimum threshold is imposed for the enrichment ratio between chromatin immunoprecipitation (ChIP) and input DNA peaks, the rate at which new peaks are discovered slows down (dashed curve); that is, saturation of detected binding sites can occur when only sufficiently prominent binding positions are considered. For a given data set, multiple curves corresponding to different thresholds can be examined to identify the threshold at which the curve becomes sufficiently flat to meet the desired saturation criteria (defined by the intersection of the orange lines on the graph). We refer to such a threshold as the minimum saturation enrichment ratio (MSER). The MSER can serve as a measure for the depth of sequencing achieved in a data set: a high MSER, for example, might indicate that the data set was undersampled, as only the more prominent peaks were saturated. (adapted from [238])
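The subsampling simulation behind Figure 1.35 can be sketched as follows. This is a deliberately simplified illustration: a ‘peak’ is taken to be any fixed-width window with at least `min_reads` read starts, which stands in for a real peak caller, and the function names, window width and threshold are hypothetical values chosen for the example.

```python
import random


def call_windows(read_starts, window, min_reads):
    """Toy peak caller: windows (by index) holding at least min_reads read starts."""
    counts = {}
    for pos in read_starts:
        idx = pos // window
        counts[idx] = counts.get(idx, 0) + 1
    return {idx for idx, n in counts.items() if n >= min_reads}


def saturation_curve(read_starts, fractions, window=200, min_reads=5, seed=0):
    """Fraction of full-data peaks recovered when only part of the reads is used."""
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    full = call_windows(read_starts, window, min_reads)
    curve = []
    for frac in fractions:
        subset = rng.sample(read_starts, int(len(read_starts) * frac))
        found = call_windows(subset, window, min_reads)
        recovered = len(found & full) / len(full) if full else 0.0
        curve.append((frac, recovered))
    return curve
```

Plotting the recovered fraction against the sampled fraction gives a curve of the kind shown in Figure 1.35; a curve that is still rising steeply at the full read count suggests that the data set has not reached saturation for the chosen threshold.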
So, sequencing depth depends on the genome size, the protein and the biological question. For example, histones are intuitively expected to have a much larger number of binding sites than proteins that bind only a few regions in the genome. Consequently, a larger number of reads is required to cover each bound region with the same tag density. Ideally, the desired number of tags for a ChIP experiment is the saturation point (‘plateau’) at which adding further reads no longer reveals additional significantly enriched peaks. This point can be determined by a simulation, as illustrated in Figure 1.35.

Downstream analysis

Once significant peaks have been called, several downstream analyses can be performed to understand the regulatory mechanism and biological implications. Downstream analyses can be very diverse depending on the type of data and the study questions. Some of these analyses are:

1. Annotation to genomic features: One basic analysis that can be performed using ChIP-seq data is to annotate the enriched locations of the peaks with known genomic features, such as the transcriptional start site (TSS), UTR, gene body, promoter, exon-intron boundaries, intergenic regions, bi-directional promoter regions, and the 3′ ends of genes. Usually in-house computational scripts are used for annotation and for assessing statistical significance; however, some tools are available that facilitate these tasks, including the Bioconductor package ChIPpeakAnno [256], CEAS [257], BEDTools [258] and GAPT [259]. Some ChIP-seq analysis tools and pipelines also include annotation and other downstream analyses as a package, for example HOMER [260] and ChIPseeqer [261].

2. Motif finding: For protein-DNA interaction analysis, with both ChIP-chip and ChIP-seq, the most typical follow-up analysis is the discovery of binding sequence motifs.
Besides the discovery of new motifs, the co-occurrence of various motifs or transcription factor binding sites can be analyzed. This method is discussed in more detail in section 1.7.5.

3. Sequence conservation: One can examine whether peak regions are also conserved (and thus probably functional) by comparing them with randomly generated sequences, or determine the extent of conservation and thus group peaks based on their conservation score (CS). Since enhancer regions are known to be more conserved, the CS also helps to identify peaks that are located at putative enhancer regions [262]. Nucleotide-level conservation scores can be obtained from PhastCons or PhyloP [263].

4. Correlation to expression: ChIP-seq peaks can be classified based on their relationship to expression data. This approach is helpful, for example, to compare and understand tissue-specific binding sites of the same TF.

5. Functional and pathway enrichment analysis: The overrepresentation of target genes or locations for particular functions, or the association of targets with pathways, can be studied. Enrichment analysis is discussed in more detail in section 1.7.6.

6. Overlap analysis: Binding locations of one transcription factor (TF) or histone modification can be examined for overlapping or nearby locations of other TFs or histone modifications. The enrichment of peak locations in non-genic regions such as repeat regions, duplications, CpG islands, enhancers, centromeric or telomeric regions, and eu- or heterochromatin can also be analyzed. TF binding sites can be correlated with the locations of several histone modifications that define transcriptional activation or poised states, gene-body locations, or enhancer regions. Combined with SNP information, ChIP-seq data can also be used to investigate allele-specific binding and modification.

7. Clustering of read densities: A more advanced analysis is the clustering of read densities and their integration with various other types of biological information.
The general scheme of this analysis is first to estimate the read density of multiple ChIP-seq datasets over a set of genomic regions (e.g. TSSs), and then to apply clustering (e.g. k-means or hierarchical clustering) and visualization methods to show groups of regions with similar binding features. This approach is useful to integrate and compare binding sites among ChIP-seq datasets. Integration of other sources of information, such as expression, motif and pathway enrichment, or the level of conservation, would further elucidate regulatory mechanisms. HOMER, ChIPseeqer and seqMINER [264] offer some of these analyses.

1.7.4 RNA-seq

RNA-seq is another revolutionary technology for transcriptome analysis, which includes the quantification of abundance (expression analysis), differential expression, splicing and isoform detection, detection of gene fusion events, novel transcript detection and single nucleotide polymorphism detection [265]. Since it provides versatile information in addition to measuring transcript expression with great sensitivity and accuracy, RNA-seq has become more popular than microarrays. However, the cost is still of concern. As with ChIP-seq analysis, there are several critical steps as well as biases that need to be addressed correctly, which makes RNA-seq analysis a challenging task. Data analysis is also challenging because it can be customized to address various questions and to identify novel properties of the data. The basic steps of RNA-seq analysis and the challenges in data analysis have been recently reviewed [265, 266].

Basic experimental steps

The first step is the extraction of RNA from the cell. Similar to ChIP-seq, RNA-seq uses next-generation sequencing as one of its steps, and it may also include an enrichment step. For example, instead of using an antibody for precipitation, it takes advantage of the poly(A) tail of the RNA, using a poly(T) adapter sequence coupled to magnetic beads to enrich the RNA fraction. Samples are then converted to cDNA; Illumina uses random hexamer priming for this purpose. After that, the samples are prepared for sequencing through ligation of adapters and other experimental steps (Figure 1.36).

Figure 1.36: Overview of RNA-seq experimental steps. The isolated RNA fraction of interest is selected and then reverse transcribed. The resulting cDNA is sequenced.

Bioinformatic analysis and challenges

The main steps in RNA-seq expression analysis are: (i) mapping of short reads, (ii) allocating reads to known and novel gene models, (iii) normalization and estimation of transcript abundance and (iv) differential expression analysis. RNA-seq can be combined with other genome-wide methods to provide an integrated view of gene regulation. Although customizable, a basic analysis pipeline for differential expression analysis using RNA-seq data was recently described (Figure 1.37) [266]. Because this technique is fairly new, the set of algorithms and tools for data analysis is still growing. Some recent software packages and their main features are summarized in recent review papers [247, 266].

(i) Mapping of short reads

Since this is transcriptome sequencing and many novel features can be detected by comparison to a reference genome, the mapping step is critical and the optimization of mapping parameters is quite challenging. The basic strategy for mapping millions of reads to a reference genome and/or transcriptome is similar to that for ChIP-seq. However, aligners for RNA-seq are specifically designed to handle splice junctions, gaps, insertions or deletions (indels) and polymorphisms. De novo assembly of reads without a reference genome is also possible, and may be the only option for organisms for which no genomic reference or annotation is available (Figure 1.38).
However, this is computationally intensive and may require long, paired-end reads and high levels of coverage to work reliably. The main challenges in mapping RNA-seq reads are:

1. Mapping reads over splice junctions and sequences with structural variation.

2. Handling reads that map to multiple locations. Besides repeat regions in the genome, conserved domains of paralogous gene families cause multimapping. Because of sequencing errors, some reads from one paralog can map best to another paralog or to a pseudogene.

3. Assigning reads to the correct originating locus. The presence of short and long interspersed nuclear elements (SINEs and LINEs) in the untranslated regions of genes, and the abundance of retroposed pseudogenes with little sequence variation compared to the actual mRNA of the gene, make it difficult to map reads correctly to the originating locus based on RNA-seq data alone. Also, longer intron-spanning reads from a source gene may map perfectly and uniquely to a retroposed pseudogene.

4. Allocating reads mapped beyond defined exon or gene boundaries.

Therefore, the main task of the mapper is to find the location where each short read best matches the reference, while allowing for errors and structural variations. In order to handle junction reads and map quickly, the typical workflow of an aligner is to map reads to the genome first, then to map the unmapped reads using a junction library, and finally to use the still unmapped reads to build novel splice variants. In contrast, one of the popular programs, TopHat [267], which uses Bowtie [245] as its base mapper, first identifies enriched regions representing transcribed fragments (transfrags) and then builds candidate exon-exon splice junctions across which additional reads are mapped (Figure 1.39). To avoid mis-assignment of reads to retroposed pseudogenes, one approach is to map reads simultaneously to the genome and to splice junctions, thus turning such reads into multireads that can subsequently be handled separately.

Figure 1.37: Overview of the RNA-seq analysis pipeline for detecting differential expression. The steps in the pipeline are in red boxes; the methodological components of the pipeline are shown in blue boxes and bold text; software examples and methods for each step (a non-exhaustive list) are shown by regular text in blue boxes. First, reads are mapped to the reference genome or transcriptome (using junction libraries to map reads that cross exon boundaries); mapped reads are assembled into expression summaries (tables of counts, showing how many reads are in the coding region, exon, gene or junction); the data are normalized; statistical testing of differential expression (DE) is performed, producing a list of genes with associated P-values and fold changes. Systems biology approaches can then be used to gain biological insights from these lists. RPKM, reads per kilobase of exon model per million mapped reads; TMM, trimmed mean of M-values. (adapted from [266])

Figure 1.38: Mapping strategy of RNA-seq spliced reads. (a) In de novo transcriptome assembly, splice-crossing reads (red) will only contribute to a contig (solid green) when the reads are at a high enough density to overlap by more than a set of user-defined assembly parameters. Parts of gene models (dotted green) or entire gene models (dotted magenta) can be missed if expressed at sub-threshold levels. (b) Splice-crossing reads can be mapped directly onto the genome if the reads are long enough to make gapped-read mappers practical. (c) Alternatively, regular short-read mappers can be used to map spliced reads ungapped onto supplied additional known or predicted splice junctions. (adapted from [247])

(ii) Allocating reads to known and novel gene models

Once the locations of the reads are determined, the next step is to aggregate reads over known or new gene models.
Using user-supplied annotations, all programs assign the reads to some biologically meaningful unit, such as exons, transcripts or genes. One simple and commonly used approach is to count the number of reads that overlap the exons of a gene. However, a significant fraction of reads may lie outside annotated exons, for example extending beyond annotated 3′ untranslated regions, in alternatively spliced 5′ untranslated regions, or in retained introns; such reads may derive from intronic RNA of incompletely spliced heterogeneous nuclear RNA (hnRNA) or from mismapping. Different tools use different strategies; TopHat, for example, aggregates reads into transfrags using the assembly module of the Maq program [268]. Other counting methods include: (i) all reads along the full length of the gene, which also includes intronic reads, (ii) reads mapped only to coding sequences, (iii) junction reads [269], and (iv) reads along de novo predicted exons.

(iii) Normalization and estimation of transcript abundance

Given a gene model, transcript abundance can be estimated simply by counting the number of reads. However, read counts depend on the length of the mRNA as well as on its molar concentration. Therefore, normalization is necessary to accurately compare expression levels between and within samples [270, 271] as well as in differential expression (DE) analysis [272]. Normalization methods differ for between- and within-library comparisons, and several methods have been proposed so far.

Within-library normalization

This allows quantification of the expression level of each gene relative to other genes in the sample. Gene length, GC content variation, sequence composition, dinucleotide frequency, and non-uniform cDNA fragment distribution are the main factors affecting read counts within a lane [273]. Since longer transcripts yield higher read counts than shorter ones, a common method for within-library normalization is to divide the summarized counts by the length of the gene. Mortazavi et al. proposed the RPKM (reads per kilobase of exon model per million mapped reads) method, which normalizes the read count by the length of the mRNA and by the number of mapped reads in millions [270]; that is, RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the exons of the gene, N is the total number of mapped reads in the library and L is the total exon length of the gene in base pairs. This accounts for both library size and gene length effects in within-sample comparisons. However, RPKM can be influenced by several factors, including sequence content, the integrity of the input RNA, the extent of ribosomal RNA remaining in the sample, size selection steps and the accuracy of the gene models used. The GC content of the sequences has been found to have a strong effect on measured gene expression. Therefore, very recently Hansen et al. proposed a conditional quantile normalization (CQN) algorithm based on a Poisson model for read counts, which combines robust generalized regression to remove systematic bias from GC-content variation with quantile normalization to correct for global distortions [274]. Bullard et al. proposed an algorithm for upper-quartile normalization [272]. Risso et al. recognized the GC content bias and suggested three different algorithms to normalize the data [275]. Specifically for copy number alteration studies, Yoon et al. [276] proposed window-based counting and normalization for GC content variation.

Figure 1.39: The TopHat pipeline. RNA-Seq reads are mapped against the whole reference genome, and those reads that do not map are set aside. An initial consensus of mapped regions is computed by the program Maq [268]. Sequences flanking potential donor/acceptor splice sites within neighboring regions are joined to form potential splice junctions. The initially unmapped reads (IUM) are indexed and aligned to these splice junction sequences. (adapted from [267])

Also, Boeva et al.
showed a polynomial regression approach to normalization, based on binning reads in non-overlapping windows and regressing bin-level counts on GC content [277]. Sequencing based on random hexamer priming, as used on the Illumina platform, suffers from sequence composition biases, which take the form of patterns in the nucleotide frequencies at the beginning of the reads. Hansen et al. provided a re-weighting scheme in which each read is assigned a weight based on its nucleotide composition, yielding a corrected read count that mitigates the impact of the bias [278]. Roberts et al. identified the problem of non-uniform cDNA fragment distribution in RNA-seq data and suggested a likelihood-based approach for correcting this fragment bias [279]. Protocol-specific sequence biases, caused for example by PCR amplification or by differing primer affinities and mixtures, were evaluated by Jones et al., who proposed a bias correction method using a simple graphical model that does not rely on existing gene annotations [280]. More recently, Zheng et al. addressed the major biases mentioned above and proposed a simple approach based on a generalized additive model to correct different sources of bias simultaneously [281].

Between-library normalization

Although technical biases such as sequence composition and gene length normally cancel out when the same gene is compared in different samples, normalization to adjust for sequencing depth is still essential for comparing counts from different libraries. The simplest and most commonly used normalization method is to adjust by the total number of reads in the library [271]. However, more sophisticated normalization can be employed to address composition effects [48], or the fact that a small number of highly expressed genes can consume a significant amount of the total sequence [45]. To address this issue Robinson et al.
proposed the estimation of a scaling factor from the raw data, which can be used within the statistical models that test for DE [282]. Alternatively, quantile normalization, CQN [274] and a method that matches power-law distributions [283] have also been suggested for between-sample normalization of RNA-seq data, although they were observed to have little effect on DE results [266].

(iv) Differential expression analysis

The goal of a DE analysis is to find genes that have changed significantly in abundance between two experimental conditions. In general, this means taking a table of normalized summarized count data for each library and performing statistical testing between the samples of interest. However, unlike microarrays, which give continuous expression values, RNA-seq data are discrete, and hence statistical models appropriate for count data are vital for extracting the most information from RNA-seq data. Several software packages offer differential expression analysis. In principle they are based on two types of statistical distribution, namely Poisson (DEGseq [284]) and negative binomial (DESeq [285], edgeR [286], baySeq [287], Myrna [288]). Earlier studies showed that RNA sequencing can be modeled as a random sampling process; under this assumption the read counts follow a binomial distribution, which can be approximated by a Poisson distribution. Differentially expressed genes can then be identified with Fisher’s exact test and a likelihood ratio test [284]; the DEGseq package integrates these two methods. However, count data are discrete and skewed, and hence not well approximated by a normal distribution, and the Poisson model, whose variance equals its mean, may not capture the additional variability between biological replicates [289]. A test based on the negative binomial distribution, which is a generalization of the Poisson model, can reflect these properties and better detect differential expression.
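The difference between the Poisson and negative binomial assumptions can be checked directly on replicate counts. The sketch below assumes that library-size-normalized counts of one gene across biological replicates are available, and uses the common negative binomial parameterization Var = μ + αμ², where α = 0 corresponds to the Poisson case. The method-of-moments estimator and the function names are illustrative choices for this example and are not the estimation procedure of DESeq, edgeR or any other package.

```python
def mean_and_variance(counts):
    """Sample mean and unbiased sample variance of replicate counts."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, variance


def moment_dispersion(counts):
    """Method-of-moments estimate of alpha in Var = mu + alpha * mu**2.

    A value near 0 means the replicates look Poisson-like; larger values
    indicate overdispersion that a Poisson model would not capture.
    """
    mean, variance = mean_and_variance(counts)
    if mean == 0:
        return 0.0
    return max(0.0, (variance - mean) / mean ** 2)
```

For example, replicate counts of 50, 100 and 150 have a mean of 100 but a variance of 2500, far more than the Poisson expectation of 100, which corresponds to an estimated dispersion of 0.24. With only a few replicates such per-gene estimates are noisy, which is why the packages cited above share information across genes.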
The Bioconductor package DESeq uses the negative binomial distribution and estimates the variance in a local fashion, using different coefficients of variation for different expression strengths. This helps to remove potential selection biases in the hit list of differentially expressed genes, and gives a more balanced and accurate result.

1.7.5 Discovery and search of regulatory motifs

Elucidating the molecular mechanisms that regulate the expression of genes is a major challenge in biology. Gene expression starts with the binding of transcription factors (TFs), generally to enhancer and promoter sequences, thereby activating or inhibiting the transcription machinery. Therefore, the identification of regulatory elements, especially transcription factor binding sites (TFBSs), is an important task. Binding sites composed of nucleic acid sequence patterns are commonly known as motifs. These are usually fairly short DNA segments, 5 to 20 base pairs (bp) long [290, 291], that recur in different genes or several times within a gene. Since TFs bind directly to double-stranded DNA, motifs can occur on both strands. However, information on most transcription factors and their target binding sites is limited. Therefore, computational prediction of regulatory elements can offer a solution. Previously, binding sites were usually determined through DNase footprinting and gel-shift or reporter construct assays, whereas binding affinities to artificial sequences were explored using SELEX. With advances in sequencing technology and high-throughput gene expression analysis, information on sequences and co-regulated gene modules has become available, which has enabled the development of computational methods for the discovery of novel regulatory motifs as well as for searching for known motifs in given sequences. Under this scheme, there are two types of tools: those that discover motifs de novo from a collection of sequences, and those that scan sequences to find TFBSs based on a previously modeled binding specificity of a TF, represented as a position-specific score matrix (PSSM) [292].

For the discovery of new motifs in a set of given sequences, the algorithm searches for statistically significant overrepresentation of short sequences, that is, sequences that occur more often than would be expected by chance; nothing is assumed a priori about the transcription factor or its preferred binding sites. Usually, a set of regulatory regions (e.g. promoters) of genes believed to be coregulated (e.g. genes with similar expression patterns) is scanned. Coregulated genes are known to share similarities in their regulatory mechanisms, possibly at the transcriptional level, so their promoter regions might contain common motifs that are binding sites for transcription factors. Therefore, promoters of co-expressed genes and top-scoring binding sites from ChIP-chip and ChIP-seq experiments are commonly used for motif searching. Accurate identification of these motifs is difficult because they are short signals in the midst of long sequences that contribute a great amount of statistical noise. Aside from common forms of DNA motifs, there are two special types of DNA motifs: (i) palindromic motifs, in which a subsequence is exactly the same as its own reverse complement, and (ii) spaced dyad (gapped) motifs, which consist of two smaller conserved sites separated by a spacer (gap). The spacer, usually of small fixed length (3-5 bp) but variable sequence, occurs in the middle of the motif because the transcription factor binds as a dimer with two separate contact points on the DNA sequence [293].
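The two special motif types can be made concrete with a short example. The sketch below checks whether a sequence equals its own reverse complement and locates spaced dyads made of two fixed half-sites separated by a spacer of fixed length and arbitrary sequence. The function names and the half-sites used in the usage note are hypothetical and chosen only for illustration.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)


def find_spaced_dyads(seq, left, right, spacer_len):
    """Positions and sequences of left + N{spacer_len} + right occurrences.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile(
        "(?=(%s[ACGT]{%d}%s))" % (re.escape(left), spacer_len, re.escape(right))
    )
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

Here `is_palindromic("GAATTC")` holds because the reverse complement of GAATTC is again GAATTC, and `find_spaced_dyads("TTCGGAAACCGAA", "CGG", "CCG", 3)` reports the dyad CGG-AAA-CCG starting at position 2 (0-based).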
From an evolutionary point of view, it has been observed that selective pressure causes functional elements to evolve at a slower rate than non-functional sequences, which makes well-conserved sites among a set of orthologous promoter regions excellent candidates for functional regulatory elements or motifs [294, 295]. Therefore, some motif finding algorithms use information from cross-species genome comparison or phylogenetic footprinting. This approach enables algorithms to perform equally well for both eukaryotes and prokaryotes.

A large number of algorithms have been proposed to identify short motifs. These can be divided into two major groups based on the combinatorial approach used in their design: (i) word-based algorithms, which rely mostly on exhaustive counting and comparison of oligonucleotide frequencies, and (ii) probabilistic algorithms, which use a sequence model whose parameters are estimated by a maximum-likelihood principle or Bayesian inference. Several motif finding tools and their algorithmic bases have been recently reviewed in [293, 296, 297].

Two popular motif finding programs are MEME [298] and Weeder [290]. MEME is a probabilistic motif finder that applies a statistical technique called expectation maximization (EM), whereas Weeder is a word-based algorithm. Word-based methods consider global optimality and are appropriate for short motifs; they are therefore useful for motif finding in eukaryotic genomes, where motifs are generally shorter than in prokaryotes. Weeder determines motifs through a consensus-based method in which all the oligos of (or up to) a given length are enumerated and a description is built using the most frequent nucleotide at each position. The frequencies can be stored as a PSSM. However, all oligos differing from the consensus by up to a maximum number of substitutions are also considered.
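The consensus-based idea described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it is not Weeder's actual implementation, and the function names and example oligos are hypothetical. It builds a per-position count matrix from equal-length oligos, derives the consensus from the most frequent nucleotide at each position, and reports sequence windows that differ from the consensus by at most a given number of substitutions.

```python
from collections import Counter

ALPHABET = "ACGT"


def position_frequency_matrix(oligos):
    """Count each nucleotide at each position of equal-length oligos.
    Returns one dict of counts per position."""
    length = len(oligos[0])
    if any(len(o) != length for o in oligos):
        raise ValueError("all oligos must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(o[i] for o in oligos)
        matrix.append({base: counts.get(base, 0) for base in ALPHABET})
    return matrix


def consensus(matrix):
    """Most frequent nucleotide at each position (ties resolved in
    alphabetical order, an arbitrary choice made for this sketch)."""
    return "".join(max(ALPHABET, key=lambda b: col[b]) for col in matrix)


def hamming(a, b):
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurrences_within(seq, motif, max_subst):
    """All windows of seq within max_subst substitutions of motif."""
    k = len(motif)
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], motif) <= max_subst
    ]


if __name__ == "__main__":
    oligos = ["TTTGGCGC", "TTTCGCGC", "TTTGGCGG", "ATTGGCGC"]
    pfm = position_frequency_matrix(oligos)
    print(consensus(pfm))
    print(occurrences_within("AATTTGGCGCAA", consensus(pfm), max_subst=1))
```

A direct scan such as `occurrences_within` is quadratic in the worst case; the sketch deliberately ignores the indexing structures that real tools use to make the search fast.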
Each of them is then evaluated through comparison with a species-specific background model built from the oligonucleotide distribution of all promoter (or intergenic) regions available for the species in question, and the final ranking is performed according to statistical measures of significance. Motifs that are more likely to represent conserved TFBSs are reported. To search quickly, Weeder employs an exhaustive search algorithm in which the input sequences are preprocessed and organized in a suffix tree indexing structure [299].

Other kinds of algorithms scan through sequences in search of binding sites that match given motifs in the form of a PSSM, a matrix of numbers containing scores for each residue or nucleotide at each position of a fixed-length motif. There are two types of weight matrices: (i) the position frequency matrix (PFM), which records the position-dependent frequency of each residue or nucleotide; and (ii) the position weight matrix (PWM), which contains log-odds weights for computing a match score. PWMs are calculated from PFMs. Two popular databases, TRANSFAC [300] and JASPAR [301], provide matrix-based TFBS profiles. There are numerous tools that can be used to identify significant TFBSs. Schones et al. described various tools that search for TFBSs based on predefined PSSMs, and various mathematical ways to determine suitable p-values of significance [302]. They also introduced a method for estimating the statistical significance of single-site matches that uses a 'database of known promoters' to produce data structures from which p-values for binding site matches can be estimated. This method has been implemented in their tool named STORM.

1.7.6 Enrichment analysis

Classical approaches to biological research consist of studying one or a few genes at a time, providing in-depth information centred on those genes. However, they are unable to explain the underlying cellular mechanisms at the system level.
With the advent of high-throughput technologies, global measurement of genetic, epigenetic and proteomic changes under given experimental conditions has become possible. Genome-wide studies yield thousands of genes associated with a particular phenotype or outcome, which must then be studied in a manner that makes biological sense. Biological explanation and functional clustering of such large gene lists is still a challenging and daunting task. In this direction, modular analysis of genes has become popular. A module is defined as a set of genes that share a common biological property. Such modular analysis is logical and can provide biological insight because any biological process typically involves a group of genes acting together rather than working alone. Thus, one potential method of knowledge discovery from high-throughput genomic experiments is enrichment analysis (EA), which utilizes prior knowledge to find groups of genes that are over-represented for a particular annotation or biological property. The fundamental principle behind EA is that if a biological process is perturbed in a given study, the co-functioning genes have a higher probability of being selected together as a relevant group in the screening process, that is, of being enriched. According to this logic, a gene group-oriented view is more likely than a single gene-based analysis to identify the biological process under investigation. Thus, gene-annotation enrichment analysis has become a promising approach for elucidating molecular mechanisms in any high-throughput study that yields large numbers of genes. As a result, a large number of EA algorithms and tools have been developed and the list is still growing. In 2009, Huang et al. [303] surveyed and reviewed as many as 68 enrichment analysis tools, and at present there are many more.
The conventional method of EA in most tools starts from the user's pre-selected gene list, that is, the list of genes under question. The program then tallies the number of these 'genes of interest' that hit a given biological class. A p-value is calculated by comparing the total number of 'hits' in each class with the number expected by random chance. The enrichment can therefore be measured quantitatively by common and well-known statistical methods, including Fisher's exact test, the Chi-square test, the Kolmogorov-Smirnov test, binomial probability (Poisson, exact, Z-score) or the hypergeometric distribution. In general, binomial probability is best suited for analysis with a large background population, whereas Fisher's exact test, the Chi-square test and the hypergeometric distribution are better suited for analysis with a smaller background population [304]. Huang et al. further classified these EA tools into three classes based on the mode of selection of the foreground 'genes of interest' and the relationship between gene and module:

1. Class I: Singular enrichment analysis (SEA): The enrichment p-value is calculated for each term from the pre-selected list of interesting genes. Enriched terms are then listed in a simple linear text format. This strategy is used in many tools.
2. Class II: Gene set enrichment analysis (GSEA): All genes (without pre-selection) and their associated experimental values are considered in the enrichment analysis, and genes are ranked by these values. This avoids the need for gene pre-selection.
3. Class III: Modular enrichment analysis (MEA): This is similar to SEA, except that term-term and gene-gene relationships, such as parent-child relationships between terms, are considered when calculating enrichment p-values.

As next-generation sequencing comes into wider use, ChIP-seq and RNA-seq data will be continuously generated.
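As an illustration of the conventional (Class I) calculation described above, the following minimal sketch computes an over-representation p-value from the hypergeometric distribution; this upper tail is the same quantity as the one-sided Fisher's exact test. The function name, parameter names and the toy numbers are assumptions made for this example and do not correspond to any particular EA tool.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_in_module,
                                n_selected, n_hits):
    """Probability of observing at least n_hits module genes among
    n_selected genes drawn without replacement from a background of
    n_background genes, of which n_in_module belong to the module."""
    max_hits = min(n_in_module, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_module, k)
        * comb(n_background - n_in_module, n_selected - k)
        for k in range(n_hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 4 of them in the module,
    # 3 genes of interest, 2 of which fall in the module.
    print(hypergeom_enrichment_pvalue(10, 4, 3, 2))
```

In practice one such p-value is computed per module, so the resulting list of p-values generally needs to be corrected for multiple testing before enriched modules are reported.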
Therefore, some tools are also designed to perform enrichment analysis while addressing the challenges or biases associated with these data types. An expected difference between these tools and conventional EA tools lies in how they obtain the 'gene list' that is needed as the primary input for subsequent statistical analysis steps. For example, the EA implemented in the ChIPpeakAnno [256] and ChIPseeqer [261] package pipelines directly accepts genomic location data from ChIP-seq experiments as input and annotates peaks to the nearest gene. However, annotating distal intergenic peaks by the 'closest gene' may be biased towards genes that are flanked by very large intergenic regions. The GREAT algorithm [305] annotates peaks by defining a 'regulatory domain' for each gene. Therefore, the fundamental difference between these tools and conventional EA tools is that they have one added upstream step to assign peaks to a 'gene list' in a more efficient way.

For EA of differentially expressed genes derived from RNA-seq count data, Young et al. proposed another tool named GOseq [306]. The rationale for proposing a new algorithm for RNA-seq data is that canonical methods may give biased results due to over-detection of differential expression for long and highly expressed transcripts as compared to shorter and less expressed ones. This holds even when two transcripts have the same level of expression but differ in length, as they will yield different numbers of total reads. Consequently, differential expression is detected more readily for longer transcripts than for their shorter counterparts, and the same holds for highly expressed transcripts. Moreover, normalization or rescaling cannot mitigate this selection bias, because the increase of statistical power with the number of reads is an unavoidable property of count data. GOseq therefore employs a 'probability weighting function' (PWF), which quantifies how the probability of a gene being selected as differentially expressed changes as a function of its transcript length, to account for this length bias.

As one challenge in this field, Huang et al. stressed the need for enhancing exploratory capabilities and graphical presentations [303]. Another challenge that was emphasized, which is related to the way results are generated and presented, is cross-comparing enrichment analysis results derived from multiple gene lists. This is because, when the outcomes of basic EA by different classes of tools are compared on the same data set, the results are very similar. Although these tools differ in their manner of presentation, and each tries to perform better than its predecessors, most of them lack an interactive and intuitive visualization that allows results to be compared. In a study where a large number of experimental conditions or samples are used, an efficient technique is necessary to present the EA results in a compact yet explicable way that maintains in-depth information and makes results visually and intuitively comparable from one condition to another. One such tool, very recently published, named Gitools [307], presents the EA result in an interactive (editable) heat-map format which meets this demand. In this way it not only aids in comparing results derived from multiple lists of 'genes of interest' but also functions as a method of data integration. The main advantages of this tool for EA are:

1. It can run and visualize several conditions at the same time (heat-map),
2. It facilitates browsing of the results in an intuitive and efficient way,
3. It is possible to run several statistics with the same data,
4. It can be run in batch mode simultaneously with many sets of experimental data, each with many modules.

Gitools is thus more than a conventional EA tool. It offers many other basic and advanced analyses and can work equally well with any kind of data consisting of values or lists of genes, proteins, etc. The basic EA steps are the same as described for most tools (Figure 1.40). It can work with preselected gene lists (as in 'Class I') in binary format, as well as with continuous data using Z-score statistics, without the need for a cut-off to select 'genes of interest'. However, it differs from 'Class II' in that it does not rank these genes based on the values associated with them. It allows the user to download modules from well-known resources such as Gene Ontology [308], KEGG pathways and Ensembl BioMart [309, 310]. It also offers oncogenomic data as well as modules from the IntOGen system [311]. Users can also obtain modules from other sources or produce their own modules in various formats.

1.7.7 Databases and resources

One of the major requirements of bioinformatics research is a source of datasets, and research is often conducted mainly on the basis of public datasets. In addition, integrative analysis requires a combination of several types of experimental information, including TF binding, expression, histone modifications, mutations and copy number variation, which most of the time is impossible for one laboratory to generate. Therefore, public databases are paramount for both computational and integrative analyses.
Several world-wide consortia and projects that aim to generate large amounts of genome-wide data in several tissue systems and organisms are ongoing. Some of the major international projects and databases are described in Table 1.5.

Figure 1.40: Enrichment analysis steps in Gitools. Gitools needs two files to perform enrichment analysis. The first step (top) is to prepare these files, namely data matrices (top left) and module files (top right). A data matrix is a two-dimensional structure that holds a value for each combination of row and column. A data matrix can be binary or continuous. A typical data matrix consists of genes in rows and experimental conditions in columns. In a binary matrix, 'genes of interest' or foreground genes are marked by a "1", whereas background genes (the rest of the genes) are marked by a "0". In a continuous data matrix, the actual value for each gene is given. Modules (also known as gene sets or concepts) are lists of genes or other biological elements sharing a common biological property. A simple two-column module file is a table of lists of attributes, where each row is an element of the list and each column an attribute. In the next step, an appropriate statistical method is selected based on the data type and study question. Gitools then produces enrichment results in a color-coded interactive heat-map.

Table 1.5: Public databases of genome-wide experiments.

1.7.8 Translating computational research to biology: Integrative analysis of genetic and epigenetic data

The field of bioinformatics has arisen and currently receives much attention owing to the recent advent of technologies for biological experiments that generate enormous quantities of data requiring interrogation and interpretation. Such experiments include whole-genome sequencing, expression arrays, genome-wide association studies, proteomics, transcriptomics and metabolomics, among others.
With the advancement of sequencing technology, and world-wide projects like the 1000 Genomes Project, HapMap, the International Cancer Genome Consortium, and ENCODE, we are currently generating larger volumes of data than ever before. Although these approaches have become popular due to their ability to handle large quantities of data, bioinformatic applications are not limited to dealing with large datasets that cannot be analyzed manually. Computational searches using biological data, even on a small scale, have been proven to drive hypotheses, simulate experiments and predict results that would otherwise demand long, tedious and expensive laboratory procedures. Therefore, at present bioinformatic studies are not limited to theoretical computational prediction; working closely with biological scientists, bioinformaticians play roles in designing experimental procedures, delineating hypotheses, large-scale data analysis, modelling, simulation and prediction, suggesting experimental validation, and data interpretation.

The vast amount of large-scale genetic and epigenetic data being generated every day from heterogeneous sources, especially in the field of tumor biology, makes it a challenge to interpret the data and extract biologically meaningful information. Data generated by individual experiments cannot explain complicated biological processes. An integrative analysis of various kinds of experimental data is needed to understand the underlying molecular mechanisms. For example, location analysis of individual histone modifications would, on its own, only tell which genes receive these modifications. Only by combining the information on various modifications can one illustrate the interaction of one with another, explain the necessity of such bi- and multi-lateral controls for cells, and understand the logic behind the relative distribution of the various modifications.
To this end, combining these data with expression data would reveal spatial and temporal control mechanisms. Moreover, integration of genetic changes would clarify how the epigenetic switching of signals is transmitted to genetic processes. Furthermore, the addition of functional clustering would tell which groups of genes and signals are required for particular processes. To face the challenges presented by complex biological processes, especially those prevailing in cancer cells due to their heterogeneous nature, a close collaboration between the experimental biologist and the bioinformatician is important. As the magnitude and dimension of experiments increase, it will become increasingly necessary to perform multidisciplinary studies in which researchers from two or more fields work together. Thus, integration of various experimental data as well as of knowledge and expertise would help in understanding complex biological problems.

Part II Objectives

My long-standing goal is to exploit the concept of inhibiting ("drugging") an epigenetic enzyme (RBP2/KDM5A), which would provoke the effects of the tumor suppressor (pRB) whose inactivation is generally required for cancer progression. Such inhibition would be predicted to act on the downstream pathway of the tumor suppressor. Understanding such effects will shed light on the development of resistance to currently used therapeutic drugs. A detailed understanding of the mechanism of cell cycle and differentiation control by pRB/E2F and associated pathways and interactors will pave the way to this endeavor. With this end in mind, the main objectives of my current work are as follows:

1. Identification of epigenetically regulated gene modules: elucidate the mode of recruitment of histone modifying enzymes and their targets' expression patterns. Explore possible coregulation among them and thus better define the general landscape of histone methyltransferases and demethylases.
2. Understanding the functions of the RB/E2F tumor suppressor pathway and its link to epigenetic regulation.
3. Determine the role of epigenetic factors in the regulation of pRB/E2F-dependent genes, in particular, the role of the histone demethylase KDM5A in the regulation of these genes during the cell cycle and differentiation.

Part III Results

Chapter 2 SELECTIVE TARGETING OF HISTONE METHYLATION

Authors: Abul B.M.M.K. Islam, William F. Richter, Nuria Lopez-Bigas, Elizaveta V. Benevolenskaya
Published in: Cell Cycle, February 1, 2011, 10(3): 413-424
Full text: http://dx.doi.org/10.4161/cc.10.3.14705

Summary

Here I have described the relative distribution of the histone modifying enzymes HMTs and HDMs. I was also interested in the mode of expression of the KDM5A paralog KDM5B and its recruitment in the absence of KDM5A. To determine this, I first optimized a ChIP-seq and ChIP-on-chip data analysis pipeline, and then utilized publicly available ChIP-seq and ChIP-on-chip data on HMTs and HDMs and literature relating to the interacting partners of those HMTs and HDMs. To my knowledge, this is the first study in which a large number of HMT and HDM targets are analyzed together and compared.

Islam AB, Richter WF, Lopez-Bigas N, Benevolenskaya EV. Selective targeting of histone methylation. Supplementary material. Cell Cycle. 2011 Feb 1;10(3):413-424.

Chapter 3 COREGULATION OF HISTONE MODIFYING ENZYMES IN CANCER

Authors: Abul B.M.M.K. Islam, William F. Richter, Laura A. Jacobs, Nuria Lopez-Bigas, Elizaveta V. Benevolenskaya
Published in: PLoS ONE, 2011, 6(8): e24023
Full text: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0024023

Summary

Here I have described the relative expression of histone modifying enzymes (HMTs and HDMs) and their targets in normal human tissues and in various cancers. I was also keen to know whether any correlation exists between the expression of HMTs/HDMs and that of their targets, as well as among HMTs and HDMs.
For this study I analyzed publicly available ChIP-seq and ChIP-on-chip data for HMTs and HDMs as well as expression data on 73 normal tissue types and 264 different cancer cell lines in 18 different tissue types. I have identified distinct expression signatures and correlation patterns among HMTs and HDMs in normal and tumor cells. I first optimized a large-scale ChIP-seq, ChIP-on-chip and microarray data analysis pipeline. I then used clustering algorithms and correlation coefficient-based approaches to elucidate expression signatures and relationships of HMTs/HDMs and their targets that can be used for improved diagnosis and drug design.

Islam AB, Richter WF, Jacobs LA, Lopez-Bigas N, Benevolenskaya EV. Coregulation of histone-modifying enzymes in cancer. Supporting information. PLoS One. 2011;6(8):e24023.

Chapter 4 GENOME-WIDE ANALYSIS USING CHIP TO IDENTIFY ISOFORM-SPECIFIC GENE TARGETS

Authors: Michael L. Beshiri, Abul B.M.M.K. Islam, Dannielle C. DeWaal, William F. Richter, Jennifer Love, Nuria Lopez-Bigas, Elizaveta V. Benevolenskaya
Published in: Journal of Visualized Experiments, 2010, (41), e2101
Full text: http://www.jove.com/details.php?id=2101

Summary

ChIP-seq is a relatively new technique, and even a conventional ChIP experiment is a daunting task with several steps and a variety of reagents. I realized that a textbook protocol is in most cases not sufficient for explaining this experiment; the protocol needed to be described thoroughly. Here I have laid out a step-by-step explanation of the ChIP-sequencing protocol and also described computational approaches to analyze the genomic locations and functions of RBP2 targets. We also published a video of the ChIP-sequencing protocol in addition to the text. Together with my colleagues, I have shown how to perform ChIP experiments and which precautions to take; I also analyzed ChIP-seq data to identify overlapping and isoform-specific, functionally distinct target modules.
Chapter 5 OVERLAPPING AND DISTINCT RECRUITMENT OF ENZYMES INVOLVED IN HISTONE METHYLATION DETERMINES EPIGENETIC PATTERN IN HUMAN GENOME

Authors: Abul B.M.M.K. Islam, William F. Richter, Michael L. Beshiri, Nuria Lopez-Bigas, Elizaveta V. Benevolenskaya
Published in: Manuscript in preparation, April, 2012.
Full text: http://bg.upf.edu/~khademul/myPaper/

Summary

In this study I aimed to determine genome-wide KDM5A localization in human U937 and mouse ES cells and the H3K4me3 status at these sites, as well as to illustrate the properties of the binding sites and the correlation of binding with gene expression. I also wanted to improve our knowledge with a more detailed localization of KDM5A in the epigenetic landscape. I took part in ChIP-sequencing experiments and analyzed our ChIP-seq data as well as several public ChIP-seq and ChIP-on-chip data sets on various HMTs, HDMs, histone modification marks and transcription factors to find correlations and explain the mechanism of KDM5A recruitment and the functions of KDM5A at particular locations. I analyzed microarray data, classified KDM5A peaks based on distance and on the presence or absence of H3K4me3 marks, and showed their relationship to expression. Besides ChIP-seq, ChIP-on-chip and microarray data analysis, I also optimized several downstream analysis pipelines based on the outcome of these analyses. To delineate other histone marks at RBP2 locations as possible mechanistic explanations of KDM5A behavior at certain locations, I calculated the occurrence of various histone modification marks at our classified KDM5A peak locations. I have shown isoform-specific properties of KDM5A targets, such as their binding motif preference, sequence conservation and function. Computational approaches to identify putative TF co-occurrence at isoform-specific KDM5A sites showed how the two isoforms might co-localize with distinct sets of TFs.
To determine the functions of classified KDM5A targets and of various HMTs and HDMs, I performed an over-representation analysis, which identified enrichment of several interesting categories that appear to be exclusive functional properties of each class or HMT/HDM. To further characterize KDM5A binding sites, I performed de novo motif analysis and also calculated conservation scores. Some of the findings were also validated by immunoblot and qPCR analysis by our group. This study is the first report on isoform-specific KDM5A location, as well as on detailed genome-wide KDM5A localization in human cells in general.

Overlapping and distinct recruitment of enzymes involved in histone methylation determines epigenetic pattern in human genome

Abul B.M.M.K. Islam1,3, William F. Richter3, Michael L. Beshiri3, Nuria Lopez-Bigas1,2*, and Elizaveta V. Benevolenskaya3*

1 Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, PRBB, Universitat Pompeu Fabra, Doctor Aiguader 88, Barcelona 08003, Spain. 2 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain. 3 Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, 900 S Ashland Ave, Chicago 60607, USA.

*Corresponding authors: nuria.lopez@upf.edu, evb@uic.edu

Abstract

Chromatin provides a key for access to the genome, being involved in principally all DNA-mediated processes. Methylation at histone H3K4 is required for transcriptional activation and many other associated events. High-resolution studies of histone methylation in mammalian genomes showed that H3K4 methylation is highly elevated at TSSs. However, it is not known which of the enzymes that synthesize and degrade histone methylation are bound to H3K4-methylated genomic regions. We previously showed that lack of KDM5A results in a genome-wide increase in H3K4 methylation. Here we performed a systematic analysis of genomic regions enriched in H3K4me3 and KDM5A. Our data suggest that KDM5A is at the top of the hierarchy of regulation of histone methylation. First, KDM5A highly overlaps with the majority of TSS regions carrying H3K4me3. These genes belong to specific gene ontology categories such as RNA processing and protein transport. However, KDM5A is excluded from many regions with a low level of H3K4. These regions are located more distantly from known genes and are associated with distinct functions. Second, KDM5A directly regulates transcription of histone H3.3 and histone H3 modifying enzymes by binding to their promoter regions. Analysis of Kdm5a knockout cells showed that methylation also changes in regions that are not bound by KDM5A but are likely to be regulated by a demethylase that is normally repressed by KDM5A. Finally, comparison of the KDM5A location with the locations of other enzymes involved in histone methylation shows that the corresponding genes contribute to relevant processes, which can therefore be defined as epigenetically regulated.

INTRODUCTION

Histones are post-translationally modified by epigenetic enzymes and recruit other proteins that are able to bind DNA and further modify chromatin. Epigenetic regulation may result in dramatic phenotypic changes associated with terminal differentiation in somatic and germ cells during development, or with their reprogramming to return to pluripotency and self-renewal in induced pluripotent stem cells. Epigenetic mechanisms also contribute to aging (Collado et al., 2007) and to the development of several pathologies, including cancer (Jones and Baylin, 2007), inflammation (Foster and Medzhitov, 2009), and degenerative diseases (Wang et al., 2008). It is imperative that modifications imposed by different epigenetic enzymes are functionally linked and cross-regulated. However, while the histone code, that is, histone modifications that are coupled for combinatorial control, has been extensively studied (reviewed by Rando et al., 2009), the co-occurrence of the enzymes responsible for placing this histone code has not been addressed on the global scale. The epigenetic enzymes, in turn, have multiple protein-protein interaction domains and are usually constituents of multiple protein complexes that associate with transcriptional activators or repressors. Therefore, the location of epigenetic enzymes, when compared with the histone code, will be instrumental in deciphering the effects of DNA sequences and of non-enzymatic activities of epigenetic factors in determining transcriptional outcome. This was recently appreciated in a study of combinatorial patterning of chromatin regulators, which included several chromatin-modifying enzymes (Ram et al., 2011). Distinct individual genomic loci (e.g., p16/ARF, HOX genes, several cell cycle and ER-regulated genes) have been studied with respect to combined recruitment of lysine histone methyltransferases (KMTs) and lysine histone demethylases (KDMs).
With the availability of high-quality, high-resolution genome location analysis data, it is now possible to study correlations in the locations of different epigenetic regulators and to shed light on the consequences of their combined and distinct recruitment. Correlation of these enzymes' locations at genome scale would allow identification of distinct genomic features (e.g., transcription start sites (TSS), promoters, enhancers), transcription states (e.g., actively transcribed, repressed, poised) and whole biological processes and pathways that are subject to epigenetic regulation. Previously our lab reported KDM5A binding sites that are close to TSSs in a human cell line in a ChIP-on-chip study (Lopez-Bigas et al., 2008). However, the correlation of binding sites with the various histone modifications that cells might experience at these sites, their expression pattern, and their functional correlation with other histone methyltransferases/demethylases (HMTs/HDMs) were not illustrated in detail. Direct interactions between different epigenetic enzymes have been described; however, the differences between loci that recruit one, two or all three enzymes have not been extensively discussed.

The basis of epigenetic regulation is the creation of a transcriptionally "balanced" state, which is subject to promptly alternating or even simultaneous control by proteins with opposing enzymatic activities. This type of regulation was first described for Polycomb group (PcG) and TrxG proteins in Drosophila (Schwartz et al., 2010). Polycomb response elements (PREs) were originally identified as sites of binding by PcG proteins. However, it was later found that TRX binds at essentially all known or presumptive PREs, whether or not they are also occupied by PcG proteins. The co-occupancy of two such opposing regulators places a gene in alternating chromatin states at two different histone residues.
The bivalent marks, containing both histone H3 lysine 4 (H3K4) and lysine 27 (H3K27) methylation marks and characteristic of genes encoding developmental regulators in mammalian embryonic stem cells (Mikkelsen et al., 2007; Pan et al., 2007), are an example of the balanced state in a "poised" configuration. The gene is transcriptionally silent but becomes activated or repressed upon resolution of the bivalent mark into a univalent H3K4 or H3K27 methylation mark (Bernstein et al., 2006). Evidence of homeostatic regulation of histone modifications may come not only from the co-occupancy of opposing KMTs, but also from that of a KMT and a relevant KDM, of different isoforms of the same enzyme, and of different catalytic activities acting at the same lysine.

The demethylation of histone lysine residues was first proposed for flavin-dependent amine oxidases and soon afterwards was discovered for the amine oxidase LSD1/KDM1. LSD1 acts against mono- and dimethylated peptides but cannot demethylate the trimethylated state (Shi et al., 2004). Complementing the activity of amine oxidases, another activity was proposed for α-ketoglutarate-Fe(II)-dependent dioxygenases, compatible with demethylation of all methylated states. This catalytic activity resides in the JmjC domain. Whereas LSD1 is unable to catalyze the demethylation of H3K4me3, the KDM5 family, which contains the JmjC domain, can revert this mark. KDM5A specifically removes me3/me2 marks from H3K4. The demethylation activity of KDM5A was shown to suppress expression of a number of genes. KDM5A has at least two known isoforms that differ in a histone-binding domain (Wang et al., 2009). It is conceivable that the larger isoform (LI) of KDM5A, which contains a specific H3K4me3-interacting domain, differs from the smaller isoform (SI) in genomic location, which may account for the observed diversity in KDM5A function.
The assortment of binding sites, and the functions of the genes bound by these two isoforms, would add to the richness of epigenetic regulation. In addition, the co-occupancy of successive HDM activities may account for a complete lack of methylation. It is unclear whether a KDM5 enzyme prepares a methylation state for KDM1A and resides on the same nucleosome, or whether it always occupies a distinct location. In the latter case, the KDM5- and KDM1A-bound nucleosomes are expected to bind different HMT complexes. KMT2A/MLL1 is generally recruited via its interaction with RNA Pol II and is involved in trimethylation of H3K4, mostly on promoter-proximal nucleosomes (Ng et al., 2002; Guenther et al., 2005). Binding of PcG proteins is known to anti-correlate with the availability of RNA Pol II and to correlate highly with the distribution of the H3K27me3 modification in this region. In contrast to H3K4me3, which forms a sharp peak downstream of the TSS, the H3K27me3 mark may form domains that are hundreds of kilobases long (Barski et al., 2007). KMT2A is not a functional ortholog of Drosophila TRX, and whether MLL1 and EZH2 can coexist at some loci in mammalian cells would require a careful genome-wide analysis. Previous genome-wide studies using ChIP-seq have revealed the occurrence of the H3K4me3 mark at regulatory elements (Barski et al., 2007; Mikkelsen et al., 2007). This suggested that histone lysine methylation plays a role in the regulation of pathway-specific genes. This concept is supported by the fact that genes involved in development form bivalent domains in embryonic stem cells. The enzymes responsible for these two modifications have opposite regulatory effects and appear to have a dominant role in development and differentiation. Importantly, the catalytic activity of these enzymes has been shown to be critical for the observed effects. Overexpression of KDM5 enzymes results in a lack of H3K4me3, and loss of KDM5A/JARID1A/RBP2 results in a global increase in H3K4me3, suggesting that it has a major role in demethylating H3K4me3. Similarly, RNAi-mediated inhibition of KMT6/EZH2 results in a global decrease in the levels of di- and trimethylation of H3K27 at its target genes and in transcriptional derepression (Margueron et al., 2008; Shen et al., 2008). Several lines of evidence reveal an extensive interplay between methylation at K4 and K27 of histone H3. KDM6 enzymes, namely KDM6A/UTX and KDM6C/UTY, counteract EZH2 action. Strikingly, EZH2 and UTX were found in the same complex bound to chromatin, which also contained KMT2B/MLL2. In this study, we first systematically characterized the locations of KDM5A and studied their correlation with expression and their effects on histone H3K4 methylation. We also performed ChIP-seq analysis of the KDM5A substrate H3K4me3 in the presence or absence of KDM5A to identify the gene signature that depends on KDM5A demethylase activity. Surprisingly, we also detected genomic locations whose methylation pattern appears to be maintained through indirect regulation by KDM5A.
Interestingly, H3K4me3 regions that are apparently indirectly regulated by KDM5A and regions that are not regulated by KDM5A are found at distinct locations but are enriched for similar functions. Next, we compared KDM5A locations with the locations of other KDMs as well as KMTs to find linkages between H3K4- and H3K27-modifying enzymes. We also studied whether their exclusive and common targets serve a particular biological function.

RESULTS

KDM5A associates with TSS of actively transcribed genes both in embryonic stem cells and differentiated cells

KDM5A has been shown to be important for the control of cell differentiation through epigenetic mechanisms (Lopez-Bigas et al. 2008). Because differentiating Kdm5a knockout (KO) cells displayed defects in gene expression specifically in genes that normally have KDM5A bound to their TSS (Beshiri et al., 2012, our submitted manuscript), we decided to focus on KDM5A location in differentiated cells. U937 histiocytic lymphoma cells were differentiated to monocytes/macrophages with TPA for 96 hr, and chromatin immunoprecipitation (ChIP) was performed, followed by next-generation sequencing. We previously showed that in mouse ES cells, KDM5A is preferentially located at TSS regions (Beshiri et al., 2012, our submitted manuscript). We also identified promoter regions bound by KDM5A at different stages of cell differentiation using ChIP-on-chip (Lopez-Bigas et al. 2008). When we aligned KDM5A reads within 3 kb of the TSS, we found that a large number of reads are located close to the TSS in both human and mouse cells (Figure 1A). Consistently, a large number of enriched peaks were also detected in the proximal promoter region compared to total genomic ChIP as background (Supplemental Figure S1-A).
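The TSS-centered read alignment described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `tss_bin_profile`, its parameters and the toy coordinates are hypothetical. Only the 3 kb flank and the 25 bp bin size are taken from the text and the Figure 1 legend, and strand handling is an assumption about how upstream and downstream would be defined.

```python
def tss_bin_profile(read_positions, tss, strand="+", flank=3000, bin_size=25):
    """Count reads in fixed-width bins across [tss - flank, tss + flank).

    read_positions: iterable of integer genomic positions on one chromosome
                    (hypothetical input; e.g. read 5' ends or midpoints).
    Returns a list of counts; bin 0 is the most upstream bin.
    """
    n_bins = (2 * flank) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        # Offset relative to the TSS; flipped for minus-strand genes so that
        # negative offsets are always upstream of the gene.
        offset = pos - tss if strand == "+" else tss - pos
        if -flank <= offset < flank:
            counts[(offset + flank) // bin_size] += 1
    return counts


# Toy usage with made-up positions around a TSS at position 10,000.
reads = [9990, 10000, 10010, 10040, 12999, 20000]
profile = tss_bin_profile(reads, tss=10000)
print(len(profile), sum(profile))
```

Stacking one such profile per transcript, row by row, gives the kind of matrix that is then clustered and drawn as a heatmap. Normalizing counts by sequencing depth is left out of the sketch.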
A notable difference from our previous ChIP-on-chip study of KDM5A targets (Lopez-Bigas et al 2008) is that, because ChIP-seq covers the whole genome, we were able to detect a large number of peaks that are not close to a TSS (Supplemental Figure S2-A). KDM5A peaks were depleted from transcription termination sites (TTS), probably due to nucleosome depletion (Supplemental Figure S2-B). When we clustered reads at the TSS, we observed that KDM5A presents three major clusters (Figure 1A): one cluster with a high density of reads around the TSS, another with intermediate density, and a last one depleted of KDM5A. When we aligned the median-centered normalized expression values of these clustered genes, expression showed a good correlation with read densities. These data suggest that KDM5A binds near the TSS of highly expressed genes. When we calculated and compared the profiles of the top 5% and bottom 5% expressed genes at the TSS, we similarly found that KDM5A forms higher peaks at the TSS of the top expressed genes (Supplemental Figure S2-B). Comparison with our study performed in mouse ES cells showed overlap in 79% of KDM5A targets, despite the analysis being in mouse versus human and the different cell types and antibodies used. In both human and mouse, KDM5A peaks were significantly enriched (p-value ≤ 1E-16) at bidirectional promoters (at the 1 kb and 2.5 kb ranges, Supplemental Figure S2-C; list of genes with statistical details in Supplemental Excel file Table S2). To study the expression properties of all peaks, we first classified KDM5A peaks in mouse ES cells based on the distance of their genomic location from the TSS (see methods for classification details). Next, we calculated the average absolute expression (Lin et al. 2011) of target genes in each class (Figure 1B). We observed that target genes with KDM5A peaks at the TSS and promoter are significantly more expressed than target genes with distal, gene-body or downstream peaks. Above all, genes with TSS-bound peaks showed significantly higher expression compared to any other group. We did not find a significant difference in absolute expression of targets within the groups between the wild type (f/f) and Kdm5a KO conditions. However, when we compared differential expression of KO vs. f/f, we found significant differences in expression when cells were prompted to differentiate (Figure 1C). Significant differences in expression were evident in all peak groups. This indicates that KDM5A negatively regulates genes involved in differentiation and related processes. For the list of genes in each group and their expression, please see Supplemental Excel Table S3.

Recruitment of KDM5A to genes in nucleolus shows ribosomal function relation

Our previous study in human (Lopez-Bigas et al. 2008) and the study of Peng et al. (2009) in mouse both found that KDM5A targets are enriched for the gene ontology (GO) categories RNA processing/RNA metabolism, ribonucleoprotein complex, RNA polymerase complex, transcription initiation, etc., which are related to RNA modification and ribosomal function.
In tumor cells, the ability to proliferate relies heavily on ribosome biogenesis, and the nucleolus is the main center regulating this and other biological processes, including cellular senescence, cell cycle control and stress response (Boisvert et al. 2007). Interphase centromeres appear to localize around and within nucleoli (Ochs and Press 1992), and centromere autoantigens are also associated with the nucleolus. Therefore, we asked how many of our KDM5A targets (human, U937, AI) are located in the nucleolus. The first global identification of nucleolus-associated genes has recently been reported by Nemeth et al. (2010). We observed that 91 out of 620 nucleolar genes are KDM5A targets (p value=0.02) (Figure 2A). Interestingly, a large number of KDM5A targets (26 genes) are clustered on chromosome 2, and some targets are clustered on chromosome 19 (16 genes). Expression analysis of nucleolar KDM5A target genes, non-nucleolar KDM5A target genes and all other genes suggests that both categories of KDM5A targets have significantly higher average expression than non-targets (Figure 2B). (For detailed target genes and their expression values, see Supplemental Table S4.) Gene ontology (GO) analysis confirmed the enrichment of KDM5A targets for nucleolar localization and transcription-related functions (data not shown).

Characterization of isoform-specific binding sites revealed distinct properties of KDM5A

The KDM5A protein is expressed as a small isoform and a large isoform, the latter having an extended H3K4me3-recognition domain at the C-terminus. Overexpression of the KDM5A C-terminal domain as part of an oncogenic fusion resulted in leukemia due to enriched recruitment to H3K4me3 regions and prevention of H3K4me3 removal from multiple genes, including key developmental genes (Wang et al., 2009; van Zutven et al., 2006).
KDM5A contains a sequence-specific DNA binding domain, ARID, and can potentially also be recruited by other transcription factors (TFs) through interactions with its PHD, C5HC2 or LXCXE domains. To identify KDM5A targets that do not require the H3K4me3-interacting KDM5A domain, we performed ChIP-seq analysis in U937 cells differentiated to monocytes/macrophages, which express both isoforms of KDM5A: small (SI) and large (LI). We refer to both protein isoforms together as “all isoforms” (AI). (Unless specifically mentioned, the results described here are based on AI peaks, not LI peaks.) Structure analysis of KDM5A shows that both isoforms contain the catalytic JmjC domain responsible for demethylation of H3K4. The defining difference between the two isoforms is the number of PHD domains that each has (two in the large isoform and one in the small). PHD domains are found in several chromatin-binding epigenetic regulators and are able to bind to methylated histones. Recently, the study of Goldberg et al. (2010) showed that PHD3 of the LI is able to bind directly to trimethylated H3K4 and is sufficient for recruitment of the oncogenic fusion protein to genes containing this mark (Wang et al., 2009; van Zutven et al., 2006).

Figure 2: Recruitment of KDM5A targets in nucleolus and expression: (A) Chromosome-wise plot (Circos plot, http://circos.ca/) of KDM5A targets in the nucleolus. A total of 110 (U937, AI) peaks corresponding to 91 unique genes are plotted along the chromosomes. The orange bar denotes the position of the KDM5A peak in the nucleolus. The height of each peak corresponds to the absolute expression of each gene (expression values of the 76 genes analyzed in the microarray). The inner circular blue axis marks a distance of 1 Log2 expression unit. Chromosomes where no KDM5A is present are not shown. (B) Normalized Log2 absolute expression of nucleolus KDM5A targets (76) is presented in comparison to the expression of non-target genes (12239) and non-nucleolus KDM5A target genes (6529), plotted as a box-plot (Y axis: Log2 Expression; groups: non KDM5A targets, KDM5A nucleolus targets, KDM5A non-nucleolus targets). A Wilcoxon rank sum test with continuity correction showed that non-nucleolus KDM5A target genes had significantly higher expression (p value < 2.2e-16) than non-targets. The nucleolus-specific targets had slightly higher expression (p value 0.02416) than non-targets.

Figure 1. KDM5A binding and expression correlation: (A) Heatmap representing the density of KDM5A ChIP-seq aligned reads in U937 cells, all isoform (AI) and long isoform (LI), and in mouse ES cells, with respect to the TSS of RefSeq genes in a window of 3 kb up- and downstream. Total normalized average numbers of reads in 25 bp bins are plotted in the heatmap. Each row represents a RefSeq transcript. The average signal of AI at the TSS is clustered, and the LI and mouse ES cell KDM5A genes are then sorted in the same order. Median-centered (Log2) expression of the genes is also represented in a color-coded heatmap. (B) Expression of distance-based KDM5A target groups. KDM5A peaks in mouse ES cells are classified based on their distance from the TSS (see methods for details). (Upper) Absolute expression (Log2) of target genes in mouse cells with WT KDM5A (f/f) in the undifferentiated and differentiated conditions, and in mouse cells with KDM5A KO in the undifferentiated and differentiated conditions, represented in a box-plot. (Bottom) Similarly, differential expression (Log2) of KO vs. f/f in the undifferentiated and differentiated conditions is plotted in a box-plot for the target genes of these groups.

Based on all of this information, we hypothesized that the two isoforms may be recruited to different target genes due to the presence or absence of PHD3. To distinguish the two KDM5A isoforms, we generated two antibodies, one detecting both KDM5A isoforms (“all isoform”, AI) through a common central domain, and another detecting only the large isoform (LI) but not the small isoform (SI), through a C-terminal PHD domain (Beshiri et al., 2010). By ChIP-seq data analysis with these two antibodies, we found 7216 enriched regions for AI and 3662 for LI, with 64.6% of peaks common to both (Supplemental Table S5). We then assigned each detected peak to the nearest gene as its target, and also found substantial overlap between the target genes of the two antibodies (Figure 3A). Consistent with the finding that KDM5A binds to more highly expressed genes, both AI and LI targets showed significantly (p-value < 2.2e-16) higher expression compared to all genes in the microarray as background (Figure 3B), and even compared to non-KDM5A targets (data not shown). Interestingly, the expression of all AI targets and only-AI targets differs significantly (p-value 0.00873). However, targets that are common to both isoforms showed the highest expression, and this expression is also significantly different from that of AI targets (p-value 0.002667). Although the common targets apparently show higher average expression, it is not significantly different from that of LI targets. The 966 target genes that make up the only-LI portion are relatively less expressed.
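The grouping of AI and LI peaks by genomic overlap (at least one shared base pair, as stated for Figure 3A) can be sketched as follows. This is an illustrative sketch, not the procedure actually used in this study. The names `overlaps` and `partition_peaks`, the tuple layout `(chrom, start, end)`, the half-open coordinate convention and the toy peaks are all assumptions made for the example.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def partition_peaks(ai_peaks, li_peaks):
    """Split peaks into only-AI, common (AI peaks hit by any LI peak), only-LI.

    Uses a simple all-against-all comparison, which is fine for a sketch;
    real peak sets would normally be sorted or indexed first.
    """
    common = [p for p in ai_peaks if any(overlaps(p, q) for q in li_peaks)]
    only_ai = [p for p in ai_peaks if not any(overlaps(p, q) for q in li_peaks)]
    only_li = [q for q in li_peaks if not any(overlaps(q, p) for p in ai_peaks)]
    return only_ai, common, only_li


# Toy peaks (hypothetical coordinates).
ai = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
li = [("chr1", 199, 300), ("chr2", 300, 400)]
only_ai, common, only_li = partition_peaks(ai, li)
print(len(only_ai), len(common), len(only_li))
```

With half-open intervals, two peaks that merely touch (one ends where the other starts) do not count as overlapping. Whether the original analysis treated such cases the same way is not stated in the text.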
In general, the binding sites of transcription factors involved in housekeeping functions are believed to remain conserved (She et al., 2009). Cross-species conservation can also be an indicator of functional genomic elements, for example distal regulatory elements such as enhancers (Roh et al., 2007). Therefore, we calculated the average nucleotide conservation score (CS) over a total of 100bp around the peak summit (Supplemental Table S6). To plot the CS, however, we extended the window to 500bp upstream and 500bp downstream of the peak summit (Figure 3C). When we compared the CS of KDM5A peaks, we found that, with the exception of only-LI peaks, they are in general much more conserved than random sequences. Peaks close to the TSS are also more conserved than distant peaks, although some distant peaks showed markedly high CS. However, there is no significant variation in relative CS between AI and LI (Supplemental Table S6). High CS was also observed for KDM5A peaks in mouse (Supplemental Table S6). A de novo search for motif overrepresentation within the 100bp peak summit region revealed that KDM5A (both AI and only-AI peaks) has a preference for the CGCCG/A sequence. Among only-AI peaks, 48% were found to contain this GC-rich motif. For only-LI peaks, surprisingly, we found that AT/GT-rich motifs were overrepresented (68%) (Figure 3D). It was previously reported that the JARID1 family binds to GC-rich DNA, and the study of Tu et al. (2008) found that the ARID domain of KDM5A binds preferentially to the CCGCCC motif. However, we found that a large proportion of KDM5A peaks are located in CpG islands (Supplemental Table S5) (57.26% of only-LI peaks and 86.73% of common peaks of AI and LI), which may explain the detection of a GC-rich motif. This observation raises the question of whether this motif is a property of CpG islands rather than a specific property of KDM5A peaks.
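The question of how often a motif such as CCGCCC occurs in one set of peak sequences compared with another (for example, peaks inside versus outside CpG islands) comes down to counting the sequences that contain the motif on either strand. The sketch below shows that counting step only. It is not the Weeder-based analysis used here; the function names and the toy sequences are hypothetical, and exact string matching is a simplifying assumption (no mismatches or degenerate positions).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_motif(seq, motif):
    """True if the motif occurs on either strand of seq (exact match)."""
    seq = seq.upper()
    return motif in seq or reverse_complement(motif) in seq


def fraction_with_motif(seqs, motif):
    """Fraction of sequences containing the motif on either strand."""
    if not seqs:
        return 0.0
    return sum(has_motif(s, motif) for s in seqs) / len(seqs)


# Toy summit sequences (made up); CCGCCC is the motif reported by Tu et al. (2008).
cpg_peaks = ["AACCGCCCTT", "ttgggcggaa", "ACGTACGTAC"]
non_cpg_peaks = ["ATATATATAT", "TTGTTGTATT"]
print(fraction_with_motif(cpg_peaks, "CCGCCC"))
print(fraction_with_motif(non_cpg_peaks, "CCGCCC"))
```

Comparing the two fractions against matched background sequences drawn from CpG islands would be one way to separate a KDM5A-specific preference from general CpG island composition.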
In our analysis, we did not find that KDM5A peaks outside the CpG islands contain significantly more CCGCCC or CGCCG motif sequence than peaks in the CpG islands (about 15% in non-CpG peaks, versus about 80-91% in CpG peaks; see Supplemental Table S7 for details). Having identified targets of only-AI peaks (corresponding to SI-specific peaks) and common targets of AI and LI, we asked whether they differ in their biological functions. We used the ChIPpeakAnno program (Zhu et al. 2010) to annotate the closest EnsEMBL (version 54) genes from the peak start, and binomial statistics in Gitools (Perez-Llamas and Lopez-Bigas, 2011) to identify significantly overrepresented (FDR ≤0.05) gene ontology biological process terms. Interestingly, we found that both groups have group-specific functions aside from overlapping functions (Figure 3E). However, consistent with previous findings (Lopez-Bigas et al., 2008; Peng et al., 2009), the targets of both groups function in RNA processing and mitochondrial electron transport. Owing to the presence of the C-terminal PHD domain, KDM5A can interact with several other proteins, including p107 (Kim et al., 1994), Rhombotin-2 (Mao et al., 1997), nuclear receptor (Chan and Hong, 2001), Myc (Secombe and Robert, 2007), Mad1 (Ge et al., 2010) and Sin3/HDACs (Ovelene et al., 2008). Due to its role in differentiation, it is probable that KDM5A is recruited to its target genes by several other transcription factors (TFs). It may form complexes with other TFs, or it may interact with chromatin-remodeling complexes. To determine putative TF co-occurrences at KDM5A peak locations, we scanned the 1000bp sequences around the summits of AI and LI peaks using TRANSFAC database position weight matrices (PWM). To assess statistical significance (FDR ≤0.05), we calculated Z-scores and p-values by comparison with TF occurrences in random sequences. Surprisingly, we found a remarkable difference in TF co-occurrences between AI and LI peak regions (Figure 3F). However, the presence of E2F1 in both AI and LI peaks indicates their function in cell proliferation, which is consistent with our previous observation that KDM5A is recruited to E2F targets (Beshiri et al., 2012, our submitted manuscript) and enriched for cell cycle genes. Interestingly, HOX TF is significantly enriched only in AI peak regions. On the other hand, NFκB is enriched at LI peak locations. Using a similar approach, we detected HOXA4 binding in mouse peak regions. For the full list of TFs and statistical details, see Supplemental Table S8. Finding distinct putative TFs for AI and LI also partially explains the inability of our AI antibody to detect all LI locations. Probably some of the LI molecules form distinct complexes, which could make them inaccessible to the AI antibody.

Figure 3: Isoform specific properties of KDM5A binding sites: (A) Venn diagram showing the overlap (at least one bp) between significantly enriched KDM5A all isoform (AI) and long isoform (LI) peaks in U937 cells (sets: KDM5A AI targets, KDM5A LI targets; counts shown in the diagram: 4138, 2482, 966). (B) Peaks are grouped based on the overlap (at least one bp) of the genomic locations of enriched AI and LI peaks in U937 cells, and the absolute (Log2) expression of the corresponding genes for each group, as well as of all genes analyzed in the microarray, is represented in a box-plot for comparison. (C) Conservation score of KDM5A peaks. (Left panel) The conservation score (CS) in placental mammals (PhastCons, hg18, UCSC) was calculated for each nucleotide within 500bp up- and downstream of the peak summit of U937 KDM5A peaks, and the average conservation score in each 10bp bin for all peaks in each peak group is represented in the graph. For comparison, the CS of random sequences of equal length located 5 kb away from each peak was calculated, and the average CS in each 10bp bin is also plotted. (Middle and right panels; panel titles: all AI peaks, AI peaks with at least 0.5 CS, all LI peaks, LI peaks with at least 0.5 CS) The average conservation score of 100bp around the peak summit was calculated for each peak in AI and LI. CS was plotted (Y axis) against peak distance from TSS (left panel) (X axis). In the right panel, we plotted only those peaks that are assumed to be more conserved (CS >= 0.5). The plot shows clustering of the majority of conserved peaks relatively close to the TSS. (D) KDM5A binding sequence motif in AI and LI. The top sequence motifs identified with the Weeder program in the 100bp summit regions of AI, only-AI and only-LI peaks are presented in logo format. A nucleotide that is more probable at a given position gets a higher bit score than one that is less probable at the same position. (E) Comparison of enriched GOBP terms for isoform-specific genes. FDR (0.05) corrected over-represented GOBP categories from only-AI and only-LI targets are presented in a color-coded heatmap (column labels in the panel: only AI; overlap AI, LI; rows: GO biological process terms). The color scale represents the corrected p-value (FDR p-value: significant from 0.0 to 0.05, non-significant up to 1.0). Color towards red indicates more significant, color towards yellow indicates less significant, and gray represents a non-significant p-value. (F) Differences in transcription factor (TF) binding pattern in KDM5A AI and LI peak regions. TF co-occurrence in the 500bp up- and downstream sequences from the KDM5A AI and LI peak summits was determined by scanning with TRANSFAC database position weight matrices. TFs bound significantly (FDR < 0.05) compared to the average of 1000 random sequences are represented in a color-coded Z-score heatmap (columns: AI, LI; rows: TRANSFAC matrices). Color towards red indicates a higher Z-score, color towards yellow indicates a lower Z-score, and green represents non-significant TF binding. Since there is a high overlap between AI and LI peaks, many TFs are significantly bound in both AI and LI; this image mainly shows selected TFs with a differential pattern of binding in AI vs. LI. For the full list of TFs and statistics, refer to Supplemental file Table S8.

KDM5A performs non-redundant activity of H3K4me3 demethylation, and suppresses gene expression

Since KDM5A functions as an H3K4me3 demethylase, it is important to know which genes are likely to be regulated by KDM5A through its demethylase activity. To gain more insight into the requirement for KDM5A function in mouse and human cells, we used next-generation sequencing technologies to determine the changes in H3K4me3 in cells with genetic ablation of Kdm5a. We performed ChIP-seq of H3K4me3 in the presence or KO of Kdm5a in mouse ES cells. Using the program Bowtie (Langmead et al., 2009), we aligned reads against the mouse reference genome (UCSC, mm9). KDM5A is not the only member of the KDM5 family with demethylase activity; other members also have H3K4me3 demethylating capacity. Using qPCR, we previously showed that in the absence of KDM5A, its homolog KDM5B/PLU1 can be recruited to some of the KDM5A locations (Islam et al. 2011). Moreover, since Kdm5a-/- is not lethal in mice, the question arose whether KDM5A plays a non-redundant role. Consistent with our previous report (Beshiri et al., 2012, our submitted manuscript), in the absence of KDM5A we found a global increase of H3K4me3 signal around KDM5A peaks (Figure 4A). We used the SICER program (Zang et al., 2009), which was specifically developed for the detection of histone modifications, and detected 33,824 and 27,797 significant islands of H3K4me3 in the f/f (WT) condition and the KDM5A KO condition, respectively, using total genomic DNA (“input”) ChIP-seq as background. We also identified islands where the H3K4me3 signal was significantly different between KDM5A KO and WT (increased (11,717), decreased (6,999) or unchanged (22,135)). To understand the functional relation of KDM5A and H3K4me3, we grouped KDM5A/H3K4me3 peaks based on the presence (peaks overlapping by at least one base) or absence of overlap, and represented 11 such groups in Venn diagrams in Figure 4B. Despite using a different antibody and a different peak detection algorithm (SICER vs. MACS), we found a high overlap of our H3K4me3 (f/f) peaks with those from the study of Creyghton et al. (2010) (87.5% of peaks) and with those from the study of Goldberg et al. (2010) (~68% of peaks).
When we analyzed raw data from these two studies using the same program (MACS) and the same parameters, we found that 61.65% of H3K4me3 peaks from the Goldberg study overlap with those of the Creyghton study. For KDM5A, we also observed a high overlap of our peaks with those in the study of Peng et al. (2009). Considering these overlap analyses, we can assume that our antibody is highly specific and that our analysis is performing well. We also found a high overlap with H3K4me3 analyzed in mouse ES cells in the study of Mikkelsen et al. (2007) (68.96% of all Hidden Markov Model (HMM) interval peaks, and 65.85% of window-interval peaks after mouse genome coordinate conversion from version mm8 to our version mm9), even with the use of a different antibody and completely different mapping and peak calling algorithms. When we compared the expression of the KDM5A groups with no overlapping H3K4me3 in any condition (G4), with differentially increased H3K4me3 (G7) and with unchanged H3K4me3 (G9), we found, as expected, that G7 has the highest absolute expression (Figure 4C). Recently, DiTacchio et al. (2011) described a KDM5A-mediated increase of histone acetylation through inhibition of deacetylase in the circadian clock. KDM5A can form a complex with the CLOCK-BMAL1 transcription factor and is recruited to the PER2 promoter, augmenting transcription by inhibiting histone deacetylase function in a demethylase-independent manner. Here, interestingly, we observed that a large number of peaks (1224) in G4 neither overlap with H3K4me3 nor are comparatively highly expressed, which probably indicates a demethylation-independent function of KDM5A. The observation that in Kdm5a-/- cells G7 has significantly higher expression (p-value 3.55e-9) when prompted for differentiation compared to the undifferentiated condition (Figure 4C) confirms the H3K4me3-mediated role of KDM5A in differentiation. However, differences in expression between G9 and G4 (Figure 4C) are not significant in either the differentiated or the undifferentiated condition, while the comparison of G9 to G7 is significant.

Figure 4: KDM5A and H3K4me3 enriched peaks, and their grouping and expression: (A) Average profile (read density) of KDM5A in wild-type cells and of H3K4me3 in both wild-type (f/f) and Rbp2-/- (KO) cells. The average profile is presented 2 kb up- and downstream from the summit of KDM5A peaks. (B) KDM5A peaks are grouped based on overlapping H3K4me3 peaks in the Kdm5a f/f (wild type, WT) and knockout (KO) conditions. Overlap is represented in Venn diagrams and each compartment is given a group name (G1-G11). Set sizes shown in the diagrams: KDM5A (3093); H3K4me3, f/f (33824); H3K4me3, KO (27797); increased H3K4me3, KO vs. f/f (11717); decreased H3K4me3, KO vs. f/f (6999); unchanged H3K4me3, KO vs. f/f (22135). Compartment counts shown include G1 (1781), G4 (1224), G8 (29) and G9 (593). In G7-G9, H3K4me3 is differentially detected in the comparison of f/f vs. KO. (C) Expression of KDM5A target genes in the groups G4, G7 and G9. (Left panel) Absolute expression of target genes and (right panel) differential expression, represented in box plots.

KDM5A targets with overlapping KDM5A/H3K4me3 peaks showed different functional pattern than H3K4me3 targets without KDM5A overlap

After forming several groups of peaks based on the overlap or non-overlap of KDM5A and H3K4me3 (Figure 4B), one of the most interesting questions is which pathways (Figure 5A) or functions (Figure 5B) each group is enriched for. We used Gitools (Perez-Llamas and Lopez-Bigas, 2011) to perform functional and pathway overrepresentation analysis (FDR 0.001). Surprisingly, we observed that the peak groups of KDM5A with overlapping H3K4me3 (G1, G2 and G7) and the H3K4me3 groups without overlapping KDM5A (G5, G10, G11) show an almost mutually exclusive enrichment pattern of pathways and GO biological processes.
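To make the idea of an FDR-controlled overrepresentation analysis concrete, the sketch below pairs a one-sided binomial test with Benjamini-Hochberg correction. It is a minimal illustration under stated assumptions, not the Gitools implementation. The function names, the use of Benjamini-Hochberg as the correction, and every number in the usage example are hypothetical.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more
    annotated genes among n target genes if each is annotated with
    background probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: three terms tested in a group of 50 target genes.
# Each tuple is (annotated targets, background fraction of annotated genes).
terms = {"term_a": (12, 0.05), "term_b": (4, 0.06), "term_c": (3, 0.05)}
pvals = [binomial_sf(k, 50, p) for k, p in terms.values()]
fdr = benjamini_hochberg(pvals)
significant = [name for name, q in zip(terms, fdr) if q <= 0.001]
print(significant)
```

A threshold as strict as 0.001 keeps only terms with a strong excess of annotated genes, which is consistent with the clear-cut enrichment patterns described for the peak groups.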
Interestingly, aside from the known functions of KDM5A, G1, G2 and G7 were enriched for several disease pathways. Besides transcription, cell proliferation and related functions, the G5, G10 and G11 groups were enriched for several differentiation and developmental processes. For example, in G10 and G11, 145 and 220 target genes, respectively, belong to “cell differentiation” out of 498 genes annotated with this term (Figure 5B). Similarly, 57 and 49 genes in G10 and G11, respectively, are annotated with the “heart development” term, out of 140 genes annotated in this GO category.

Combination of several epigenetic marks determines the transcriptional fate of a location

H3K4me3 is generally associated with “active” locations. However, locations that also carry H3K27me3 marks besides H3K4me3 remain transcriptionally “poised” (Barski et al. 2007, Bernstein et al. 2006). Therefore, transcriptional outcomes are viewed as the combined effect of several epigenetic modifications. In order to understand the functional properties and expression patterns shown by the different KDM5A/H3K4me3 groups (Figure 4B), we analyzed the occurrence of several histone marks and TFs at these enriched peak locations using publicly available ChIP-seq data (see Supplemental Table 1). First, we compared the intensity of these signals within each group individually (Figure 6A). We found that G1 and G7 show the highest level of the activation mark H3K27ac. G9 also showed a considerably high H3K27ac signal around the peak center. However, none of the groups that we compared (G1, G3, G4, G7, G9) showed significant enrichment of the suppressive H3K27me3 mark or the enhancer mark H3K4me1. In general, we observed that KDM5A peaks within 3 kb of the TSS are not enriched for H3K4me1, but peaks 10 kb away from the TSS showed higher signal for H3K4me1, H3K27ac and p300 (data not shown). Next, we compared the groups (G4, G7, G8, G9) for each signal individually.
We calculated the average read numbers of those histone marks and TFs in each 50 bp bin around the center of the peaks in each group (Figure 6B). These data showed that KDM5A peaks with no overlapping H3K4me3 in any condition (G4) have less KDM5A signal at the peak center than the other three groups. Also, H3K27ac is highest in G7, the group where we detected differentially increased H3K4me3. A similar pattern was observed for Pol II. As we have seen before, none of the groups showed repressive H3K27me3 marks (data not shown). However, unexpectedly, although we did not find a higher H3K4me1 enhancer signal, we saw a considerable signal for p300 in these groups, except in G4.

[Figure 5, heat-map panels A and B. Columns: KDM5A peak groups with overlapping H3K4me3 (G1, G2, G7) and H3K4me3 peak groups with no overlapping KDM5A (G5, G10, G11). Color scale: corrected p-value, from significant (0.0 to 0.001) to non-significant (up to 1.0).
(A) KEGG pathways (rows): Metabolic pathways; MAPK signaling pathway; Wnt signaling pathway; Neurotrophin signaling pathway; VEGF signaling pathway; Fc epsilon RI signaling pathway; Vascular smooth muscle contraction; Adipocytokine signaling pathway; Leukocyte transendothelial migration; Hypertrophic cardiomyopathy (HCM); Focal adhesion; Regulation of actin cytoskeleton; Axon guidance; Basal cell carcinoma; Melanogenesis; Endocytosis; ECM-receptor interaction; Cardiac muscle contraction; Glioma; Colorectal cancer; Prostate cancer; Alzheimer's disease; Huntington's disease; Parkinson's disease; Oxidative phosphorylation; Aminoacyl-tRNA biosynthesis; Spliceosome; Proteasome.
(B) Biological Process (rows): GO:0006810 transport; GO:0006350 transcription; GO:0007275 multicellular organismal development; GO:0030154 cell differentiation; GO:0006811 ion transport; GO:0007155 cell adhesion; GO:0045449 regulation of transcription; GO:0006468 protein amino acid phosphorylation; GO:0007399 nervous system development; GO:0007242 intracellular signaling cascade; GO:0006355 regulation of transcription, DNA-dependent; GO:0006813 potassium ion transport; GO:0007411 axon guidance; GO:0016055 Wnt receptor signaling pathway; GO:0006816 calcium ion transport; GO:0008283 cell proliferation; GO:0006915 apoptosis; GO:0035023 regulation of Rho protein signal transduction; GO:0045941 positive regulation of transcription; GO:0001666 response to hypoxia; GO:0009887 organ morphogenesis; GO:0030182 neuron differentiation; GO:0007264 small GTPase mediated signal transduction; GO:0007169 transmembrane receptor protein tyrosine kinase signaling; GO:0030036 actin cytoskeleton organization; GO:0008285 negative regulation of cell proliferation; GO:0001525 angiogenesis; GO:0007507 heart development; GO:0008284 positive regulation of cell proliferation; GO:0002053 positive regulation of mesenchymal cell proliferation; GO:0042472 inner ear morphogenesis; GO:0015031 protein transport; GO:0016192 vesicle-mediated transport; GO:0006397 mRNA processing; GO:0008380 RNA splicing; GO:0006396 RNA processing; GO:0008033 tRNA processing; GO:0006412 translation; GO:0022900 electron transport chain; GO:0051028 mRNA transport; GO:0065002 intracellular protein transmembrane transport; GO:0006457 protein folding; GO:0006364 rRNA processing; GO:0042254 ribosome biogenesis.]

Figure 5: Over-representation of GO categories and pathways for targets of different groups of KDM5A peaks. Mouse ES cell KDM5A peaks were grouped based on the presence or absence of H3K4me3, or on differentially detected H3K4me3. Target genes from these groups were subjected to enrichment analysis. Over-representation of pathways (KEGG database) (A) and gene ontology biological process (GOBP) categories (B) (corrected p-value < 0.001) is presented in a color-coded heat-map. The color scale represents the corrected p-value. Color towards red indicates more significant, color towards yellow indicates less significant, and gray represents a non-significant p-value.

Comparison of KDM5A and H3K4me3 signal at TSS for peak groups

We previously reported that KDM5A and H3K4me3 are found close to the TSS (Lopez-Bigas et al. 2008). Peng et al. (2009) made a similar observation. Here we compared the signal of KDM5A (Figure 6C) and H3K4me3 (Figure 6D) at the TSS of EnsEMBL genes for the peaks in the different groups. These results show that the differentially increased H3K4me3 regions (G7) also have the highest average KDM5A signal at the TSS (Figure 6C); in contrast, G3 and G4 had a very weak signal at the TSS. This means that G4, the KDM5A peak group that does not have H3K4me3, is not close to the TSS. Similarly, the H3K4me3 profile of the different groups at the TSS showed that G1 and G7 have the highest H3K4me3 at the TSS. Interestingly, however, we observed that G10 and G11 (differential H3K4me3 regions in KDM5A KO vs. WT that are not bound by KDM5A) are not close to the TSS, although G5 (H3K4me3 regions independent of KDM5A that are not bound by KDM5A) has an increased signal at the TSS. Interestingly, although the majority of G10 and G11 peaks are far from the TSS (Figure 6D left panel) and G5 peaks are close to the TSS, all three groups are enriched for similar functional patterns (Figure 5). One of the interesting observations that came out of this grouping is that a large number of H3K4me3 regions are only detectable in the Kdm5a KO condition (G11). This suggests that G11 regions are induced or regulated by other HMTs/HDMs, which are directly or indirectly regulated by KDM5A.
This favors our hypothesis that HMTs and HDMs are coordinately controlled and that the overlapping or distinct recruitment of these enzymes determines the epigenetic patterns. Since peaks in G10 and G11 are located in distal regions, and G5, G10 and G11 have a similar functional pattern, we next asked whether they are enriched for enhancer markers. Recent studies (Creyghton et al., 2010; Rada-Iglesias et al., 2011) showed that enhancer regions typically contain an increased signal of H3K4me1 and P300. Active enhancers can be separated by the presence of H3K27ac from the poised enhancers, which show the H3K27me3 mark. Therefore, we analyzed these signals at the peak center of G5, G10 and G11 (Figure 6E). We found that in general these regions show increased levels of H3K27ac around the center of the H3K4me3 peak; however, the levels in G5 are much higher than in G10 and G11. On the other hand, only G10 and G11 show an increased signal for H3K4me1 at the peak center, but not G5, a group where peaks are mostly close to the TSS. In G10 and G11 we also observed low levels of p300. In contrast to the enhancer regions defined in the study of Creyghton et al. (2010), where the authors found depletion of H3K4me3 from enhancer regions, we observed the enhancer marks H3K4me1, P300 and H3K27ac in our G10 and G11 peaks, which are actually enriched for H3K4me3, although this mark is present at a relatively low level. However, this finding is consistent with the recent study of Pekowska et al. (2011), which claimed that H3K4me3 provides an epigenetic signature of active enhancers and that, essentially, enhancers dependent on cellular, physiological and developmental context can be determined by the presence of a pattern of all forms of H3K4 methylation (i.e. H3K4me1/2/3). However, when we analyzed the average absolute expression of G5, G10 and G11, we found that G5 is significantly above the average expression of all genes (Figure 6F). G10 and G11 expression remains similar to the average expression level of all genes.
Enhancement of target gene expression by an enhancer depends on the stage of development, and these regions probably remain open to be further activated with the initiation of development (as we see developmental genes enriched), but are not “poised”, as we did not observe a repressive H3K27me3 signal.

Enriched cell functions among genes targeted by five studied epigenetic regulators

Several studies have provided a mechanistic link between epigenetic factors and tumor suppressors. Our prior work revealed that Myc target promoters correlate positively with an active H3K4me3 signature and negatively with a repressive H3K27me3 signature. We hypothesize that, through their connection to oncogenes and tumor suppressors, epigenetic regulators cooperate on the target genes. As a first step, we asked how many of the peaks of each of the five HMTs and HDMs we tested (see Supplemental Table S1 for data sets) overlap with one another. For peak overlap we used BEDTools (Quinlan et al., 2010) and counted two peaks as overlapping when they share at least one bp. To find target genes we took the gene nearest to the start of the peak, using the Bioconductor package ChIPpeakAnno (Zhu et al., 2010). We observed that, except for KDM5A and MLL1, peaks do not overlap highly (Figure 7A). However, the target genes of any two HDMs/HMTs overlap considerably (Figure 7A). This suggests that although two HDMs/HMTs may not bind closely in the genome, they coordinately regulate the same genes, probably for the same functions. Among 7266 KDM5A (“AI”) peaks (6620 targets) and 4070 MLL1 peaks (3193 targets) in REH cells (Guenther et al., 2008), 1799 peaks (1800 targets) overlap with one another (Supplemental Table S9). Although KDM5A and MLL1 showed the same overall binding pattern, KDM5A formed sharp peaks at the TSS whereas MLL1 formed wide peaks (Supplemental Figure 3A).
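The overlap criterion and the percentages shown in Figure 7A can be illustrated with a minimal sketch. It is not the BEDTools implementation used in the analysis: the function names and the toy peaks are hypothetical, coordinates are assumed to be 0-based and half-open (BED convention), and the percentage is assumed to be taken relative to the smaller of the two sets ("% of lowest occurrence group" in the Figure 7 legend).

```python
from collections import defaultdict


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share >= 1 bp."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples. This is a simple quadratic scan per
    chromosome, adequate for illustration only.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    return sum(
        any(overlaps((start, end), other) for other in by_chrom.get(chrom, []))
        for chrom, start, end in peaks_a
    )


def percent_of_smaller(n_shared, n_a, n_b):
    """Overlap expressed as a percentage of the smaller of the two sets (assumed convention)."""
    return 100.0 * n_shared / min(n_a, n_b)


# Hypothetical toy peaks, only to exercise the one-bp criterion.
toy_a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 60)]
toy_b = [("chr1", 199, 250), ("chr1", 400, 500), ("chr2", 10, 55)]
toy_shared = count_overlapping(toy_a, toy_b)

# Counts quoted in the text: 1799 shared peaks among 7266 KDM5A and 4070 MLL1 peaks,
# and 1800 shared targets among 6620 KDM5A and 3193 MLL1 targets.
peak_pct = round(percent_of_smaller(1799, 7266, 4070), 2)
target_pct = round(percent_of_smaller(1800, 6620, 3193), 2)
```

With the counts quoted above, this convention gives 44.2% for peaks and 56.37% for target genes, which matches the KDM5A-AI versus KMT2A entries of Figure 7A.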
Moreover, although both KDM5A and MLL1 targets are expressed significantly more highly (p-value < 2.2e-16) than the average of all genes in the microarray (Supplemental Figure 3B), the overlapping targets are even more highly expressed than those of KDM5A or MLL1 alone (p-values 3.4e-7 and 0.00365 respectively). There is no significant difference in expression between KDM5A and MLL1 targets.

[Figure 6, panels A-F. Panel titles and axes: (C) average profile of KDM5A around TSS, x-axis: distance from TSS; (D) proportion of TSS peaks, and average profile of H3K4me3 around TSS, x-axis: distance from TSS; (E) average profile around center of the peaks for G5, G10 and G11.]

Figure 6: KDM5A, histone and TF profiles around the peak summit of the various grouped KDM5A/H3K4me3 peaks. (A) Average profile of KDM5A, Pol II, p300 and various histone modifications, presented 2 kb upstream and downstream from the summit (or the center of the H3K4me3 peak) of the various groups of KDM5A peaks (see Figure 4B for grouping). Peaks with increased or decreased H3K4me3 (KO vs. f/f) were determined by SICER with an FDR cutoff of 0.01. The center (zero) represents the KDM5A peak summit of the corresponding group. The Y scale represents normalized average tag numbers in each 50 bp bin. Each image represents one specific histone signal at the peak summits of the various peak groups. (B) Average profile of KDM5A, Pol II, p300 and various histone modifications, presented 2 kb upstream and 2 kb downstream of the center location or peak summit of peaks in groups G4, G7, G8 and G9. (C) Average profile of KDM5A reads around the TSS of each EnsEMBL gene. We calculated the average number of mapped reads for the peaks of each peak group in single-bp bins in a window of 2 kb upstream and 2 kb downstream from the TSS. (D) Similarly, the H3K4me3 profile is plotted for the peaks of each group around the TSS (right panel). (Left panel) Pie chart showing the proportion of peaks in each peak group that overlap a TSS within a 2 kb window surrounding the TSS. Overlap means at least one bp of overlap. (E) Average profile of KDM5A, Pol II, p300 and various histone modifications, presented 2 kb upstream and 2 kb downstream of the center location of peaks in groups G5, G10 and G11. (F) Absolute (log2) expression of targets in groups G1, G5, G10 and G11 and of all genes studied in the microarray. Expression is shown for both the f/f and KO conditions of undifferentiated mouse ES cells.

Next, to understand the functional clustering of the target genes, we distributed the targets of these HMTs and HDMs among the GO and pathway (KEGG) categories and tested for enrichment using Gitools (Perez-Llamas and Lopez-Bigas 2011). Selected representative GO and pathway categories are shown in Figure 7B. We found that LSD1/KDM1A, KDM5A/RBP2, MLL1/KMT2A and EZH2/KMT6 associate with genes involved in transcriptional activity. Although the “cell cycle” category was not significantly enriched (FDR 0.05) in all cases, all of them have a number of targets related to this function. KDM5A and MLL1 targets are enriched in categories of initiation and elongation from the RNA Pol II promoter, translational initiation and elongation, and electron transport, including NADH to ubiquinone. In contrast, EZH2 binds to genes involved in ion transport (calcium, potassium and sodium ion transport), pattern specification and several processes involved in multicellular organismal development, including nervous and skeletal system development, spermatogenesis, and liver, heart and epidermis development (not all categories are shown in Figure 7B). Although some functions overlap between the targets of KDM5A and EZH2, interestingly, each shows a distinct functional pattern. For example, only KDM5A and MLL1 targets are enriched for mitochondrial electron transport, NADH to ubiquinone, and for intracellular protein transmembrane transport. We also find that KDM5A, MLL1 and EZH2 targets are enriched in several developmental processes, for example brain development and in utero embryonic development. However, some development-related GO categories are enriched only among MLL1 or EZH2 targets. For example, biological processes such as inner ear, skeletal system and lung development are enriched among the targets of the methylase EZH2, whereas cerebral cortex development and fat cell development are enriched only among MLL1 targets (Figure 7B left panel).
These results suggest that, for some processes, more than one HMT/HDM function together in a coordinated manner, whereas there are developmental processes where they function independently. KDM5A targets showed highly significant enrichment (p-value < 10^-14) for genes involved in ER to Golgi and intra-Golgi vesicle-mediated transport. Response to DNA damage stimulus (excision repair, etc.) and response to X-rays are characteristic of KDM5A (121 genes) and EZH2 (102 genes) targets, but not of MLL1 targets. These data suggest that enzymes involved in histone methylation correlate with each other at the level of individual nucleosomes. When we analyzed the enrichment of targets, we found that all of them are involved in several disease-related pathways, including cancer (Figure 7B right panel). Some of the selected enriched targets in this pathway are represented in Figure 7C. Strikingly, all these regulators were bound to genes involved in systemic lupus erythematosus (SLE) (Figure 7C). CDK6 is one such gene in this pathway and is a target of all the analyzed HDMs/HMTs except UTX. This kinase is a catalytic subunit of the protein kinase complex that is important for cell cycle G1 phase progression and the G1/S transition. This kinase, as well as CDK4, has been shown to phosphorylate, and thus regulate the activity of, the tumor suppressor protein Rb (Khleif et al. 1996). Expression of this gene is up-regulated in some types of cancer and may play an important role in the development and/or progression of a subset of human prostate cancers by stimulating the activity of the androgen receptor (Lim et al. 2005). HMT/HDM binding sites in K562 cells show marks of H3K4me2/3 and H3K27ac (Figure 7D). We validated the recruitment of KDM5A to the CDK6 region and the H3F3B region. As a representative gene from “pathways in cancer” and a gene from “systemic lupus erythematosus”, these genes were analyzed for binding of KDM5A, KDM1A, KMT2A, KDM6A, KMT6 and H3K4me3 by ChIP-qPCR (Figure 7E).
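The over-representation analyses reported in this section were run with Gitools. As a rough illustration of what such a test computes, the sketch below uses a one-sided hypergeometric test followed by Benjamini-Hochberg correction. Both choices are assumptions made for illustration, since the exact Gitools test settings are not stated in this section, and the term names and counts in the toy input are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for n target genes drawn without replacement from N background
    genes, K of which are annotated with the term (one-sided over-representation)."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy input per term:
# (targets annotated with term, number of targets, genes annotated with term, background size)
tests = {
    "term_a": (8, 20, 40, 1000),
    "term_b": (1, 20, 50, 1000),
}
raw = {name: hypergeom_upper_tail(*args) for name, args in tests.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Keep terms passing the corrected p-value cutoff used for Figure 5 (0.001).
significant = [name for name, q in adjusted.items() if q < 0.001]
```

In this toy input only the first term passes the cutoff, because 8 annotated genes among 20 targets is far more than the 0.8 expected by chance.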
Systemic lupus erythematosus (SLE) was overrepresented for MLL1; however, all of the HMTs/HDMs have target genes in the SLE pathway (Figure 7C, D). SLE is one of the best-studied autoimmune inflammatory diseases and is characterized by the production of autoantibodies to multiple nuclear antigens. Some autoimmune disorders have an epigenetic component, since recent studies showed that H3 methylation is targeted by autoantibodies in SLE (van Bavel et al., 2011; Dieker and Muller 2010). For instance, compared with other diseases of the same clinical outcome, such as rheumatoid arthritis, SLE patients feature increased H3K27me3. A recent study associated SLE with changes in the expression of a histone demethylase (Zhang et al., 2011). These studies provide further insight into the biological role of lysine methylation.

[Figure 7, panel A: pairwise overlap, as % of the smaller group.

Peak overlap (at least 1 base)
            KDM5A-AI  KDM5A-LI  KMT2A   KDM6A   KMT6    KDM1A
KDM5A-AI    100       64.49     44.20   1.67    4.55    13.05
KDM5A-LI    64.49     100       27.51   0.92    2.19    6.57
KMT2A       44.20     27.51     100     2.53    3.12    9.08
KDM6A       1.67      0.92      2.53    100     1.56    4.20
KMT6        4.55      2.19      3.12    1.56    100     7.07
KDM1A       13.05     6.57      9.08    4.20    7.07    100

Target gene overlap
            KDM5A-AI  KDM5A-LI  KMT2A   KDM6A   KMT6    KDM1A
KDM5A-AI    100       71.98     56.37   30.95   18.60   35.97
KDM5A-LI    71.98     100       32.85   15.72   10.89   17.85
KMT2A       56.37     32.85     100     18.43   8.74    21.28
KDM6A       30.95     15.72     18.43   100     10.05   12.39
KMT6        18.60     10.89     8.74    10.05   100     15.33
KDM1A       35.97     17.85     21.28   12.39   15.33   100

Color scale: 0% to 100%.
Panel B: heat-maps for BIOLOGICAL PROCESS and PATHWAYS (KEGG); color scale: FDR p-value, from significant (0.0 to 0.05) to non-significant (up to 1.0).]

Figure 7: Overlap of peaks and target genes of various HMTs/HDMs and functional overrepresentation. (A) Binding locations of KDM1A/LSD1, KDM6A/UTX, KMT2A/MLL1 and KMT6/EZH2 were obtained from previously published genome-wide (ChIP-on-chip or ChIP-seq) studies (see Supplemental Table S1) and annotated to the closest target genes. The figure shows the overlap (% of lowest occurrence group) of genomic locations as well as of common target genes. Genomic location overlap is defined as at least one base of overlap. (B) Gene Ontology (GO) and pathway (KEGG database) enrichment of targets of HDMs and HMTs. Over-representation of GOBP categories for two HDMs and two HMTs from publicly available data, as well as for our U937 cell KDM5A targets, is presented in a color-coded heatmap. The heatmap summarizes regions where at least two HMTs/HDMs show overrepresentation. Color towards red indicates more significant, color towards yellow indicates less significant, and gray represents a non-significant p-value. White indicates that none of the genes in that particular GO or pathway category were analyzed in the study for that particular CR. (C) The list of genes from the significantly over-represented pathway categories “pathways in cancer” (left) and “systemic lupus erythematosus” (right), shown in a color-coded heatmap. Genes targeted by at least two proteins are shown. Blue represents a target, gray represents a gene that is not a target of that HDM or HMT, and white cells indicate that data were not present in the platform utilized in the experiment. (D) UCSC human genome (hg18) browser view of the genomic locations of HDMs and HMTs, together with several other peaks from public data available, at the CDK6 gene. Horizontal bars in the top tracks for EZH2/KMT6 and LSD1/KDM1A define significantly enriched regions (ChIP-on-chip data), while peaks for KDM5A and MLL1/KMT2A are generated from filtered aligned reads (ChIP-seq data). (E) Quantitative analysis of KDM5A enrichment in the CDK6 region and the H3F3B region. A representative gene from “pathways in cancer” and a gene from “systemic lupus erythematosus” were analyzed for binding of KDM5A, KDM1A, KMT2A, KDM6A, KMT6 and H3K4me3 by ChIP-qPCR. Occupancy of the CREG1 region, which is unbound by KDM5A, is shown as a control. Error bars: means + SEM, n=2.

HMTs/HDMs are recruited to KDM5A and participate in gene regulation

Due to the highly overlapping functions among various HMTs/HDMs, we speculate that target genes are coordinately regulated by a set of HMTs/HDMs. From the overlap analysis of the target locations of various HMTs/HDMs, we found that there are overlapping binding sites at the same gene for several HMTs/HDMs (Figure 7). Here, we first confirmed that the KDM5A targets KDM5B and CDK6 undergo expression changes in the absence of KDM5A (Figure 8A). Not only at CDK6, but also at many other KDM5A targets, including BET1, HMG20A and HOXA9, we observed binding sites of other HMTs and HDMs (Figure 8B-E). These sites are also marked by a high level of H3K4me2/3 but a low level of H3K4me1 or H3K27me3. The high level of H3K4me3 and the recruitment of HMTs/HDMs at these sites were also validated by quantitative PCR analysis (Figure 8F, G).

Opposing H3K4-modifying enzymes cooperatively function for the same process

We have seen a high degree of overlap between MLL1 and KDM5A (Figure 7A).
To understand the cooperation between lysine methylation by MLL1 and demethylation by KDM5A, we aimed to obtain further detail on the functions of the overlapping KDM5A and MLL1 targets. Therefore, we determined which gene ontology terms were significantly overrepresented in the set of genes associated with both MLL1 and KDM5A and in the genes that belong to only one of these two groups (Figure 9A). Our analysis showed that in most cases the target genes within particular GO groups overlap significantly, indicating their requirement for methylation. Many GO categories (e.g. transcription, translation, replication, RNA splicing) are enriched for the overlapping set of targets as well as for KDM5A or MLL1 targets alone. Moreover, interestingly, there are some particular GO categories (e.g. mitochondrial electron transport, intra-Golgi vesicle-mediated transport, chromatin assembly, mitotic spindle organization, androgen receptor signaling pathway) that are significantly enriched only for the overlapping group of target genes but not in the non-overlapping gene sets. We observed some functions enriched specifically for either KDM5A or MLL1 targets; however, this may reflect either context-dependent or tissue-specific functions. We noticed a similar phenomenon when we determined the enrichment of pathways (Figure 9B). Interestingly, we detected some pathways (e.g. ribosome, DNA replication, nucleotide excision repair, mismatch repair and base excision repair) in which the overlapping gene set is significantly enriched but which are not enriched when we consider only MLL1 target genes, indicating cooperation for the same set of functions.

DISCUSSION

Changes in the global chromatin landscape are responsible for the cellular response to activation of a specific transcription program and for the maintenance of genome integrity and cellular identity.
While the correlations in the global distributions of different histone modifications have been determined, whether any combinatorial pattern exists in the location of the corresponding histone modifying enzymes is unknown. Here we report genome-wide data for a demethylase associated with the mark of active transcription, histone H3K4. We compared its location with that of other enzymes involved in methylation at H3K4, as well as with the enzymes involved in methylation at the opposing histone mark, H3K27. We show that H3K4 and H3K27 methylases and demethylases are associated with promoter regions and transcription start sites. Moreover, their target genes contribute to relevant processes that are promoted or repressed during cancer development. Therefore, this analysis identifies the epigenetic landscape established in cancer as a result of the binding of multiple factors, thus suggesting potential targeting strategies in therapy. Interestingly, our approach of grouping KDM5A/H3K4me3 peaks based on the presence of one another in both the Kdm5a f/f and KO conditions identified four groups of peaks that were unexpected: the peaks in groups G4, G5, G10 and G11. Even though our study identified a larger number of H3K4me3 enriched islands (33824) than one previous study (19511 HMM regions, 26760 window-interval; Mikkelsen et al. 2007), a considerable number of KDM5A peaks (1224) in G4 did not overlap (by at least one base) with H3K4me3. KDM5A at these sites may either regulate H3K4me3 located some distance away or have a demethylase-independent activity (DiTacchio et al., 2011). This indicates functional diversity of KDM5A at different locations in the genome. The other three groups consist of H3K4me3 peaks that do not overlap with KDM5A. G11 peaks can only be detected in the absence of functional Kdm5a, raising the possibility of indirect control of other HMTs/HDMs by KDM5A. Candidates include the KDM5A homolog KDM5B, JARID2, LSD1/KDM1A, etc. Both KDM5B and JARID2 can interact with KDM5A.
In the KDM5B gene, we have observed binding sites of KDM5A and MLL1 near its TSS. However, UTX is located rather towards the transcription termination site (TTS), and LSD1 and EZH2 have binding sites near the promoter (Figure 8C). In K562 cells, these regions are highly enriched for H3K4me2/3 and H3K27ac. In the KO condition, KDM5B or LSD1 undergo expression changes. Recently, Xie et al. (2011) reported a rather unfamiliar function of KDM5B. According to their study, the H3K4me3 demethylase KDM5B can bind far intergenic regions, in contrast to the generally known binding close to promoters, and repress cryptic intergenic transcription in mouse ES cells. In contrast to G5, H3K4me3 peaks in G10 and G11 are located rather far away from the TSS and showed enhancer markers, which demands further validation.

Figure 8: KDM5A is recruited to its genomic loci along with other KDMs and KMTs, yet is still critical for gene regulation. (A) The level of expression of the KDM5A targets KDM5B and CDK6 was analyzed by immunoblotting in wild-type MEFs and MEFs with a deletion in Kdm5a. The Rb1-/- cells were analyzed to check whether the effects on protein expression are due to cell cycle changes. The CDK6 antibody gives a weak background band (indicated by arrow), which is unchanged in Kdm5a-/- cells. Tubulin was used as a loading control. (B, C, D, E) UCSC human genome (hg18) browser view of the genomic locations of HDMs and HMTs, together with several other peaks from public data available, at the CDK6 gene. (F) The level of H3K4me3 at the studied loci, as determined by ChIP assays with anti-H3K4me3 antibodies in differentiated U937 cells. (G) ChIP assays, using KDM5A, KDM1A, KMT2A, KDM6A and KMT6 antibodies, of the genes shown above in differentiated U937 cells. The scales and legend are as in Figure 7E. Error bars: means + SEM, n=2.

[Figure 9, heat-map panels A (Biological Process) and B (Pathways (KEGG)). Columns: only KDM5A; overlap KDM5A, KMT2A; only KMT2A. Color scale: FDR p-value, from significant (0.0 to 0.01) to non-significant (up to 1.0).]

Figure 9: Over-represented gene ontology categories and pathways for KDM5A-specific targets, MLL1-specific targets and overlapping targets. After overlap analysis between the targets of KDM5A and MLL1/KMT2A, we found three groups of genes: genes that are specific to KDM5A, genes that are specific to MLL1 and genes that are common to both groups. We used the genes in these three groups for functional and pathway overrepresentation analysis. Enrichment of GO categories (A) and pathways (KEGG database) (B) for U937 KDM5A AI peaks that overlap or do not overlap with MLL1. Target genes from these groups were subjected to enrichment analysis using Gitools, and corrected p-values (FDR 0.01) are presented in a color-coded heatmap. The color scale represents the corrected p-value. Color towards red indicates more significant, color towards yellow indicates less significant, and gray represents a non-significant p-value.

An enhancer is an important regulatory element that can significantly increase the transcription of a gene. Previously, enhancers were identified in single-gene studies. With the advent of genome-wide studies, combinatorial chromatin signature patterns have been used to identify putative enhancer markers. Although where and how the epigenetic changes that define enhancers and promoters are coordinated remains to be characterized, at least three recent studies have shed light on it (Creyghton et al., 2010; Rada-Iglesias et al., 2011; Pekowska et al., 2011). Accordingly, the presence of enhancer elements can be detected by the enrichment of H3K4me1 and P300. Creyghton and colleagues defined enhancers based on the enrichment of H3K4me1 and H3K27ac in mouse ESCs. The presence of H3K27ac distinguishes active from poised enhancers. Their study showed that a lack of H3K27ac may correspond to a primed state somewhat similar to the transcriptionally poised or bivalent chromatin (H3K4me3/H3K27me3) domains of the promoters of many developmental regulator genes in ESCs (Cantone et al., 2011; Azuara et al., 2006). According to the work of Rada-Iglesias et al.
(2011) on human ESCs, the presence of H3K4me1 and/or P300 and the absence or a low level of H3K4me3 would mark putative enhancer elements. However, based on the presence of two mutually exclusive markers, H3K27ac and H3K27me3, they classify these enhancers into two different classes. The first class consists of regions with the H3K27ac mark and contains enhancers of previously reported ESC-specific genes, including OCT4 and NANOG. In the other class, which was marked by the presence of H3K27me3, genes are expressed at lower levels and were implicated in early developmental stages. However, contrary to Rada-Iglesias et al., Pekowska et al. (2011) claimed that H3K4me3 would rather provide an epigenetic signature of active enhancers. According to their study, the presence of H3K4me1 alone would not indicate stage-specific enhancer activity, because H3K4me1 is also found at inactive developmental genes. Rather, enhancer regions may be outlined by the presence of a characteristic signature of all H3K4 methylation types (H3K4me1/2/3), and H3K4me3 would generally mark active enhancers. They found not only H3K4me1, but also H3K4me2 and H3K4me3 histone modifications at ‘authentic and functional’ enhancers in developing thymocytes. In our study, groups G5, G10 and G11 showed a low level of H3K4me3 and a high level of the activation mark H3K27ac (Figure 6E). Repressive H3K27me3 marks are absent from all three groups. However, G5 lacks the important marker H3K4me1. The presence of both H3K4me3 and high H3K4me1 in G10 and G11 may contradict the enhancer definition that came out of the work of Rada-Iglesias et al. (2011), but it can be explained by the recent work of Pekowska et al. (2011). Both G10 and G11 also contain a low level of P300. Therefore, at least some of these regions could serve as enhancer elements. However, in contrast to the activation mark H3K27ac, a significantly high level of the transcription machinery component Pol II was not observed.
Also when compared to the expression, only G5 is significantly highly expressed compare to ‘average’ expression of all genes in the microarray (Figure 6F). G10 and G11 average expression is similar to average expression of all other genes studied. Enrichment analysis showed that these groups consist of genes having function of cell proliferation as well as differentiation/development. Probably in proliferating mouse ES cells developmental genes remain less expressed but open for activation signal by other transcription factors, while proliferating genes are more active. However, Peaks in all these three groups enriched for several developmental processes (Figure 5). The ARID domain of H3K4 demethylase KDM5A binds to a DNA CCGCCC motif (Tu et al, 2008), however, de novo motif analysis in KDM5A targets in mouse ES cells showed enrichment of GC/GA rich motif (Peng et al. 2009). A large number of the KDM5A targets in mouse ES cell (Peng et al. 2009) overlap with CpG islands, thus challenging the importance of CCGCCC motif that is common in CpG islands. This is consistent with the previous genome-wide analysis that H3K4me3 is enriched at CpG islands (Thomson et al., 2010). Almost two thirds of promoters in the mammalian genome are enriched for CpG dinucleotides called CpG islands (Illingworth and Bird, 2009). In our ChIP-seq analysis in human U937 cell line, except large April
isoform-specific portion of KDM5A peaks, the rest overlap significantly with CpG islands. Although we identified a CG-rich motif by de novo analysis, it apparently reflects a property of CpG islands, as peaks not overlapping CpG islands were not enriched for the CG-rich motif (Supplemental Tables S5 and S7). A previous study by Peng et al. (2009) found that only 55% of all KDM5A peaks have a preference for an Ets-like motif with a GGAA core, indicating that other nucleotide motifs are likely to be found in KDM5A binding sites. Our large isoform-specific peaks (only-LI) showed a preference for a motif with an ATGTG core (Figure 3D).

While most CpG dinucleotides are methylated on the cytosine residue, CpG islands are typically unmethylated. In addition, some genes, such as tumor suppressor genes, can become methylated during oncogenesis. In contrast, gene silencing during development is not associated with CpG methylation, including at CpG islands (Baylin and Bestor, 2002; Illingworth and Bird, 2009). Previous studies indicated that CpG islands are associated with H3 hyperacetylation (Roh et al., 2005). This fits with our observation that most of our KDM5A peaks are in CpG islands and are marked by a high level of the activation mark H3K27ac (Figure 6A). However, both expression and the H3K27ac mark are relatively low for KDM5A peaks that do not overlap with H3K4me3 (group G4 in Figure 4C and Figure 6A). This observation also implicates H3K4me3 as a potential activation mark for CpG regions. However, both the activating H3K4me3 and the repressive H3K27me3 marks have been found enriched at CpG islands. Many repressed promoters displaying the H3K27me3 mark were associated with high levels of both H3 acetylation and H3K4me3 (Roh et al., 2006).
Strikingly, many of these promoters belong to rapidly inducible genes, including NFATC2, AKT2 and RHEB, suggesting that these regions have an open chromatin configuration and, in conjunction with the bivalent marks and H3 hyperacetylation, remain poised for rapid activation. Maintenance of this balance of TrxG and PcG expression is required, as misregulation of either enzyme would result in an incorrect readout of histone modifications. Similar to KDM5A, binding of another TrxG protein, ASH1, which does have a SET domain, is limited to transcriptionally active (fully derepressed or balanced) PcG targets. In tissues with impaired function of a single HDM or HMT, maintaining the expression level of the corresponding target genes may require a switch in the HDM/HMT gene expression signature.

Each of the MLL-related H3K4 HMTs is essential, since its knockout in mice results in embryonic lethality. Also, the developmental defects observed in homozygous mutant embryos are unique for each gene (Yu et al., 1998). In contrast to the absence of redundancy among H3K4 HMTs, KDM5A knockout animals, which are deficient in this H3K4 histone demethylase, are viable and do not exhibit gross defects (Klose et al., 2007). These observations raised the question of functional redundancy of KDM5A. However, our finding of a global increase in methylation level upon depletion of Kdm5a (Figure 4A) supports a non-redundant function of KDM5A, even though some KDM5A binding sites can be bound by the homologous PLU1/KDM5B (Islam et al., 2011). Also, consistent with the notion that H3K4me3 correlates with higher transcriptional activity, we found higher expression of KDM5A targets with differentially increased H3K4me3 (group G7) (Figure 4B and Figure 4C). However, we found that a small fraction (4672 peaks, or 13.81%) of our H3K4me3 peaks (f/f) overlap with H3K27me3 regions identified in mouse ES cells in the study of Mikkelsen et al. (2007).
Despite the repressive enzymatic activity associated with KDM5A, our ChIP-seq analysis reveals that KDM5A binds to promoters marked with H3K4me3, especially around the TSS of highly expressed genes (Figure 1 and Supplemental Figure S2B). These data may be explained by invoking a mechanism of cyclical recruitment of “writers” and “erasers” to sites of active transcription (Wang et al., 2009). All states of H3K4 methylation are elevated at TSSs, while a significant dip in the signal between -200 and +50 correlates with nucleosome loss in active genes (Figure 6D). Consistent with a previous observation in human T cells (Roh et al., 2006), the peak of H3K4me3 correlates with peaks of K27 acetylation and occurs immediately downstream of the TSS. In that study, the H3K4me3 signal was detected on two thirds of promoters (9,910 promoters) in the human genome, and H3K27me3 was detected on one third of promoters (5,252 promoters). The bivalent promoters in T cells, which are resting cells, number 3,330. The highest numbers of H3K27me3 tags correlated with the lowest expression levels, suggesting that deposition of the H3K27me3 mark is associated with decreased expression.

The observed co-regulation between the opposing methylation marks H3K4me3 and H3K27me3 in determining the final transcriptional outcome suggests coordinated regulation among the enzymes that write and erase these marks. It is probable that two or more HMTs/HDMs function at the same time or
sequentially in the same cellular process. A previous study mapped 29 chromatin-binding proteins in two cell types (Ram et al., 2011). In this study we observed a fairly high overlap between the targets of HMTs and HDMs in some instances (Figure 7A), especially for enzymes with opposing activities, namely KDM5A and MLL1. Almost all of them are significantly enriched for transcriptional activity and for cancer and other disease-related pathways (Figure 7B). The H3K27-specific HMT, EZH2/KMT6, contains a SET domain that catalyzes di- and tri-methylation of H3K27. Two studies provided mechanisms by which EZH2 could act as a transcriptional activator rather than a repressor (Shi et al., 2007; Tonini et al., 2004). While EZH2 is known to interact mainly with HDAC1, it is also capable of forming complexes with pRb2/p130, thus interfering with pRb2/p130-mediated repression (Tonini et al., 2004). Although the targets of KDM5A, MLL1 and EZH2 are enriched for many similar GO categories, a distinct functional cluster was observed for EZH2, which includes enrichment of several developmental processes in addition to disease-related pathways. Recurrent mutations in the SET domain of EZH2 were recently shown to be associated with a specific lymphoma subtype (Morin et al., 2010). EZH2 is known to promote anchorage-independent cell growth, migration and invasion by down-regulating its target genes. In prostate cancer, the key gene regulated by EZH2 is ADRB2, a G-protein coupled receptor (Yu et al., 2007). Importantly, there is a strong link between H3K27-occupied genes in prostate metastases and H3K27me3-, SUZ12- or EED-occupied genes in embryonic stem cells, which indicates that the function of PRC2 in controlling stem cell pluripotency and differentiation is essential for prostate cancer progression (Yu et al., 2007).
Despite the observed functional diversity, KDM5A and MLL1 targets overlap significantly, and several of the enriched functions and pathways depend heavily on the shared gene set (Figure 9). One such example is the Hoxa9 gene. MLL1 is an upstream regulator of this gene, and MLL fusion proteins increase its expression in aggressive leukemias. This process requires the PHD domain (Milne et al., 2010). A fusion between NUP98 and the PHD domain of KDM5A has also been reported in leukemogenesis (Wang et al., 2009). We observed that KDM5A and EZH2 have binding sites overlapping with those of MLL1 at this gene (Figure 8F). This location also showed an increased H3K4me3 mark. This suggests not only that KDM5A binding at different locations regulates different biological outcomes, but also that several other functionally related or even opposing enzymes may contribute to these functions. The same may hold for other HMTs and HDMs.

METHODS

Cell culture and immunoblotting:
Human U937 diffuse histiocytic lymphoma cells were seeded at a density of 2.5 × 10⁵ cells/ml and grown in RPMI medium (Mediatech) containing 50 nM TPA (Sigma) for 96 hours at 37 °C. The medium was replaced with fresh medium 72 hours after the start of treatment. The cells successfully differentiated into macrophages/monocytes (Benevolenskaya et al., 2005). Wild-type (Kdm5a f/f) and Kdm5a-/- (KDM5A KO) ES cells (Lin et al., 2011) were maintained on irradiated mouse embryonic fibroblast (IEF) feeders in standard ES medium (DMEM; Dulbecco’s modified Eagle’s medium) supplemented with 15% heat-inactivated fetal calf serum, 0.1 mM 2-mercaptoethanol, 2 mM L-glutamine, 0.1 mM non-essential amino acids, 1% nucleoside mix (100X stock, Sigma), 1000 U/ml recombinant leukemia inhibitory factor (LIF; Chemicon) and antibiotics. For ChIP assays, the cells were seeded on gelatin-coated plates.
Immunoblotting was performed using a mixture of the rabbit KDM5A antiserum 2469 and the affinity-purified antibody 1416, our KDM5B antiserum 4440, and mouse RB G3-245 (BD Biosciences) and α-tubulin (T9026; Sigma) antibodies.

ChIP, ChIP-seq and library preparation for sequencing:
ChIP experiments were performed as described (Lopez-Bigas et al., 2008) using the following rabbit antibodies: our KDM5A antibodies 1416 and 2469, and H3K4me3 (ab-8580) from Abcam. Real-time PCR was performed using the SYBR Green PCR master mix and the CFX96 system (Bio-Rad). ChIP-seq in U937 cells was performed as described in Beshiri et al. (2010) and Lopez-Bigas et al. (2008). The procedure for KDM5A ChIP-seq in mouse ES cells is described in our submitted manuscript. Here we performed H3K4me3 ChIP-seq in mouse (C57BL/6) ES cells in the presence or absence of KDM5A. Two separate Kdm5a f/f clones and two Kdm5a-/- clones were analyzed. The total genomic DNA sample was prepared from DNA of the two
Kdm5a f/f clones mixed in a 1:1 ratio. The reference DNA was either the total genomic DNA sample or the Kdm5a-/- clone ChIP-seq sample. After adapter ligation, DNA was PCR-amplified with Illumina primers for 18 cycles, and library fragments of ~320 bp (insert plus adaptor and PCR primer sequences) were isolated as a band from an agarose gel. The purified DNA was captured on an Illumina flow cell for cluster generation. Libraries were sequenced on the Genome Analyzer following the manufacturer's protocols. The overall experimental design and data analysis, together with the use of public data, are described in Supplemental Figure S1 and Supplemental Table S1.

Sequencing, mapping and peak detection:
Illumina/Solexa sequencer output images were processed using the Solexa image extraction pipeline [version 1.6 (CASAVA)]. The resulting 36 bp short reads were uniquely aligned, allowing at most two mismatches, to the UCSC (The Genome Sequencing Consortium) reference genome (hg18 for human and mm9 for mouse) using the program Bowtie (Langmead et al., 2009). Reads that matched more than one location with equal quality were discarded to avoid bias. The peak caller MACS (version 1.3.7.1) (Zhang et al., 2009) was used to determine enriched peak regions for KDM5A ChIP against total genomic input as background. Peaks were modeled using default parameters with the 'futurefdr' flag on. The final 'mfold' values were 20, 10 and 10 for KDM5A in U937 cells (all isoforms), KDM5A in U937 cells (large isoform) and KDM5A in mouse ES cells, respectively. For histone H3K4me3 in mouse ES cells, however, SICER (version 1.1) (Zang et al., 2009), a popular peak caller for histone modifications, was used with the following parameters: redundancy threshold = 1, window size = 200, fragment size = 150, effective genome fraction = 0.75, gap size = 200 and FDR = 0.01. Differential binding of H3K4me3 in KDM5A KO vs. WT (f/f) cells was also determined using the SICER script “SICER-df.sh” with similar parameter settings.
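To make the role of the redundancy threshold concrete, the sketch below keeps at most a fixed number of reads sharing the same chromosome, start position and strand, which is how that setting is commonly understood. It is an illustrative Python sketch only and not SICER code; the function name `filter_redundant_reads` and the representation of a read as a `(chrom, start, strand)` tuple are hypothetical.

```python
from collections import Counter


def filter_redundant_reads(reads, threshold=1):
    """Keep at most `threshold` reads per identical (chrom, start, strand).

    reads: iterable of (chrom, start, strand) tuples (hypothetical layout).
    The input order of the retained reads is preserved.
    """
    seen = Counter()
    kept = []
    for read in reads:
        key = (read[0], read[1], read[2])
        # Retain the read only while this position is still under the limit.
        if threshold > seen[key]:
            seen[key] += 1
            kept.append(read)
    return kept
```

With a threshold of 1, as used here, every additional read at an already occupied position and strand is dropped, which limits the influence of PCR duplicates on window counts.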
ChIP-seq data were first analyzed individually to check reproducibility and then merged to obtain greater sequencing depth.

Binding sites to gene feature annotation:
Enriched peaks were annotated to the nearest EnsEMBL (Hubbard et al., 2007) gene (version 54 for human and version 55 for mouse) using the Bioconductor package ChIPpeakAnno (Zhu et al., 2010). Besides our ChIP-seq data, previously published public ChIP-seq/ChIP-on-chip data used in this study were also annotated to the nearest gene, starting from the enriched peak locations identified by the original authors and using the same procedure. The distribution of binding sites over genomic features, the continuous ChIP signal profile near the transcription start site (TSS) and the distribution of reads along the meta-gene were computed using CEAS (Shin et al., 2009) and in-house Python and Perl scripts.

Publicly available genome-wide location data analysis:
For public ChIP-on-chip or ChIP-seq data, we used the authors' enriched peaks and annotated them to the nearest gene using ChIPpeakAnno (Zhu et al., 2010). Publicly available whole-genome data on the chromatin modifications H3K4me1, H3K27me3 and H3K36me3 were first re-mapped to the latest mouse genome (mm9, NCBI/NIH) or to human genome version hg18 using the UCSC coordinate conversion tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver).

Functional and pathway enrichment analysis:
Functional annotation of target genes is based on Gene Ontology (GO) (Gene Ontology Consortium, 2000; http://www.geneontology.org) as extracted from EnsEMBL (Hubbard et al., 2007) and on the KEGG pathway database (Kanehisa and Goto, 2000). For GO, genes were classified using the Biological Process (BP) ontology. We considered only GO/pathway categories with at least 10 annotated genes. We used GiTools for enrichment analysis and heatmap generation (Perez-Llamas and Lopez-Bigas, 2011). The resulting p-values were adjusted for multiple testing using the Benjamini and Hochberg False Discovery Rate (FDR) method (Benjamini and Hochberg, 1995).

Expression data analysis:
Publicly available expression microarray data from mouse ES cell lines used in the present study (Supplemental Table S1) were re-analyzed starting from the raw data. Raw data were background-corrected and normalized using the Bioconductor package “affy” (Gautier et al., 2004; Irizarry et al., 2003) with the 'RMA' method, and differential expression analysis was done using Limma (Smyth et al., 2004).

Peak classification and expression:
We used ChIPpeakAnno (Zhu et al., 2010) to assign each peak to its closest EnsEMBL gene, and we calculated the distance (d) from the peak's summit to the TSS. We classified peaks based on this distance. If d is within -200 to +1100 bp, we call it a “TSS” peak; if d is beyond +1100 bp but no more than 100 bp past the gene end, we call it a “gene body” peak; if a peak is located more than 100 bp downstream of the gene end, we call it a “downstream” peak; if the peak lies between -201 bp and -3100 bp upstream of the TSS, we call it a “promoter” peak; and if a peak is located even further upstream, it is classified as a “distal” peak. We also grouped mouse ES cell KDM5A peaks based on the presence or absence of H3K4me3, or on differentially increased or decreased H3K4me3 in KDM5A knockout (KO) cells compared with wild type. We took the expression value of the target gene assigned to the peaks in each class/group and presented it as box plots using the R programming language. The statistical significance of differences between distributions of expression values was assessed with the Kolmogorov-Smirnov (KS) test or the Wilcoxon rank sum test.

Conservation score:
We extracted the 50 bp upstream and 50 bp downstream of each peak's summit. The average conservation score of this region was calculated from UCSC PhastCons conservation score data (Siepel et al., 2005) for placental mammals (hg18, phastCons44way for human; mm9, phastCons30way for mouse).
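As an illustration of the averaging step, the following minimal Python sketch computes the mean per-base conservation score over the 100 bp window centered on a peak summit. It is only a sketch under stated assumptions: the function name `mean_conservation`, the dictionary of per-position scores, and the choice to skip positions without a score are all hypothetical and are not taken from the script actually used.

```python
def mean_conservation(scores, summit, flank=50):
    """Mean conservation score over [summit - flank, summit + flank).

    scores: dict mapping genomic position -> per-base conservation score
            on a single chromosome (hypothetical input layout).
    Positions without a score are skipped (an assumption of this sketch);
    returns None when no position in the window has a score.
    """
    values = [scores[pos]
              for pos in range(summit - flank, summit + flank)
              if pos in scores]
    if not values:
        return None
    return sum(values) / len(values)
```

For example, calling `mean_conservation(scores, summit=1000)` averages the scores at positions 950 to 1049.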
The calculation was done using a script from the ChIPseeqer program (Giannopoulou and Elemento, 2011). For plotting, the conservation score was calculated for the 500 bp upstream and downstream of the peak summit. For comparison, we computed the conservation score of random peaks within a 5 kb region of each KDM5A peak, and we plotted the average conservation score in 10 bp windows using the R programming language.

De novo motif finding:
The top 1000 KDM5A ChIP-seq peaks (ranked by p-value) were selected for de novo motif finding. A total of 100 bp of sequence surrounding the summit of each of these 1000 peaks was used as input for the Weeder algorithm (Pavesi et al., 2001) to find possible motifs. The top-ranked motif was considered, and a logo for this motif was generated using the WebLogo tool (Crooks et al., 2004).

Overlap analysis:
Overlap of genomic position ranges was computed using BedTools (Quinlan and Hall, 2010), and the significance of overlap was assessed by the hypergeometric test. Venn diagram generation and overlap analysis of target genes were performed using an in-house R script, and the significance of this overlap was calculated with the chi-square test.

Average ChIP signal profile:
The normalized average ChIP signal around the center (or peak summit) of each given genomic location, or around the summit of KDM5A enriched peaks, was calculated as the average number of tags in each 50 bp bin in a set window. The relative distance of each tag from the above-mentioned position and the average signal were determined using the 'Sitepro' script of the CEAS package (Shin et al., 2009), and plotting was done in R.

Tag density heatmap:
A transcription start site (TSS)-anchored tag density heatmap matrix was generated by calculating the number of tags in each 25 bp bin of the -3 kb to +3 kb window relative to the TSS of each RefSeq transcript, normalized by the total number of tags in each ChIP-seq sample. For the calculation we used scripts from the HOMER package (Heinz et al., 2010).
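The binning described above can be sketched in Python as follows. This is a minimal illustration and not the HOMER implementation: the function name `tss_density_row`, the list-of-positions input, and the plain division by the total tag count are assumptions made for the example.

```python
WINDOW = 3000                    # bp on each side of the TSS
BIN_SIZE = 25                    # bp per bin
N_BINS = 2 * WINDOW // BIN_SIZE  # 240 bins per transcript


def tss_density_row(tss, strand, tag_positions, total_tags):
    """Normalized tag counts in 25 bp bins from -3 kb to +3 kb of one TSS.

    tag_positions: integer coordinates of tags on the transcript's chromosome.
    total_tags: total number of tags in the ChIP-seq sample, used here as a
                simple normalization factor (an assumption of this sketch).
    """
    counts = [0] * N_BINS
    for pos in tag_positions:
        # Orient the offset along the transcript so that negative values
        # are upstream of the TSS on either strand.
        offset = pos - tss if strand == "+" else tss - pos
        if offset in range(-WINDOW, WINDOW):
            counts[(offset + WINDOW) // BIN_SIZE] += 1
    return [count / total_tags for count in counts]
```

Stacking one such row per RefSeq transcript gives the matrix that is then clustered and displayed as a heatmap.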
Cluster analysis of the resulting matrix was done using the Cluster 3.0 program (de Hoon et al., 2004), and the clustered data were visualized with Java Treeview (Saldanha, 2004).

Putative transcription factor motif co-occupancy:
Possible occurrences of transcription factor (TF) motifs in KDM5A peak regions (1000 bp around the peak summit) were predicted with the STORM algorithm (Schones et al., 2007) with a p-value cutoff of 0.00001, using position frequency matrices (PFMs) from the Transfac database (professional version, release 2009.4) (Matys et al., 2003). We estimated the random occurrence of each TF motif by scanning 1000 random regions of equal size for each enriched peak in the KDM5A ChIP-seq data, within a maximum window of 10 kb around each input KDM5A peak region. Random regions falling in UCSC 'gap' regions were excluded. We then calculated a Z-score from the number of TF motifs found
in the ChIP regions, the average number of motifs found in the 1000 random regions, and its standard deviation. We calculated a two-tailed p-value for each Z-score, and the FDR-adjusted (Benjamini and Hochberg, 1995) p-value was used to assess statistical significance. A heatmap of positive Z-scores with significant p-values was generated using GiTools (Perez-Llamas and Lopez-Bigas, 2011).

Accession Numbers:
ChIP-seq data are available from http://www.ncbi.nlm.nih.gov/geo/ under accession numbers GSE28384 (mouse H3K4me3), GSE28343 (mouse KDM5A) and GSE28323 (KDM5A isoform-specific ChIP-seq).

ACKNOWLEDGMENTS
This project was funded by R01CA138631 (PI: E.B.), grants # 05-40 and 07-13 from the American Cancer Society, Illinois Division, and grant # 115347-RSG-08-271-01-GMC from the American Cancer Society (E.B.). N. L-B acknowledges funding from the Spanish Ministry of Science and Technology (grant number SAF2009-06954) and the Spanish National Institute of Bioinformatics (INB). ABMMKI acknowledges a fellowship from AGAUR, Catalonian Government, Spain.

REFERENCES:
1. Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M. & Sherlock, G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000, 25, 25-29
2. Azuara V., P. Perry, S. Sauer, M. Spivakov, H.F. Jørgensen, R.M. John, M. Gouti, M. Casanova, G. Warnes, M. Merkenschlager, A.G. Fisher. Chromatin signatures of pluripotent cell lines. Nat. Cell Biol., 8 (2006), pp. 532–538.
3. Barski A, Suresh Cuddapah, Kairong Cui, Tae-Young Roh, Dustin E. Schones, Zhibin Wang, Gang Wei, Iouri Chepelev, and Keji Zhao. High-resolution profiling of histone methylations in the human genome. Cell, 129(4):823–837, May 2007.
4.
Barski, A.; Cuddapah, S.; Cui, K.; Roh, T.-Y.; Schones, D. E.; Wang, Z.; Wei, G.; Chepelev, I. & Zhao, K. High-resolution profiling of histone methylations in the human genome. Cell, 2007, 129, 823-837.
5. Baylin, S. & Bestor, T. H. Altered methylation patterns in cancer cell genomes: cause or consequence? Cancer Cell, 2002, 1, 299-305.
6. Beshiri, M. L.; Islam, A.; DeWaal, D. C.; Richter, W. F.; Love, J.; Lopez-Bigas, N. & Benevolenskaya, E. V. Genome-wide analysis using ChIP to identify isoform-specific gene targets. J Vis Exp, 2010
7. Beshiri ML, Holmes KB, Richter WF, Hess S, Islam ABMMKI, Yan Q, Plante L, Gévry N, Lopez-Bigas N, Kaelin WG Jr., and Benevolenskaya EV. Demethylation by KDM5A Contributes to Repression of Cell Cycle Genes During Differentiation. Submitted manuscript, April 2012.
8. Benjamini, Y., and Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 1995; 57, 289–300.
9. Bernstein, B. E.; Mikkelsen, T. S.; Xie, X.; Kamal, M.; Huebert, D. J.; Cuff, J.; Fry, B.; Meissner, A.; Wernig, M.; Plath, K.; Jaenisch, R.; Wagschal, A.; Feil, R.; Schreiber, S. L. & Lander, E. S. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell, 2006, 125, 315-326
10. Bernstein, B. E.; Mikkelsen, T. S.; Xie, X.; Kamal, M.; Huebert, D. J.; Cuff, J.; Fry, B.; Meissner, A.; Wernig, M.; Plath, K.; Jaenisch, R.; Wagschal, A.; Feil, R.; Schreiber, S. L. & Lander, E. S. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell, 2006, 125, 315-326.
11. Boisvert FM, van Koningsbruggen, S.; Navascués, J. & Lamond, A. I. The multifunctional nucleolus. Nat Rev Mol Cell Biol, 2007, 8, 574-585
12. Cantone, I. & Fisher, A. G. Unraveling epigenetic landscapes: the enigma of enhancers. Cell Stem Cell, 2011, 8, 128-129.
13. Chan SW and Hong W.
Retinoblastoma-binding protein 2 (rbp2) potentiates nuclear hormone receptor-mediated transcription. J Biol Chem, 276(30):28402–28412, Jul 2001.
14. Perez-Llamas C, and Lopez-Bigas N. Gitools: analysis and visualization of genomic data using interactive heat-maps. PLoS One, 6(5): e19541 (2011). www.gitools.org
15. Collado, M.; Blasco, M. A. & Serrano, M. Cellular senescence in cancer and aging. Cell, 2007, 130, 223-233
16. Creyghton, M. P.; Cheng, A. W.; Welstead, G. G.; Kooistra, T.; Carey, B. W.; Steine, E. J.; Hanna, J.; Lodato, M. A.; Frampton, G. M.; Sharp, P. A.; Boyer, L. A.; Young, R. A. & Jaenisch, R. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci., 2010, 107, 21931-21936.
17. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator. Genome Research, 14:1188-1190, (2004).
18. de Hoon MJL, Imoto S, Nolan J, and Miyano J. Open Source Clustering Software. Bioinformatics, 20 (9): 1453-1454 (2004).
19. Dieker J and Muller S. Epigenetic Histone Code and Autoimmunity. Clinical reviews in allergy and immunology, Volume 39, Number 1 (2010), 78-84.
20. DiTacchio L, Hiep D. Le, Christopher Vollmers, Megumi Hatori, Michael Witcher, Julie Secombe, and Satchidananda Panda. Histone lysine demethylase jarid1a activates clock-bmal1 and influences the circadian clock. Science, 333(6051):1881–1885, Sep 2011.
21. Elizaveta V. Benevolenskaya, Heather L. Murray, Philip Branton, Richard A. Young, and William G Kaelin, Jr. Binding of prb to the phd protein rbp2 promotes cellular differentiation. Mol Cell, 18(6):623–635, Jun 2005.
22. Elizaveta V. Benevolenskaya, Heather L. Murray, Philip Branton, Richard A. Young, and William G Kaelin, Jr. Binding of prb to the phd protein rbp2 promotes cellular differentiation. Mol Cell, 18(6):623–635, Jun 2005.
23. EMBO J, 2011, 30, 4198-4210
24. Foster, S. L. and Medzhitov, R. Gene-specific control of the TLR-induced inflammatory response. Clin Immunol, 2009, 130, 7-15
25. Gautier L, Cope L, Bolstad BM, Irizarry RA (2004) affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics (Oxford, England) 20: 307-315
26. Ge Z, Wenjuan Li, Na Wang, Cheng Liu, Qingjun Zhu, Magnus Bjorkholm, Astrid Gruber, and Dawei Xu. Chromatin remodeling: recruitment of histone demethylase rbp2 by mad1 for transcriptional repression of a myc target gene, telomerase reverse transcriptase. FASEB J, 24(2):579–586, Feb 2010.
27. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium (2000) Nature Genet. 25: 25-29.
28. Goldberg, A. D.; Banaszynski, L. A.; Noh, K.-M.; Lewis, P. W.; Elsaesser, S.
J.; Stadler, S.; Dewell, S.; Law, M.; Guo, X.; Li, X.; Wen, D.; Chapgier, A.; DeKelver, R. C.; Miller, J. C.; Lee, Y.-L.; Boydston, E. A.; Holmes, M. C.; Gregory, P. D.; Greally, J. M.; Rafii, S.; Yang, C.; Scambler, P. J.; Garrick, D.; Gibbons, R. J.; Higgs, D. R.; Cristea, I. M.; Urnov, F. D.; Zheng, D. & Allis, C. D. Distinct factors control histone variant H3.3 localization at specific genomic regions. Cell, 2010, 140, 678-691.
29. Giannopoulou EG and Elemento O. An integrated ChIP-seq analysis platform with customizable workflows. BMC Bioinformatics 2011, 12:277.
30. Guenther MG, Lawton LN, Rozovskaia T, Frampton GM, Levine SS, Volkert TL, Croce CM, Nakamura T, Canaani E, Young RA. Aberrant chromatin at genes encoding stem cell regulators in human mixed-lineage leukemia. Genes Dev. 2008 Dec, 15;22(24):3403-8.
31. Guenther, M. G.; Jenner, R. G.; Chevalier, B.; Nakamura, T.; Croce, C. M.; Canaani, E. & Young, R. A. Global and Hox-specific roles for the MLL1 methyltransferase. Proc Natl Acad Sci, 2005, 102, 8603-8608
32. Heinz S, Benner C, Spann N, Bertolino E et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Mol Cell 2010 May 28;38(4):576-589.
33. Hubbard T. J. P., B. L. Aken, K. Beal, B. Ballester, M. Caccamo, Y. Chen, L. Clarke, G. Coates, F. Cunningham, T. Cutts, T. Down, S. C. Dyer, S. Fitzgerald, J. Fernandez-Banet, S. Graf, S. Haider, M. Hammond, J. Herrero, R. Holland, K. Howe, K. Howe, N. Johnson, A. Kahari, D. Keefe, F. Kokocinski, E. Kulesha, D. Lawson, I. Longden, C. Melsopp, K. Megy, P. Meidl, B. Ouverdin, A. Parker, A. Prlic, S. Rice, D. Rios, M. Schuster, I. Sealy, J. Severin, G. Slater, D. Smedley, G. Spudich, S. Trevanion, A. Vilella, J. Vogel, S. White, M. Wood, T. Cox, V. Curwen, R. Durbin, X. M. Fernandez-Suarez, P. Flicek, A. Kasprzyk, G. Proctor, S. Searle, J. Smith, A. Ureta-Vidal and E. Birney. Ensembl 2007. Nucleic Acids Res.
2007 January; 35(Database issue): D610–D617.
34. Illingworth, R. S. & Bird, A. P. CpG islands - ‘a rough guide’. FEBS Lett, 2009, 583, 1713-1720
35. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP (2003) Summaries of Affymetrix GeneChip probe level data. Nucleic acids research 31: e15
36. Islam, A. B. M. M. K.; Richter, W. F.; Lopez-Bigas, N. & Benevolenskaya, E. V. Selective targeting of histone methylation, 2011, Cell cycle, 10, 413-424
37. Kanehisa, M. and Goto, S.; KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30 (2000).
38. Khleif, S. N.; DeGregori, J.; Yee, C. L.; Otterson, G. A.; Kaye, F. J.; Nevins, J. R. & Howley, P. M. Inhibition of cyclin D-CDK4/CDK6 activity is associated with an E2F-mediated induction of cyclin kinase inhibitor activity. Proc Natl Acad Sci, 1996, 93, 4350-4354
39. Kim Y W, G. A. Otterson, R. A. Kratzke, A. B. Coxon, and F. J. Kaye. Differential specificity for binding of retinoblastoma binding protein 2 to rb, p107, and tata-binding protein. Mol Cell Biol, 14(11):7256–7264, Nov 1994.
40. Klose RJ, Yan Q, Tothova Z, Yamane K, Erdjument-Bromage H, Tempst P, et al. The retinoblastoma binding protein RBP2 is an H3K4 demethylase. Cell 2007; 128:889-900.
41. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009; 10:R25.
42. Lim, J.T., Mansukhani, M., Weinstein, I.B. Cyclin-dependent kinase 6 associates with the androgen receptor and enhances its transcriptional activity in
prostate cancer cells. 2005. Proc. Natl. Acad. Sci. U.S.A.
43. Lin W, Jian Cao, Jiayun Liu, Michael L. Beshiri, Yuko Fujiwara, Joshua Francis, Andrew D. Cherniack, Christoph Geisen, Lauren P. Blair, Mike R. Zou, Xiaohua Shen, Dan Kawamori, Zongzhi Liu, Chiara Grisanzio, Hideo Watanabe, Yoji Andrew Minamishima, Qing Zhang, Rohit N. Kulkarni, Sabina Signoretti, Scott J. Rodig, Roderick T. Bronson, Stuart H. Orkin, David P. Tuck, Elizaveta V. Benevolenskaya, Matthew Meyerson, William G Kaelin, Jr, and Qin Yan. Loss of the retinoblastoma binding protein 2 (rbp2) histone demethylase suppresses tumorigenesis in mice lacking rb1 or men1. Proc Natl Acad Sci U S A, 108(33):13379–13386, Aug 2011.
44. Mao S, G. A. Neale, and R. M. Goorha. T-cell oncogene rhombotin-2 interacts with retinoblastoma-binding protein 2. Oncogene, 14(13):1531–1539, Apr 1997.
45. Margueron R, Li G, Sarma K, Blais A, Zavadil J, Woodcock CL, et al. Ezh1 and Ezh2 maintain repressive chromatin through different mechanisms. Mol Cell 2008; 32:503-18.
46. Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003, 31, 374–378.
47. Michael L. Beshiri, Abul Islam, Dannielle C. DeWaal, William F. Richter, Jennifer Love, Nuria Lopez-Bigas, Elizaveta V. Benevolenskaya. Genome-wide Analysis using ChIP to Identify Isoform-specific Gene Targets. 2010. J. Vis. Exp (41), e2101.
48. Mikkelsen, T. S.; Ku, M.; Jaffe, D. B.; Issac, B.; Lieberman, E.; Giannoukos, G.; Alvarez, P.; Brockman, W.; Kim, T.-K.; Koche, R. P.; Lee, W.; Mendenhall, E.; O'Donovan, A.; Presser, A.; Russ, C.; Xie, X.; Meissner, A.; Wernig, M.; Jaenisch, R.; Nusbaum, C.; Lander, E. S. & Bernstein, B. E. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells.
Nature, 2007, 448, 553-560
49. Milne TA, Kim J, Wang GG, Stadler SC, Basrur V, Whitcomb SJ, Wang Z, Ruthenburg AJ, Elenitoba-Johnson KS, Roeder RG, Allis CD. Multiple interactions recruit MLL1 and MLL1 fusion proteins to the HOXA9 locus in leukemogenesis. Mol Cell. 2010 Jun 25;38(6):853-63. Epub 2010 Jun 10.
50. Morin, R. D.; Johnson, N. A.; Severson, T. M.; Mungall, A. J.; An, J.; Goya, R.; Paul, J. E.; Boyle, M.; Woolcock, B. W.; Kuchenbauer, F.; Yap, D.; Humphries, R. K.; Griffith, O. L.; Shah, S.; Zhu, H.; Kimbara, M.; Shashkin, P.; Charlot, J. F.; Tcherpakov, M.; Corbett, R.; Tam, A.; Varhol, R.; Smailus, D.; Moksa, M.; Zhao, Y.; Delaney, A.; Qian, H.; Birol, I.; Schein, J.; Moore, R.; Holt, R.; Horsman, D. E.; Connors, J. M.; Jones, S.; Aparicio, S.; Hirst, M.; Gascoyne, R. D. & Marra, M. A. Somatic mutations altering EZH2 (Tyr641) in follicular and diffuse large B-cell lymphomas of germinal-center origin. Nat Genet, 2010, 42, 181-185.
51. Németh, A.; Conesa, A.; Santoyo-Lopez, J.; Medina, I.; Montaner, D.; Péterfia, B.; Solovei, I.; Cremer, T.; Dopazo, J. & Längst, G. Initial genomics of the human nucleolus. PLoS Genet, 2010, 6, e1000889
52. Ng, H. H.; Robert, F.; Young, R. A. & Struhl, K. Genome-wide location and regulated recruitment of the RSC nucleosome-remodeling complex. Genes Dev, 2002, 16, 806-819
53. Nuria Lopez-Bigas, Tomasz A. Kisiel, Dannielle C. Dewaal, Katie B. Holmes, Tom L. Volkert, Sumeet Gupta, Jennifer Love, Heather L. Murray, Richard A. Young, and Elizaveta V. Benevolenskaya. Genome-wide analysis of the h3k4 histone demethylase rbp2 reveals a transcriptional program controlling differentiation. Mol Cell, 31(4):520–530, Aug 2008.
54. Ochs, R. L. & Press, R. I. Centromere autoantigens are associated with the nucleolus. Exp Cell Res, 1992, 200, 339-350
55. Oevelen CV, Jinhua Wang, Patrik Asp, Qin Yan, William G Kaelin, Jr, Yuval Kluger, and Brian David Dynlacht. A role for mammalian sin3 in permanent gene silencing.
Mol Cell, 32(3):359– 370, Nov 2008. Pan, G.; Tian, S.; Nie, J.; Yang, C.; Ruotti, V.; Wei, H.; Jonsdottir, G. A.; Stewart, R. & Thomson, J. A. Whole-genome analysis of histone H3 lysine 4 and lysine 27 methylation in human embryonic stem cells. Cell Stem Cell 2007, 1, 299-312 Pasini, D.; Hansen, K. H.; Christensen, J.; Agger, K.; Cloos, P. A. C. & Helin, K. Coordinated regulation of transcriptional repression by the RBP2 H3K4 demethylase and Polycomb Repressive Complex-2. Genes Dev, 2008, 22, 1345-1355 Pavesi, G., Mauri, G. and Pesole, G. (2001). An algorithm for finding signals of unknown length in DNA sequences, Bioinformatics, Vol.17, Suppl. 1, pp. S207–214. Pekowska, A.; Benoukraf, T.; Zacarias-Cabeza, J.; Belhocine, M.; Koch, F.; Holota, H.; Imbert, J.; Andrau, J.-C.; Ferrier, P. & Spicuglia, S. H3K4 trimethylation provides an epigenetic signature of active enhancers. Peng, J. C.; Valouev, A.; Swigut, T.; Zhang, J.; Zhao, Y.; Sidow, A. & Wysocka, J. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell, 2009, 139, 1290-1302 Quinlan, AR and Hall, IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842. Rada-Iglesias, A.; Bajpai, R.; Swigut, T.; Brugmann, S. A.; Flynn, R. A. & Wysocka, J. A unique chromatin signature uncovers early developmental enhancers in humans. Nature, April
2012
 175 KDM5A 63. 64. 65. 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. Department of Chemical and Systems Biology, 2011, 470, 279-283. Ram, O.; Goren, A.; Amit, I.; Shoresh, N.; Yosef, N.; Ernst, J.; Kellis, M.; Gymrek, M.; Issner, R.; Coyne, M.; Durham, T.; Zhang, X.; Donaghey, J.; Epstein, C. B.; Regev, A. & Bernstein, B. E. Combinatorial patterning of chromatin regulators uncovered by genome-wide location analysis in human cells. Cell, 2011, 147, 1628-1639 Rando, O. J. and Chang, H. Y. Genome-wide views of chromatin structure. Annu Rev Biochem, 2009, 78, 245-271. Robert C Gentleman, Vincent J. Carey, Douglas M. Bates, Ben Bolstad, MarcelDettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Gunther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y. H. Yang, Jianhua Zhang. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology, 2004; 5:R80 Roh, T.-Y.; Cuddapah, S. & Zhao, K. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping. Genes Dev, 2005, 19, 542-552 Roh, T.-Y.; Cuddapah, S.; Cui, K. & Zhao, K. The genomic landscape of histone modifications in human T cells. Proc Natl Acad Sci 2006, 103, 15782-15787 Roh, T.-Y.; Wei, G.; Farrell, C. M. & Zhao, K. Genome-wide prediction of conserved and nonconserved enhancers by histone acetylation patterns. Genome Res, 2007, 17, 74-81. Saldanha AJ. Java Treeview--extensible visualization of microarray data. Bioinformatics. 2004 Nov 22;20(17):3246-8. Schones Dustin E , Andrew D Smith and Michael Q Zhang. Statistical significance of cis-regulatory modules. BMC Bioinformatics 2007, 8:19. Schwartz, Y. B.; Kahn, T. G.; Stenberg, P.; Ohno, K.; Bourgon, R. & Pirrotta, V. Alternative epigenetic chromatin states of polycomb target genes. 
PLoS Genet, 2010, 6, e1000805 Secombe J and Robert N. Eisenman. The function and regulation of the jarid1 family of histone h3 lysine 4 demethylases: the myc connection. Cell Cycle, 6(11):1324–1328, Jun 2007. She, X.; Rohl, C. A.; Castle, J. C.; Kulkarni, A. V.; Johnson, J. M. & Chen, R. Definition, conservation and epigenetics of housekeeping and tissueenriched genes. BMC Genomics, 2009, 10, 269. Shen X, Liu Y, Hsu YJ, Fujiwara Y, Kim J, Mao X, et al. EZH1 mediates methylation on histone H3 lysine 27 and complements EZH2 in maintaining stem cell identity and executing pluripotency. Mol Cell 2008; 32:491-502. Shi, B.; Liang, J.; Yang, X.; Wang, Y.; Zhao, Y.; Wu, H.; Sun, L.; Zhang, Y.; Chen, Y.; Li, R.; Zhang, Y.; Hong, M. & Shang, Y. Integration of 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. estrogen and Wnt signaling circuits by the polycomb group protein EZH2 in breast cancer cells. Mol Cell Biol, 2007, 27, 5105-5119. Shi, Y.; Lan, F.; Matson, C.; Mulligan, P.; Whetstine, J. R.; Cole, P. A.; Casero, R. A. & Shi, Y. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell, 2004, 119, 941-953. Shin H, Liu T, Manrai AK, Liu XS. CEAS: cisregulatory element annotation system Export. Bioinformatics, Vol. 25, No. 19. (1 October 2009), pp. 2605-2606. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005). Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3. The Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. Thomson, J. P.; Skene, P. 
J.; Selfridge, J.; Clouaire, T.; Guy, J.; Webb, S.; Kerr, A. R. W.; Deaton, A.; Andrews, R.; James, K. D.; Turner, D. J.; Illingworth, R. & Bird, A. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature, 2010, 464, 1082-1086. Tonini, T.; Bagella, L.; D'Andrilli, G.; Claudio, P. P. & Giordano, A. Ezh2 reduces the ability of HDAC1-dependent pRb2/p130 transcriptional repression of cyclin A. Oncogene, 2004, 23, 49304937. Tu, S.; Teng, Y.-C.; Yuan, C.; Wu, Y.-T.; Chan, M.-Y.; Cheng, A.-N.; Lin, P.-H.; Juan, L.-J. & Tsai, M.-D. The ARID domain of the H3K4 demethylase RBP2 binds to a DNA CCGCCC motif. Nat Struct Mol Biol., 2008, 15, 419-421. van Bavel CC, Jürgen W Dieker, Yvet Kroeze, Wim P Tamboer, Reinhard Voll, Sylviane Muller, Jo H Berden, Johan van der Vlag. Apoptosisinduced histone H3 methylation is targeted by autoantibodies in systemic lupus erythematosus. Ann Rheum Dis 2011;70:201-207. van Zutven, L. J. C. M.; Onen, E.; Velthuizen, S. C. J. M.; van Drunen, E.; von Bergh, A. R. M.; van den Heuvel Eibrink, M. M.; Veronese, A.; Mecucci, C.; Negrini, M.; de Greef, G. E. & Beverloo, H. B. Identification of NUP98 abnormalities in acute leukemia: JARID1A (12p13) as a new partner gene. Genes Chromosomes Cancer, 2006, 45, 437-446. Wang, G. G.; Song, J.; Wang, Z.; Dormann, H. L.; Casadio, F.; Li, H.; Luo, J.-L.; Patel, D. J. & Allis, C. D. Haematopoietic malignancies caused by April
2012
 176 87. 88. 89. 90. 5. R ELATIVE REQUIREMENT OF dysregulation of a chromatin-binding PHD finger. Nature, 2009, 459, 847-851 Xie, L.; Pelz, C.; Wang, W.; Bashar, A.; Varlamova, O.; Shadle, S. & Impey, S. KDM5B regulates embryonic stem cell self-renewal and represses cryptic intragenic transcription. EMBO J, 2011, 30, 1473-1484 Yu, B. D.; Hanson, R. D.; Hess, J. L.; Horning, S. E. & Korsmeyer, S. J. MLL, a mammalian trithorax-group gene, functions as a transcriptional maintenance factor in morphogenesis. Proc Natl Acad Sci 1998, 95, 10632-10636. Yu, J.; Cao, Q.; Mehra, R.; Laxman, B.; Yu, J.; Tomlins, S. A.; Creighton, C. J.; Dhanasekaran, S. M.; Shen, R.; Chen, G.; Morris, D. S.; Marquez, V. E.; Shah, R. B.; Ghosh, D.; Varambally, S. & Chinnaiyan, A. M. Integrative genomics analysis reveals silencing of beta-adrenergic signaling by polycomb in prostate cancer. Cancer Cell, 2007, 12, 419-431. Chongzhi Zang, Dustin E. Schones, Chen Zeng, Kairong Cui, Keji Zhao and Weiqun Peng. A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics (2009) 25 (15): 1952-1958. 91. Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) vol. 9 (9) pp. R137. 92. Zhang Q, Long H, Liao J, Zhao M, Liang G, Wu X, Zhang P, Ding S, Luo S, Lu Q. Inhibited expression of hematopoietic progenitor kinase 1 associated with loss of jumonji domain containing 3 promoter binding contributes to autoimmunity in systemic lupus erythematosus. J Autoimmun. 2011 Nov;37(3):180-9. 93. Zhu LJ, Gazin C, Lawson ND, Pagès H, Lin SM, Lapointe DS and Green MR. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010, 11:237). April
2012
SUPPLEMENTARY FIGURES

Figure S1: Experimental set-up, public data used in this study, and data analysis. The upper figure shows the experimental design, cell conditions and antibodies used for genome-wide location analysis. H3K4me3 ChIP-seq was performed only in undifferentiated ES cells. Expression analysis was done in both conditions. The expression microarray data were previously reported by another group in collaboration with our group; here we re-analyzed those data. The bottom figure shows the bioinformatic analysis scheme in boxes, with the names of tools, methods and software depicted in blue text. See Methods for a detailed description, and supplementary Table S1 for references to the public data.

[Figure S2, panels A-C. Labels: AI, LI, mouse; Average Gene Profile (1 Kb upstream from TSS, 3 Kb meta-gene, 1 Kb downstream from TTS); Bidirectional Promoter (<= 2.5 Kb)]

Figure S2: Distribution of KDM5A peaks. (A) Pie charts showing the distribution of AI, LI and mouse KDM5A-enriched peaks across defined genomic features, for example promoter, intron, exon and downstream regions. A large number of peaks fall in proximal promoter regions. (B) Average ChIP signal (normalized read count) over 1 Kb upstream of the TSS, the 3 Kb meta-gene, and 1 Kb downstream of the TTS. In the meta-gene, all genes are scaled to a length of 3 Kb. Average profiles were calculated with the program CEAS (see Methods for details). The red line shows the profile of the top 5% expressed genes, the black line the profile of all RefSeq genes, and the cyan or violet line the profile of the bottom 5% expressed genes. These profiles show that KDM5A binds at the TSS of highly expressed genes. (C) Enrichment of KDM5A peaks in bidirectional promoters of up to 2.5 Kb. Enrichment is measured relative to the proportion of bidirectional regions (2.5 Kb) in the genome as background. P-values were determined with a one-sided binomial test.
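The one-sided binomial test named in the Figure S2(C) caption can be illustrated with a minimal Python sketch. This is not the code used in the thesis: the function name `binomial_upper_tail` and all counts below are hypothetical, chosen only to show how an enrichment p-value follows from the total number of peaks, the number of peaks falling in bidirectional promoters, and the background fraction of the genome covered by such regions.

```python
import math


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so that large n does not overflow.
    """
    if k <= 0:
        return 1.0
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_choose = (math.lgamma(n + 1)
                      - math.lgamma(i + 1)
                      - math.lgamma(n - i + 1))
        total += math.exp(log_choose + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


# Hypothetical numbers, for illustration only (not taken from the thesis).
n_peaks = 1000                 # all peaks tested
peaks_in_bidirectional = 80    # peaks overlapping bidirectional promoters
background_fraction = 0.05     # assumed genomic proportion of such regions

p_value = binomial_upper_tail(peaks_in_bidirectional, n_peaks, background_fraction)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

A small p-value here means that observing at least that many peaks in bidirectional promoters would be unlikely if peaks were distributed according to the background proportion alone.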
[Figure S3, panels A-B. Panel A title: Average KMT2A (MLL-1) profile around KDM5A peak summit; axes: Average Profile, Distance from KDM5A peak summit]

Figure S3: (A) Average signal of MLL1 around the KDM5A peak summit. The average number of MLL1 reads in each 50 bp bin, within a window of 2 Kb upstream and 2 Kb downstream of the KDM5A isoform-specific peak summits, is plotted. (B) Boxplot showing the expression of KDM5A targets, MLL1 targets, and overlapping targets of KDM5A and MLL1, together with the expression of all genes on the microarray. The difference in expression between KDM5A and MLL1 targets is not significant. In all other cases, the differences in expression are significant compared to the average expression of all genes. In addition, the overlapping targets show significantly higher expression than the targets of KDM5A or MLL1 alone. P-values were calculated using the Wilcoxon rank-sum test.

SUPPLEMENTARY TABLES

Table S1: Public data utilized in this study

Data Type | Antibody/Molecule | Organism | Cells/Cell Line | Reference
ChIP-on-Chip | LSD1/KDM1A | Human | Foreskin fibroblasts | [1]
ChIP-on-Chip | UTX/KDM6A | Human | Foot and lung fibroblasts | [2]
ChIP-seq | MLL1/KMT2A (C-terminal) | Human | REH | [3]
ChIP-seq | EZH2/KMT6, H3K27me3, H3K36me3, Pol-II, H3K4me1, H3K27ac, p300 | Human; Mouse; Mouse | ES; ES; ES | [4]; [5]; [6]
Microarray | RNA (expression) | Human | U937 (96h) | [7]
Microarray | RNA (expression) | Mouse | ES | [8]
Microarray | RNA (expression) | Mouse | ESC (used in the average profile of KDM5A with top 5% and bottom 5% gene expression) | [9]

REFERENCES:
1. Tsai MC, Manor O, Wan Y, Mosammaparast N et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science 2010 Aug 6;329(5992):689-93.
2. Wang JK, Tsai MC, Poulin G, Adler AS et al. The histone demethylase UTX enables RB-dependent cell fate control. Genes Dev 2010 Feb 15;24(4):327-32.
3. Guenther MG, Lawton LN, Rozovskaia T, Frampton GM, Levine SS, Volkert TL, Croce CM, Nakamura T, Canaani E, Young RA. Aberrant chromatin at genes encoding stem cell regulators in human mixed-lineage leukemia. Genes Dev 2008 Dec 15;22(24):3403-8.
4. Ku M, Koche RP, Rheinbay E, Mendenhall EM, Endoh M, Mikkelsen TS, et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet 2008;4:e1000242.
5. Goldberg AD, Banaszynski LA, Noh KM, Lewis PW et al. Distinct factors control histone variant H3.3 localization at specific genomic regions. Cell 2010 Mar 5;140(5):678-91.
6. Creyghton MP, Cheng AW, Welstead GG, Kooistra T et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 2010 Dec 14;107(50):21931-6.
7. Lopez-Bigas N, Kisiel TA, Dewaal DC, Holmes KB et al. Genome-wide analysis of the H3K4 histone demethylase RBP2 reveals a transcriptional program controlling differentiation. Mol Cell 2008 Aug 22;31(4):520-30.
8. Lin W, Cao J, Liu J, Beshiri ML et al. Loss of the retinoblastoma binding protein 2 (RBP2) histone demethylase suppresses tumorigenesis in mice lacking Rb1 or Men1. Proc Natl Acad Sci U S A 2011 Aug 16;108(33):13379-86.
9. Sharova LV, Sharov AA, Piao Y, Shaik N et al. Global gene expression profiling reveals similarities and differences among mouse pluripotent stem cells of different origins and strains. Dev Biol 2007 Jul 15;307(2):446-59.

Table S2: Excel file showing KDM5A peaks in bidirectional promoters and details of the statistical significance analysis. KDM5A peak occurrence in 1 Kb bidirectional promoter regions was detected. For statistical significance, the number of peaks in bidirectional promoters was then compared with the number of peaks bound in 5 Kb upstream promoters. We used the Chi-square test to obtain p-values. Promoter sizes were normalized.

Table S3: Excel file showing the list of target genes in each distance-based peak group, and (A) their absolute expression (log2); (B) their differential expression in the Kdm5a-/- (KO) versus wild-type (Kdm5a f/f) condition.
Table S4: Excel file showing KDM5A peaks (AI, in the human U937 cell line) and associated KDM5A targets that are located in the nucleolus, with their expression values.

Table S5: AI and LI peak overlap, and number of peaks located in CpG islands

[Venn diagram labels: AI (7216), LI (3662), SI, 64.6 % of LI, Only-LI]

 | No of peaks | Peaks in CpG island
Only AI (SI) | 4850 | 2777 (57.26 %)
Common AI & LI | 2366 | 2052 (86.73 %)
Only LI | 1318 | 50 (3.79 %)

Table S6: Excel table showing the conservation score (CS) and the frequency of CS for the KDM5A peak summit region (100 bp).

Table S7: Excel table comparing motif occurrence between KDM5A peaks located in CpG islands and KDM5A peaks not located in CpG islands.

Table S8: List of significant transcription factors that putatively co-occur at KDM5A peak locations.

Table S9: Number of peaks and target genes for the five HMTs and HDMs tested, and the numbers of overlapping peaks/genes.

Chapter 6
COOPERATION BETWEEN DE2F1 AND YKI/SD DEFINES A DISTINCT TRANSCRIPTIONAL PROGRAM NECESSARY TO BYPASS CELL CYCLE EXIT

Authors: Brandon N. Nicolay, Battuya Bayarmagnai*, Abul B.M.M.K. Islam*, Nuria Lopez-Bigas and Maxim V. Frolov. *equal contribution
Published in: Genes and Development. 2011 February; 25(4): 323-335.
Full text: http://genesdev.cshlp.org/content/25/4/323

Summary
This study was based upon previous observations that inactivation of the RBF and Hippo pathway tumor suppressor proteins leads to a novel phenotype. Because both of these pathways were known to inhibit distinct transcription factors, we hypothesized that the novel phenotype arose from a synergistic activity of the transcription factors dE2F1 and Yki/Sd. Here I analyzed gene expression microarray data from RBF mutant, Warts mutant and double mutant cells. I performed functional clustering of the genes that are characteristic of each mutant and of the genes shared by all three mutants. In addition to our microarray data, I analyzed several publicly available ChIP-on-chip data sets.
Using a computational prediction approach, I determined the binding sites of several transcription factors at the promoters of differentially expressed genes. I also performed expression and functional characterization of E2F targets, putative Sd targets and common targets to illustrate the probable cooperation of the RBF and Hippo pathways in Drosophila. In this study, we delineated for the first time novel interactions and cooperation between dE2F1 and Yki/Sd, which belong to two tumor suppressor pathways. Subsequently, this cross-talk, in the form of the identified putative shared genes under the control of both pathways, was also validated by our team using several experimental techniques.

Nicolay BN, Bayarmagnai B, Islam AB, Lopez-Bigas N, Frolov MV. Cooperation between dE2F1 and Yki/Sd defines a distinct transcriptional program necessary to bypass cell cycle exit. Supplemental material. Genes Dev. 2011 Feb 15;25(4):323-335.

Chapter 7
MIR-11 LIMITS THE PROAPOPTOTIC FUNCTION OF ITS HOST GENE, DE2F1

Authors: Mary Truscott, Abul B.M.M.K. Islam, Nuria Lopez-Bigas and Maxim V. Frolov.
Published in: Genes and Development. 2011, 25: 1820-1834
Full text: http://genesdev.cshlp.org/content/25/17/1820

Summary
mir-11 is encoded within an intron of the Drosophila E2F1 gene, dE2f1, and is likely co-expressed with dE2f1. In flies, E2F1 is a potent inducer of proliferation and apoptosis, and one consequence of E2F1 overexpression is apoptosis. Conversely, in the absence of endogenous dE2F, cells do not die following irradiation-induced DNA damage. We hypothesized that a normal function of miR-11 is to protect cells from E2F1-induced apoptosis during S phase entry. Since miR-11 is co-expressed with dE2F, an increase in dE2F at the G1/S transition could be accompanied by an increase in miR-11, which we predicted would protect cells from apoptosis in S phase.
Microarray experiments were done using RNA from wild-type eye disc tissue, and from tissue in which miR-11, dE2F1, or both miR-11 and dE2F1 were overexpressed. The aim was to learn which pro-apoptotic genes are induced by dE2F and how miR-11 affects their expression, which would in turn point to potential targets of miR-11. I used E2F targets from previously published data. miR-11 targets were predicted using various algorithms. Gene ontology functional enrichment analysis confirmed what was found in genetic interactions: expression of dE2F1 alone induced both proliferation- and cell death-associated genes. However, following co-expression of miR-11 with dE2F1, cell death genes were no longer overrepresented. The main challenges in the data analysis were finding suitable algorithms for normalization of the raw microarray data and for differential expression analysis, because we were interested in expression data from a clonal cell population that contained a large proportion of wild-type cells. In addition, considerable effort was spent on identifying putative miR-11 targets that would be biologically relevant. I successfully performed these steps, and the relevant hypotheses were tested experimentally by our team.

Truscott M, Islam AB, Lopez-Bigas N, Frolov MV. mir-11 limits the proapoptotic function of its host gene, dE2f1. Supplementary material. Genes Dev. 2011 Sep 1;25(17):1820-1834.

Chapter 8
DEMETHYLATION BY KDM5A CONTRIBUTES TO REPRESSION OF CELL CYCLE GENES DURING DIFFERENTIATION

Authors: Michael L. Beshiri, Katherine B. Holmes, William F. Richter, Samuel Hess, Abul B.M.M.K. Islam, Qin Yan, Lydia Plante, Nicolas Gevry, Nuria Lopez-Bigas, William G. Kaelin, Jr. and Elizaveta V. Benevolenskaya.
Published in: Manuscript submitted (under review). April 2012.
Full text: http://bg.upf.edu/~khademul/myPaper/

Summary
The ability of pRB to restrict the cell cycle is well known.
Given that KDM5A is associated with pRB/E2F, its recruitment and its role in the cell cycle are of particular interest. In this study we sought to determine how KDM5A uses its demethylating capability on cell cycle genes during differentiation. We also wanted to know whether the demethylation function of KDM5A is non-redundant. It was previously shown that KDM5A and E2F colocalize. We aimed to elucidate at what point of the cell cycle KDM5A is recruited, and whether E2F plays a role in KDM5A recruitment or vice versa. We focused on the genomic localization of KDM5A in mouse ES cells and human U937 cells at different stages of development. In this study I optimized and performed the KDM5A ChIP-seq data analysis in mouse ES cells, and carried out several downstream analyses to determine the location and distribution of these targets across various genomic features. I classified KDM5A locations based on distance and analyzed microarray expression data to test for correlations between binding and expression, and to identify KDM5A targets that are activated upon differentiation in the absence of KDM5A. I also analyzed several public ChIP-seq and ChIP-on-chip data sets to determine the genome-wide overlap and the specific locations of KDM5A and E2F. This overlap of targets, together with functional clustering approaches, pointed to KDM5A recruitment to E2F-dependent mitotic genes and functions of common interest. When I coupled these findings to gene expression data, I observed that these genes are repressed, indicating a contribution of KDM5A to cell cycle regulation. However, I also identified E2F-independent recruitment of KDM5A and the functions of those targets. Subsequently, these findings were validated by co-workers in our team using cell culture techniques.

Demethylation by KDM5A Contributes to Repression of Cell Cycle Genes During Differentiation

Michael L. Beshiri1, 6, Katherine B. Holmes1, 6, William F. Richter1, Samuel Hess1, Abul B.M.M.K.
Islam1, 2, Qin Yan3, Lydia Plante4, Nicolas Gévry4, Nuria Lopez-Bigas2, William G. Kaelin, Jr.5, and Elizaveta V. Benevolenskaya1,7. 1 Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, 900 S. Ashland Ave, Chicago 60607, USA. 2 Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, PRBB, Universitat Pompeu Fabra, Doctor Aiguader 88, Barcelona 08003, Spain. 3 Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA. 4 Département de biologie, Faculté des sciences, Université de Sherbrooke, Sherbrooke, Québec J1K 2R1, Canada. 5 Department of Medical Oncology, Dana–Farber Cancer Institute and Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA. 6 Equally contributing authors. [7Correspondent: evb@uic.edu] ABSTRACT Epigenetic regulation underlies the robust changes in gene expression that occur during development. One of the enzymes that removes the activating epigenetic mark of trimethylated lysine 4 on histone H3, KDM5A, mediates the effects of RB/E2F tumor suppressor pathway on differentiation. We show that KDM5A has a unique and non-redundant role in histone demethylation. Depletion of KDM5A led to increased methylation at multiple genomic loci that are direct targets of KDM5A, resulting in about a twofold increase in the global level of H3K4me3. Knockout of KDM5A resulted in derepression of these loci during ES cell differentiation, denoting a direct regulatory function. Global location analysis showed that KDM5A co-occupies a substantial portion of E2F4 target genes. In terminally differentiated cells, common KDM5A and E2F4 gene targets were bound by the pRB-related protein p130, a DREAM complex component. 
KDM5A was recruited to the transcription start site (TSS) region independently of E2F4; however, it cooperated with E2F4 and associated histone deacetylation, to promote a state of deepened repression at cell cycle genes during differentiation. These results are indicative of a critical functional role for KDM5A-regulation of H3K4 methylation, essential to the transcriptional silencing of cell cycle genes during differentiation. ___________________________________________________________________________ INTRODUCTION Regulation of gene expression is accomplished by transcription factors, histone modifying enzymes and chromatin remodeling machinery. Their combined action has been implicated in a number of biological processes, including cell cycle control, DNA damage response, and differentiation. Abrogation of proper regulation by chromatin-modifying enzymes has been strongly linked to the development of cancer. In particular, the enzymes involved in regulating methylation states at the histone H3 lysine 4 (H3K4) residue, MLL1 and KDM5A, have been identified in translocations associated with human leukemia (2, 23, 54). H3K4 histone methylation states exhibit a highly distinct distribution pattern in the genome. H3K4 trimethylation (H3K4me3) is strongly associated with transcriptional activation, with the highest levels observed near transcriptional start sites (4, 8, 38), whereas monomethylation is associated with enhancer regions (21). With the identification of the first histone demethylase LSD1/KDM1A, methylated histone residues were proven to exist in highly dynamic states. Subsequently, the identification of the second class of histone demethylases, JmjC domain-containing proteins, that remove the highest methylated state at the lysine residue revealed that methylation patterns at each particular gene region may depend on the activity of multiple enzymes. 
KDM5A exemplifies two characteristics of an epigenetic regulator, being both a “reader” and a highly specific “eraser” of a histone modification. During mammalian development, H3K4 methylation is critically important at developmental loci and its deregulation causes tumorigenesis. The “reader” function of KDM5A was shown to be critical during haematopoietic differentiation because binding of a KDM5A oncogenic fusion protein to H3K4me3 prevented proper cell fate decision-making, resulting in a leukemic phenotype (58). There is a solid line of evidence that all four KDM5 enzymes, KDM5A-D, are able to remove H3K4 methylation in vitro. KDM5A has been shown to specifically demethylate H3K4me2 and me3 residues, and forced expression of KDM5A resulted in a global loss of histone H3 methylation (15, 30). KDM5A is functionally conserved in other species (Eissenberg et al. 2007; Lee et al. 2007; Secombe et al. 2007; Lloret-Llinares et al. 2008), and the single Drosophila ortholog of the KDM5 family, Lid (little imaginal discs), is required for larval development. Lid mutant flies display an increased level of all states of H3K4 methylation (17), suggesting an association of the Lid mutant phenotype with a higher level of H3K4 methylation throughout the genome. The functional importance of KDM5A in the regulation of its target genes is tempered by the existence of paralogs (5), which in some cases may be recruited to the same gene regions as KDM5A (27). In vitro studies suggest that KDM5 enzymes are responsible for the removal of site-specific methylation at lysine 4 of histone H3. However, in vivo, LSD1 or other JmjC domain-containing proteins may be responsible for removing H3K4 methylation.
This leads to several questions: (1) are KDM5 enzymes non-redundant H3K4 demethylases, and do some members of the KDM5 family play a non-redundant role in H3K4 methylation while other KDM5 proteins play a redundant role; (2) do they contribute to the balance of methylation states at all their targets, and in which methylation state do they play a non-redundant role; (3) finally, how does their function relate to gene expression? Mutations in KDM5 orthologs result in developmental defects. This suggests that KDM5 demethylation activity and/or its protein-protein interactions may be critical for the transcriptional changes occurring during differentiation. During differentiation, cells exhibit two novel properties: repression of cell cycle genes associated with permanent cell cycle exit, and cell type-specific gene activation. Histone modifications are thought to be important epigenetic events intimately linked to the initiation and maintenance of transcriptional changes for both of these processes. Cell cycle exit is associated with the repression of a large group of genes, many of which are pRB/E2F targets. Derepression of pRB/E2F target genes plays an important role in human tumors, most of which carry mutations in the RB pathway. During cell cycle progression, pRB and the related proteins p107 and p130 (collectively called “pocket” proteins) are periodically and reversibly recruited to E2F target genes, accompanied by recruitment of the histone deacetylase Sin3/HDAC and chromatin-remodeling activities (19). In G0/quiescence, the repression of cell cycle genes is mediated by the DREAM complex, which contains p130 and E2F4 (35). In myoblasts converting to myotubes, E2F target genes are silenced through E2F/pocket protein-dependent histone deacetylation and methylation of the histone H3 lysine 27 residue (10).
In this model system, a portion of E2F target genes acquired Sin3/E2F4 complexes, comprising Sin3B, HDAC and KDM5A, which were recruited by E2F4 immediately downstream of the TSS (53). This, in turn, affected histone acetylation, methylation and nucleosome occupancy. Surprisingly, the recruitment of such a complex to a target gene occurred before differentiation progressed, when cells were arrested by contact inhibition and the target gene was not yet fully repressed, thus leaving further repression events uncovered. However, KDM5A exhibits substantial binding to E2F targets relatively late in differentiation (36), suggesting that E2F factors are likely orchestrating several events associated with cell cycle exit. These data raised the possibility that KDM5A and pRB/E2F complexes contribute to either transcription factor recruitment or histone modifications occurring at E2F target genes after induction of differentiation. Here we studied the requirement for KDM5A in H3K4 methylation and gene expression in cells where the KDM5A expression level was decreased either by RNA inhibition or by using a conditional (floxed) KDM5A allele. Using high-resolution ChIP-seq analysis, we found that KDM5A preferentially binds to TSS regions. We show that KDM5A is recruited to genes experiencing strong repression during the course of differentiation, and that these genes are also occupied by the components of the DREAM complex, E2F4 and p130. Differentiation time course experiments showed a prominent recruitment of both KDM5A and E2F4 during later stages. Importantly, the recruitment of KDM5A and that of E2F4 were independent of one another. Specifically, E2F4 knockdown did not abolish KDM5A recruitment and H3K4 demethylation, indicating that the role of KDM5A in gene repression is independent of E2F4.

MATERIALS AND METHODS

Cell Culture, Reagents for Transfections and Immunoblotting
U937 cells were grown in suspension at 37 °C in RPMI 1640 medium (Mediatech) supplemented with 10% fetal bovine serum (FBS) (HyClone) and differentiated with TPA (Sigma). For induction of differentiation, TPA was added to cells at different times as described (6), and cells were collected simultaneously for RNA isolation or ChIP assays. MEFs were isolated from mice of the corresponding genotypes and used to prepare cell lysates at passage 6. MEFs, 293T cells and SAOS-2 human osteosarcoma cells were grown in DMEM medium (Mediatech) supplemented with 10% FBS. Nucleofection of U937 cells was performed with Nucleofector II using Nucleofection reagents from kit C or V and programs W-001 or V-001, respectively (Amaxa Biosystems). SAOS-2 cells were transfected using Lipofectamine 2000 (Invitrogen) with KDM5A siRNA duplex sequence No.4 and control siRNAs GL3 and No.1sc as previously shown (6), or with E2F4 siRNA 5’-GGCAGAGAUCGAGGAGCUGdTdT-3’ (Sigma) and siGENOME LIN9 siRNA (M-018918-01, Dharmacon). For stable KDM5A knockdown, low-passage 293T cells at 80% confluency were transfected using Lipofectamine 2000 with lentiviral expression and packaging constructs: pMD2.G, psPAX2, and pLVTHM-shKDM5A-GFP or the control pLVTHM-GFP vector. After 7 hours, the transfection medium was replaced with growth medium (DMEM/10% FBS) containing 7 µg/ml polybrene and 5 x 10^5 U937 cells per ml of growth medium. Following co-culture for 24 hours, the transduced U937 cells were resuspended in RPMI/10% FBS in a T75 flask. Protein lysates were prepared in 62 mM Tris-HCl (pH 6.8), 1.5% SDS buffer, followed by boiling. A total of 50 µg of protein was loaded onto a 6.25% SDS-PAGE gel. Immunoblotting was performed using our rabbit KDM5A anti-serum 2469; anti-H3K4me3 (07-473), anti-H3K4me2 (07-030) and anti-H3K4me1 (07-436) from Upstate Tech.; total histone H3 (ab-1791) antibodies from Abcam; and mouse α-tubulin (T9026) antibodies (Sigma). Blots were developed using ECL and images were analyzed using ImageJ 1.42q software.
ChIP and Gene Expression Analyses

Kdm5af/f (wild-type KDM5A) ES cells were isolated from blastocysts of Kdm5af/f mice, which were maintained on a pure C57BL/6 background. Successful Cre-dependent recombination was achieved as described (34). The generated Kdm5af/f and Kdm5a-/- ES cells were maintained on mitomycin-treated murine embryonic fibroblasts (MEFs) and expanded for three passages on gelatinized tissue culture plates without MEFs before cells were grown for ChIP. Cells were grown under standard mESC conditions (34). ChIP-seq experiments were performed with the KDM5A antibody 1416 following the described procedure (9). Two separate Kdm5af/f clones and a single Kdm5a-/- clone were analyzed. The total genomic DNA sample was prepared from DNA of the two Kdm5af/f clones mixed in a 1:1 ratio. Reference DNA was either the total genomic DNA sample or the Kdm5a-/- clone ChIP-seq sample.

The genomic library was applied to the flow cell of the Cluster Station from Illumina. After priming, flow cells were loaded into the Illumina Genome Analyzer II. Images acquired from the Illumina/Solexa sequencer were processed through the bundled Solexa image extraction pipeline, which performed base-calling and generated QC statistics. 9 and 24 million reads were generated for the two Kdm5af/f clones, respectively, 22 million reads were generated for the Kdm5a-/- clone, and 25 million reads for the total genomic DNA. Sequenced 36 bp short reads were uniquely aligned, allowing at most two mismatches, to the UCSC (The Genome Sequencing Consortium) reference genome (mm9) using the BOWTIE program (version 0.12.2) (33). Reads that matched more than one location with equal quality were discarded to avoid bias. The peak-calling algorithm MACS (version 1.3.7.1) (59) was used to determine enriched peak regions against reference DNA as background. Data for the two Kdm5af/f clones were combined. Peaks were modeled using default parameters with the 'futurefdr' flag on and 'mfold' set to 10.
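The read-filtering rule described above (at most two mismatches, and no second location of equal quality) can be illustrated with a minimal sketch. This is not the BOWTIE implementation; the `Alignment` record and the function name `unique_best_alignments` are hypothetical and introduced here only for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """One candidate alignment of a read (hypothetical record layout)."""
    read_id: str
    chrom: str
    pos: int
    mismatches: int


def unique_best_alignments(alignments: Iterable[Alignment],
                           max_mismatches: int = 2) -> List[Alignment]:
    """Keep reads whose best alignment is unique and within the mismatch limit."""
    hits_by_read = defaultdict(list)
    for aln in alignments:
        # Discard candidate alignments that exceed the mismatch limit.
        if aln.mismatches <= max_mismatches:
            hits_by_read[aln.read_id].append(aln)

    kept = []
    for hits in hits_by_read.values():
        best = min(h.mismatches for h in hits)
        best_hits = [h for h in hits if h.mismatches == best]
        # A read matching several places with equal quality is dropped.
        if len(best_hits) == 1:
            kept.append(best_hits[0])
    return kept
```

In this sketch a read with one perfect hit and one two-mismatch hit is kept at the perfect location, whereas a read with two one-mismatch hits is discarded.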
Average ChIP signal along the 3 kb meta-gene (RefSeq) together with 1 kb upstream from TSS and 1 kb downstream from TTS was plotted using CEAS software (48). In the previous ChIP-on-chip experiment, we used a self-printed array containing PCR-generated probes representing the proximal promoter regions spanning between -500 bp and +500 bp of the transcription start site. ChIP experiments were performed as described (36) using the following rabbit antibodies: affinity-purified KDM5A antibodies 1416 and KDM5A anti-serum 2469; H3K4me3 (ab-8580) and total histone H3 (ab1791) from Abcam; H3K4me2 (07-030) and H3K4me1 (07-436) from Upstate Tech.; E2F4 (sc1082), p130 (sc-317) and Sin3B (sc-768) from Santa Cruz. To determine transcript levels in U937 cells, RNA was isolated and reverse-transcribed. Real-time PCR was performed using the SYBR Green PCR master mix and iCycler MyiQ system or SsoFast EvaGreen Supermix and the CFX96 system (Bio-Rad). Detailed information and all primer sequences are provided in Tables S1, S2 and S9. Significance of expression changes in KDM5A genes was assessed using the Wilcoxon-Mann-Whitney U test.

Location Analysis Data and Enrichment Analysis of Targets

Enriched probe genomic locations (32) were annotated to the closest transcription start sites (TSS) of EnsEMBL genes (version 55) using the Bioconductor (version 2.7, R version 2.12.1) (20) package ChIPpeakAnno (version 1.60) (60). To analyze expression of genes displaying KDM5A binding (ChIP-seq data) in particular regions relative to the TSS, the following steps were performed: 1) the gene closest to the peak was determined by ChIPpeakAnno; 2) KDM5A peaks were then classified according to distance from TSS and location of peak summit; 3) differential expression of those genes was plotted. Target genes of KDM5A in U937 and expression data of genes in this cell line were obtained from our previous study (36).
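Step 2 of the procedure above can be sketched as a small classifier that uses the distance classes given in the legend of Figure 3B (TSS, promoter, distal, gene body, downstream). This is an illustrative sketch only and not the code used in the study; the function name `classify_peak`, its arguments, and the strand handling are assumptions.

```python
def classify_peak(summit: int, tss: int, tts: int, strand: str = "+") -> str:
    """Classify a peak summit relative to its closest gene.

    Coordinates are genomic positions in bp. Offsets are measured along the
    direction of transcription, so negative values lie upstream of the TSS.
    """
    sign = 1 if strand == "+" else -1
    offset = sign * (summit - tss)        # summit position relative to TSS
    gene_end = sign * (tts - tss) + 100   # TTS + 100 bp, relative to TSS

    if offset < -3100:
        return "distal"        # beyond -3101 bp upstream of TSS
    if offset <= -201:
        return "promoter"      # -3100 to -201 bp
    if offset <= 1100:
        return "TSS"           # -200 to +1100 bp
    if offset <= gene_end:
        return "gene body"     # +1101 bp to TTS + 100 bp
    return "downstream"        # more than TTS + 100 bp
```

For a plus-strand gene with TSS at 1000 and TTS at 5000, a summit at 500 falls in the promoter class and a summit at 3000 in the gene body class. Genes shorter than 1 kb never yield a gene body call under this ordering, which is a choice of the sketch and not something stated in the source.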
Expression data in differentiating Kdm5af/f and Kdm5a-/- ES cells were taken from another of our studies (34). Expression data in ES cells from C57BL/6 mice were taken from the GEO dataset GSE5914. Genomic locations of E2F4 targets were taken from GSE20551 (particularly GSM516408, (29)). E2F4 genomic locations were taken from the authors' analyzed genomic location BED file and annotated to the closest gene (i.e., TSS) by ChIPpeakAnno (Table S3). Overlap of KDM5A and E2F4 targets was presented using a Venn diagram, and significance of the overlap was determined by a standard Chi-square test using the R statistical programming language. Significant probes (p value ≤ 0.05) of the targets of EZH2, H3K4me3 and H3K27me3 in the U937 cell line were taken from the previous study (42) (GEO accession GSE16730). However, since the EZH2, H3K4me3 and H3K27me3 probes were designed to span chromosomes 2, 3, 5, 7, 9, 10, 11, 12 and 17, the overlap analysis with KDM5A was limited to the KDM5A targets within these chromosomes. Functional annotation of target genes is based on Gene Ontology (GO) (1) (http://www.geneontology.org) as extracted from EnsEMBL (24) and on the KEGG pathway database (28). We considered only the GO/pathway categories that have at least 10 genes annotated. Resulting p-values were adjusted for multiple testing using the Benjamini and Hochberg method of False Discovery Rate (FDR) (7). Gitools was used for enrichment analysis and heatmap generation (44).

Accession Numbers

ChIP-seq data are available from http://www.ncbi.nlm.nih.gov/geo/ under accession number GSE28343.

RESULTS

KDM5A is Recruited to Different Sets of Genes Experiencing Demethylation

We previously mapped KDM5A/RBP2 binding regions in the human genome during cellular differentiation using ChIP-on-chip experiments in human diffuse histiocytic lymphoma U937 cells. This microarray study identified two functionally distinct classes of genes, where KDM5A binding is differentiation-dependent and where it is differentiation-independent (36).
In order to quantify the recruitment of KDM5A during the course of differentiation more accurately, we used real-time PCR analysis (ChIP-qPCR). The groups of genes in our genome-wide study that were bound by KDM5A only in a specific condition, such as in undifferentiated cells (0 hr time point) or after 27 and 96 hrs of differentiation, displayed the highest enrichment at the relevant time point (Fig. 1; for the list of genes, see Table S1). As determined by ChIP-qPCR analysis of these 138 genes, the group of 0 hr-specific KDM5A targets demonstrated on average more than 2-fold enrichment in KDM5A ChIPs prepared from undifferentiated cells compared to the ChIP assays performed in differentiated (27 or 96 hrs) cells. Similarly, the group of 27 hr-specific KDM5A targets demonstrated more than 3-fold enrichment and the group of 96 hr-specific KDM5A targets demonstrated more than 2-fold enrichment in the relevant condition. Therefore, KDM5A recruitment is significantly different during differentiation at multiple gene loci.

Figure 1. Quantitative Analysis of KDM5A Enrichment at Target Genes during Differentiation. Promoter regions bound by KDM5A only at 0 hr, 27 hr or 96 hr, as well as those bound at 0 and 27 hr, or at 27 and 96 hr, were selected from KDM5A ChIP-on-chip analysis (p value < 0.002). Primers were designed to the gene promoter regions (Table S1) and the ChIP-qPCR data for each gene were generated in three different conditions, 0 hr, 27 hr, and 96 hr after U937 cell treatment with TPA. Enrichment is expressed relative to input chromatin, and is representative of two independent experiments (n=2). The genes are grouped from the highest KDM5A enrichment to the lowest. The comparison between different time points shows that all genes display differential enrichment for KDM5A, consistent with ChIP-on-chip data. A few genes from the KDM5A targets at 27 and 96 hr after induction of differentiation are also enriched at 0 hr, which may be due to the high enrichment of KDM5A at these genes in general. The values are the means, n=2.

KDM5A Has a Unique and Non-redundant Role in Demethylation

On the array, KDM5A was associated with genomic regions enriched for H3K4me3 (36). Consistent with KDM5A function as a histone H3K4 demethylase, the differentiation-dependent binding of KDM5A to genomic regions has been accompanied at least in some cases by a decrease in H3K4 methylation. In addition to the differentially occupied targets, KDM5A was also bound to a large number of genes in all three studied conditions. If methylation of these genes can still change despite the consistent presence of KDM5A, this would suggest the possibility of modifications in KDM5A demethylase activity or the involvement of other proteins in such conditions. To study this, we selected promoters displaying high enrichment of KDM5A in undifferentiated (0 hr) and differentiated (96 hr) cells (Fig. 2A; for the list of genes, see Table S2) and analyzed their H3K4me3 status in both conditions by ChIP-qPCR. Surprisingly, we found that the level of H3K4 methylation was still changing at these promoters from 0 hr to 96 hr. Specifically, after differentiation the KDM5A-bound promoters became generally more methylated, with less than 10% of the regions displaying a marginal decrease in methylation. However, while KDM5A enrichment values were higher at 96 hr compared to 0 hr for the majority of these targets, H3K4 methylation changes were not generally proportional to KDM5A binding. These data suggested that while increased methylation correlates with a decrease in KDM5A binding, other contributing factors determine the robustness of this modification.

Therefore, we looked at the extent to which the KDM5A protein affects histone methylation at its target loci. As we previously showed, KDM5A protein levels can be efficiently decreased in SAOS-2 cells treated with KDM5A siRNA (6). We performed ChIP experiments with KDM5A antibodies in cells treated with a KDM5A siRNA and two control siRNAs, GL3 siRNA and a scrambled siRNA, and found a quantifiable decrease in KDM5A binding to its gene targets (Fig. 2B and data not shown). This experiment showed that siRNA knockdown of KDM5A reduces the level of KDM5A binding to its targets.
Then we tested H3K4me3 abundance at these targets and found that in all cases there was an increase in H3K4me3 in cells with KDM5A knockdown compared with control cells (Fig. 2B). To study whether histone methylation is increased at nucleosomal regions not bound by KDM5A, we performed the same experiments on the intergenic region Int20D. The nucleosomes at the Int20D region were not highly methylated at H3K4 in cells treated with either control siRNA or KDM5A siRNA. We noticed that while some genes, like RNMTL1, displayed an approximately 4-fold increase in methylation in cells lacking KDM5A, other genes, like PPGB and HOXB9, showed less pronounced changes. Indeed, when we extended our analysis to 23 different KDM5A target genes, we found a wide range of enrichment in H3K4me3 (Fig. 2C). Consistent with the results in Fig. 2A, we failed to detect any correlation between changes in H3K4 methylation and either the original level of H3K4me3 or the level of KDM5A enrichment.

We next studied which methylation state at H3K4 is affected when the KDM5A level is decreased through RNA inhibition. We analyzed the level of H3K4me3, H3K4me2, H3K4me1 and total histone H3 at these genes in cells treated either with KDM5A siRNA or control siRNA (Fig. 2B). At 19 genes, the level of H3K4me3 increased 5-20 times in cells with KDM5A siRNA. We also found a marginal increase in H3K4me2. These regions normally lack H3K4me1, which we were also unable to detect after KDM5A knockdown (data not shown). The observed increase in H3K4 methylation was not due to an increased total histone H3 level. These data show that KDM5A is required for maintaining tri- and di-methylation of histone H3K4 at the genomic regions bound by KDM5A.

A generally large increase in H3K4me3 at KDM5A-bound regions was observed when cells were treated with siRNA to KDM5A. We next asked whether, in addition to affecting H3K4 methylation at KDM5A-bound regions, a reduction in KDM5A affects the total level of methylation.
We generated a stable KDM5A knockdown in U937 cells using a shRNA lentiviral system or used mouse embryonic fibroblasts (MEFs) isolated from mice with a homozygous or heterozygous deletion of KDM5A. When several KDM5A-bound promoters were tested for the level of H3K4me3 using KDM5A shRNA, the results were similar to those in KDM5A siRNA-treated SAOS-2 cells (Fig. 2D and data not shown). As the shRNA and siRNA target different KDM5A regions, this suggests that these are not off-target effects. Besides increased methylation at the KDM5A targets, cells with KDM5A shRNA displayed a small but reproducible increase in the global level of H3K4me3 as determined by immunoblotting analysis (Fig. 2E). The KDM5A shRNA did not affect the level of total histone H3 compared to the vector control. This suggests that KDM5A is necessary for maintaining H3K4me3 at its target loci and that a lower KDM5A level results in an increase in the total level of H3K4me3.

Figure 2. Global and Gene Specific Changes in H3K4 Methylation in Cells Deficient in KDM5A.
(A) Analysis of H3K4me3 changes at KDM5A target genes during differentiation. We sought to determine if genes bound by KDM5A experience a change in their methylation state despite consistent binding of KDM5A to their promoter regions. To this end, the regions bound by KDM5A (p value < 0.002) at 0 hr and 96 hr (Table S2) were analyzed for H3K4me3 at 0 hr and 96 hr. The difference in binding ratios of the KDM5A ChIP signals for each region at 96 hr and 0 hr over control genomic DNA was determined using ChIP-on-chip data. These regions are presented in order from the regions with increased KDM5A binding (>1) to the regions with decreased KDM5A binding (<1). Fold-change difference in H3K4me3 enrichment at these regions was determined by ChIP-qPCR, where values are the means, n=2.
(B) SAOS-2 cells transfected with either KDM5A siRNA or control GL3 siRNA were analyzed by ChIP-qPCR for H3K4me3 and KDM5A binding at five random KDM5A target genes as well as on the unbound region Int20D. Results are shown as percent of input, where values are the means ± STR, n=2. A scrambled (sc) RNA oligo gave similar results (data not shown).
(C) The level of H3K4 methylation in KDM5A knockdown is not directly proportional to the levels of normal local methylation or KDM5A binding. ChIP-qPCR experiments for H3K4me3 and KDM5A were performed for the indicated genes in SAOS-2 cells and enrichment is shown as a percent of input. In KDM5A siRNA-treated SAOS-2 cells, enrichment in H3K4me3, H3K4me2 and total histone H3 is shown as a fold-change difference compared to cells treated with sc KDM5A siRNA.
(D) Human U937 cells transduced with either KDM5A shRNA lentiviruses or control lentiviruses were analyzed for H3K4me3 by ChIP-qPCR and for the global levels of KDM5A, H3K4me3 and total histone H3 by immunoblotting. The ChIP-qPCR data are shown for the KDM5A target MRPL37 and for the control Int20D as a fold-change difference (KDM5A shRNA vs. control shRNA) and represent the means ± STR, n=2.
(E) Wild-type MEFs and MEFs with a deletion in the Kdm5a and/or RB/Rb1 locus were analyzed for H3K4 methylation by immunoblotting. H3K4me3 was quantitated relative to α-tubulin, which was used as a loading control in D and E.

MEFs express KDM5B at a high level (data not shown) and still the global level of H3K4me3 was increased almost 2-fold in Kdm5a-/- cells compared to wild-type cells (compare lanes 1 and 3 with lanes 2 and 4 in Fig. 2E, Fig. S1A and B). The lack of Rb had no noticeable effect on H3K4 methylation. Strikingly, H3K4me3 was increased in Kdm5a+/- cells (compare lanes 1 and 3 with lane 5 in Fig. 2C, Fig. S1A and B), indicating that deletion of only one copy of KDM5A is sufficient to affect methylation on a global scale.
These experiments indicate that the level of H3K4 trimethylation is dosage sensitive to the amount of KDM5A, and is not compensated for by KDM5B or other enzymes.

KDM5A is Co-recruited with Components of the DREAM Complex at E2F Targets

In proliferating cells pRB is inactive (phosphorylated) and E2F1–3 function as transcriptional activators (19). To guide cells out of the cell cycle, pRB is dephosphorylated and sequesters E2F1–3; these RB/E2F complexes bind to E2F target genes, repressing their activity. Aside from activator E2Fs, the E2F protein family contains E2F4-8, and these E2Fs seem to function only as repressors. Among E2F proteins, E2F4 shows the strongest expression in mouse ES cells (29). In quiescent cells, p130/E2F4 is the most prominent pocket protein complex bound to E2F-regulated promoters (3, 46, 50), and more than 800 promoters are bound by E2F4 and p130 as components of the DREAM complex, as shown by a global location analysis study (35). We determined KDM5A targets genome-wide in mouse ES cells and studied whether KDM5A is directed to DNA sites occupied by E2F4 complexes. We performed ChIP-seq experiments from two different clones of ES cells with a conditional (floxed) KDM5A allele (Kdm5af/f) and a clone where the KDM5A allele was deleted (Kdm5a-/-). We were unable to detect a significant enrichment in ChIP-seq samples from Kdm5a-/- cells (data not shown, raw data provided in GEO GSE28343), confirming the high specificity of our KDM5A antibody. In Kdm5af/f cells, overall we detected 3093 regions bound by KDM5A (Table S3). Our data highly overlapped with the previously determined KDM5A binding sites in ES cells (43), taking into consideration that different algorithms were used for sequence alignment and enrichment analysis (BOWTIE and MACS vs. ELAND and QuEST) (Table S4 and Table S5). Relating KDM5A-bound regions to known mouse genes showed that KDM5A is preferentially bound at TSS regions (Fig. 3A).
These regions showed greater representation in the top 5% most highly expressed genes than in the bottom 5% expressed genes. We analyzed gene expression changes in ES cells caused by the loss of KDM5A by comparing microarray data from Kdm5af/f and Kdm5a-/- cells (34) and found that the genes bound by KDM5A at the TSS (Table S6) were activated in Kdm5a-/- (KO) cells compared to Kdm5af/f (FF) cells when the cells were prompted to differentiate (yellow boxes in Fig. 3B). In contrast, no significant change was observed in undifferentiated cells (green boxes in Fig. 3B). A Wilcoxon rank sum test with continuity correction showed that the activation of genes bound by KDM5A at the TSS was significantly greater (p value < 2.2e-16) than the activation of a random set of genes that were not KDM5A targets (Table S7).

Then, we asked how many E2F genes are bound by KDM5A. We compared our KDM5A ChIP-seq data with E2F4 ChIP-on-chip data (29) that were obtained in mouse ES cells. Strikingly, almost one third of KDM5A targets (990 out of 2919) were also E2F4 targets even in proliferating cells (Fig. 3C). A Chi-square test of the pairwise comparison showed that the overlap is highly significant (p value < 2.2e-16) (Table S8). This suggests that E2F4 or KDM5A may be recruiting one another to the common genes.

Figure 3. KDM5A is Recruited to DREAM Complex Targets.
(A) Average profiles near TSS and in genic region for KDM5A in mouse ES cells. Genic regions are presented as a 3 kb-long meta-gene surrounded by a 1 kb region upstream of the TSS and a 1 kb region downstream of the transcription termination site (TTS). Dashed lines show TSS and TTS. Most KDM5A binding occurs at the TSS. A high TSS ChIP signal is observed for KDM5A reads in the top 5% expressed genes when compared to the bottom 5% expressed genes (expression data from GEO GSE5914). Data also show a continuous ChIP signal of KDM5A reads elsewhere. A drop in the signal occurs near the TTS, probably due to nucleosome depletion.
(B) Differential expression of KDM5A targets in Kdm5af/f and Kdm5a-/- ES cells. Log2 expression values are presented for the classified KDM5A peak regions (Table S6). Distance of peaks was measured from peak summit to TSS and classified as follows: TSS peaks - if peak summit to TSS distance is within -200 bp to +1100 bp; promoter - if peak summit is in the region from -3100 to -201 bp upstream of TSS; distal - if peak summit is in the region beyond -3101 bp upstream of TSS; gene body - if peak summit is located within the +1101 to (TTS+100) bp region; downstream - if distance of peak summit is more than (TTS+100) bp from its closest gene.
(C) KDM5A and E2F4 bind to many common genes. The Venn diagram shows the overlap between KDM5A targets (this study) and E2F4 targets (29) in mouse ES cells. Enrichment analysis on target genes of the three compartments of the Venn diagram shows relations to GO terms and KEGG pathways. Corrected (FDR) p-values are delineated in a colored heatmap, where red signifies overrepresentation of targets in a particular term. All large GO and KEGG groups with overrepresentation are shown. Small developmental GO groups where KDM5A is bound are also shown: 14 of the 64 genes included in GO: embryonic development are KDM5A targets, as are 5 out of 14 in GO: mesoderm development, 5 out of 15 in GO: neural tube closure, and 7 out of 21 in GO: hippocampus development.
(D) ChIP assays, using E2F4 antibodies, of 24 gene targets of KDM5A in differentiating cells containing predicted E2F-binding sites. U937 cells were treated with TPA for 96 hr or left untreated (0 hr). The data are presented as a percent of input (means ± STR, n=2) and include the unbound control Int26E. ChIP assays, using p130 antibodies, are shown for the same genes. Increased enrichment at 96 hr indicates that these genes are targets of the p130/E2F4 complex during differentiation.

The differentiation-specific KDM5A targets were predicted to be highly enriched for genes with RB/E2F binding sites that are involved in cell cycle (e.g., DNA repair, replication, mitosis) (36). To confirm this, we used E2F4 genome-wide ChIP-on-chip data where we annotated peaks to the closest genes and performed gene ontology (GO) enrichment analysis. We found that KDM5A and E2F4 co-occupied a high proportion of genes with functions in transport and in the mitochondrion (e.g., rRNA processing, oxidative phosphorylation, electron transport chain) (Fig. 3C). KDM5A was generally highly enriched at the genes involved in oxidative phosphorylation. While the overlap in cell cycle genes was still statistically significant (36 out of 470 cell cycle genes, p value < 1e-6), consistent with our previous ChIP-on-chip analysis, E2F4 was predominantly bound to cell cycle genes without KDM5A (218 out of 470 cell cycle genes, p value < 1e-16). Both proteins were found at genes involved in developmental processes such as neural tube closure and hippocampus development. However, E2F4 alone occupied a high number of genes involved in apoptosis (120 out of 416), cell differentiation (115 out of 498) and transcription (521 out of 1738), as well as several pathways (KEGG) associated with cancer and p53 signaling. Interestingly, we did not detect a unique functional category that would describe KDM5A-specific genes. While KDM5A occupies some genes without E2F4, these genes have a function that is common to genes regulated by both factors. In proliferating cells, we were able to distinguish the functions of KDM5A/E2F4 co-occupied genes versus the functions of genes bound by each protein separately. Our gene expression analysis showed that a significant change in the expression of KDM5A target genes after KDM5A knockout is observed only during differentiation (Fig. 3B).
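The significance of the KDM5A/E2F4 overlap reported above (990 of 2919 KDM5A targets) rests on a 2×2 Chi-square test. The following is a minimal sketch of such a test, not the R code used in the study. The total number of E2F4 targets and the size of the gene universe are not given in this section, so the values below are hypothetical placeholders. No continuity correction is applied, which is also an assumption.

```python
import math


def chi_square_2x2(both: int, only_a: int, only_b: int, neither: int):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = both + only_a + only_b + neither
    row1, row2 = both + only_a, only_b + neither
    col1, col2 = both + only_b, only_a + neither
    stat = n * (both * neither - only_a * only_b) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


KDM5A_TARGETS = 2919   # from the text
OVERLAP = 990          # from the text
E2F4_TARGETS = 5000    # hypothetical placeholder, not from the source
GENE_UNIVERSE = 20000  # hypothetical placeholder, not from the source

stat, p_value = chi_square_2x2(
    both=OVERLAP,
    only_a=KDM5A_TARGETS - OVERLAP,
    only_b=E2F4_TARGETS - OVERLAP,
    neither=GENE_UNIVERSE - KDM5A_TARGETS - E2F4_TARGETS + OVERLAP,
)
print(f"chi-square = {stat:.1f}, p = {p_value:.3g}")
```

The numerical result depends entirely on the two placeholder values, so it should not be compared with the p value in Table S8.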
During differentiation, the proportion of KDM5A targets that have E2F binding sites is highly increased (36), suggesting that at such genes KDM5A joins E2F4 under specific conditions. Therefore, we tested if E2F4 occupies differentiation-specific KDM5A targets in differentiated cells. ChIP-qPCR showed that all regions that were predicted by our transcription factor binding analysis using TRANSFAC database position weight matrices to contain E2F binding sites (36) displayed high E2F4 binding after 96 hrs of differentiation (Fig. 3D). ChIP-qPCR using p130 antibodies showed that p130 is bound to most of these KDM5A targets as well (Fig. 3D). We conclude that in the course of differentiation KDM5A may be co-recruited with E2F/pocket protein components at DREAM complex targets.

KDM5A and E2F4 Independently Contribute to Repression of Cell Cycle Genes during Differentiation

Cell cycle genes are some of the most deeply repressed genes, which is crucial for the cell to prevent unwanted proliferation. Treatment of U937 cells with TPA prompted KDM5A recruitment to a set of genes that are E2F-responsive, including PCNA and NUSAP1, correlating with their repression, withdrawal from the cell cycle, a macrophage-like morphology and expression of corresponding markers (36). This set of KDM5A/E2F targets was highly transcriptionally active but, upon binding of KDM5A, became repressed as a group. In order to better understand the timing of the events that lead to repression at KDM5A targets, as well as the requirement for KDM5A and E2F in this process, we studied the dynamics of expression of the genes NUSAP1 and PCNA after TPA treatment. PCNA acts as a processivity factor for DNA polymerase δ, while nucleolar spindle-associated protein (NUSAP1) plays a role in spindle microtubule organization during mitosis. RT-qPCR analysis showed that the lowest expression level is not achieved instantaneously, suggesting consecutive recruitment of different repressor complexes (Fig. 4A).
The NUSAP1 promoter region is bidirectional, containing the TSS of another gene, OIP5, separated by only 107 nucleotides (human genome assembly NCBI36/hg18) from the NUSAP1 TSS (Fig. 4B). Consistent with the notion that bidirectional genes contain cis-regulatory elements regulating both genes (51), NUSAP1 and OIP5 are co-expressed in the course of TPA treatment (Fig. S2A). We studied the dependence of NUSAP1 and OIP5 transcript levels at 96 hr and 120 hr, when they are normally very low, on the presence of E2F4 and KDM5A. Loss of E2F4 through siRNA knockdown resulted in higher expression of both NUSAP1 and OIP5 (Fig. 4B), consistent with the repressive effects of E2F4 on its target genes. Knockdown of LIN9, another component of the DREAM complex, resulted in derepression similar to the E2F4 knockdown (Fig. S2B). Loss of KDM5A resulted in even higher derepression than the loss of E2F4 (Fig. 4B). Finally, the combined loss of E2F4 and KDM5A resulted in greater derepression than the loss of each individual factor, reaching 5-fold derepression of the NUSAP1 gene. This suggested that E2F4-mediated and KDM5A-mediated repression operate through different mechanisms, resulting in a cumulative effect when combined.

To study the temporal requirements of E2F4 and KDM5A recruitment in the observed repression of the NUSAP1 promoter, we performed ChIP-qPCR for E2F4 and KDM5A at the times of no/low repression (0, 6, and 24 hr), intermediate repression (48 hr) and substantial repression (96 hr) (Fig. 4A). The binding of both E2F4 and KDM5A was not much higher than the background level except at the time points when the NUSAP1 promoter was experiencing repression (Fig. 4C). From 24 hr to 48 hr NUSAP1 expression dropped less than 2-fold. This paralleled a 2-3-fold increase in KDM5A binding and an almost 7-fold increase in E2F4 binding.
Subsequently, between 48 hr and 96 hr there was a ten-fold difference in NUSAP1 transcript levels, but we failed to detect a significant change in the level of E2F4 or KDM5A enrichment. Because cells with E2F4 or KDM5A knockdown experience derepression of NUSAP1 and OIP5, this suggests that E2F4 and KDM5A recruitment primes the succeeding repression events.

Co-occurrence of KDM5A and E2F4 at a significant number of genomic regions (Fig. 3C) raises the possibility that KDM5A recruits E2F4 or, alternatively, that E2F4 recruits KDM5A to these regions. We studied KDM5A enrichment at the NUSAP1 promoter in cells treated with E2F4 siRNA. At least partial displacement of KDM5A in the absence of E2F4 would suggest that KDM5A is brought to the region by E2F4. However, using two different KDM5A antibodies in ChIP assays, we did not detect a decrease in KDM5A recruitment in cells lacking E2F4, and instead observed an increase (Fig. 4D). Conversely, we did not see any change in E2F4 occupancy in KDM5A siRNA-treated cells (Fig. 4E), suggesting that KDM5A does not play a role in E2F4 binding to the promoter. Several studies failed to detect an interaction between KDM5A and E2F4 (22, 49, 53), and our data show that E2F4 can act antagonistically to the recruitment of KDM5A; in conditions that lack E2F4, KDM5A recruitment to the gene increases, thus maintaining repression in the absence of E2F4 through demethylation of H3K4. KDM5A was shown to associate with Sin3B in C2C12 myotube extracts (53). Sin3B functionally interacted with E2F4 on chromatin as a complex, and stable association of this complex with targets was mediated by both factors. However, at the NUSAP1 promoter in U937 cells we were unable to detect a significant recruitment of Sin3B (Fig. S2D) that would parallel KDM5A and E2F4 recruitment (Fig. 4C).
Moreover, in contrast to the results of the study that showed recruitment of KDM5A to Sin3/E2F4 targets downstream of the TSS (53), our high resolution ChIP-seq analysis detected most KDM5A-bound regions at the TSS (Fig. 3A). This 8. KDM5A IN REPRESSION OF 247 April 2012 CELL CYCLE GENES 248 Figure 4. KDM5A-mediated H3K4 Demethylation represses cell cycle genes during U937 cell differentiation Independent of E2F4. (A) Relative transcript level of NUSAP1 and PCNA at timed intervals after TPA addition, normalized to the level of B2M. (B) Analysis of OIP5 and NUSAP1 induction in vector control cells and shKDM5A cells treated with E2F4 siRNA or control GL3 siRNA. The siRNA treatments did not change the expression level of unrelated genes (Fig. S2C). Gene map of the bidirectional NUSAP1/OIP5 promoter occupied by KDM5A is shown. The cells were induced for differentiation with TPA for 96 hr and 120 hr. (C) The difference in occupancy of the E2F4- and KDM5A-dependent promoter at different time points of differentiation as determined by ChIP. Intergenic unbound region Int20D is shown as a control. (D) E2F4 knockdown results in increased occupancy of KDM5A at the NUSAP1 promoter. The cells were treated with the control GL3 or E2F4 siRNA. The cells were induced for differentiation with TPA for 48 hr and ChIP experiments were performed using the KDM5A antibody 1416. To confirm the obtained result, ChIP experiments were done using the KDM5A antibody 2469, a distinct control region, AURKB, and the cells were first induced for differentiation for 16 hr, and then nucleofected and grown further for 48 hr total. A similar result using 2469 was obtained from the cells that were nucleofected before TPA addition (data not shown). (E) E2F4 binding to the NUSAP1 promoter is unchanged in cells with KDM5A knockdown. ChIP E2F4 experiments were performed in cells treated with the control GL3 or KDM5A siRNA, in parallel with the experiments using the KDM5A antibody 1416 in D. 
(F) KDM5A knockdown results in increased methylation in differentiating cells. ChIP H3K4me3 experiments were performed in cells treated with the control GL3 or KDM5A siRNA. The cells were either induced for differentiation with TPA for 96 hr or left untreated. (G) E2F4 knockdown results in decreased methylation in differentiating cells. ChIP H3K4me3 experiments were performed in cells treated with the control GL3 or E2F4 siRNA. The nucleosomal occupancy of the NUSAP1 and the Int20D region was performed using ChIP of total histone H3. All cells were induced for 72 hr. (H) Histone acetylation changes at the NUSAP1 promoter after E2F4 loss but not after KDM5A loss. U937 cells were treated with the control GL3, E2F4 siRNA or KDM5A siRNA, induced with TPA for 72 hr and the ChIP experiments were performed with the antibodies to acetylated histone H3. The values are means ± STR, n=3. suggests that KDM5A and E2F4 are independently recruited the same genes. Using KDM5A knockdown and overexpression experiments, we have shown that KDM5A recruitment to the mitochondrial component gene, MFN2, is required for its repression, and that the mechanism of this repression is demethylation of H3K4me3 on nucleosomes at its promoter (36). Here we have studied methylation at the NUSAP1/OIP5 promoter. Treatment of U937 cells with TPA for 96 hr resulted in an approximate 4-fold decrease in H3K4me3 at the promoter, but KDM5A knockdown resulted in 2-fold higher H3K4 methylation at the promoter (Fig. 4F). The 2-fold increase in H3K4me3 in KDM5A knockdown cells at the NUSAP1 promoter was reproducible in separate biological experiments and was also observed at the PCNA promoter occupied by KDM5A (Fig. S2E). Thus, KDM5A contributes to NUSAP1 repression through removal of H3K4 trimethylation. If E2F4 is required for KDM5A function in gene repression then removing E2F4 may increase H3K4 methylation similar to the removal of KDM5A. However, consistent with the increased KDM5A level (Fig. 
4D), the H3K4me3 level at the NUSAP1 promoter was actually decreased in cells treated with E2F4 siRNA (Fig. 4G). This was not associated, however, with changes in nucleosome density because it was not accompanied by a decrease in total histone H3 (Fig. 4G). Consistent with E2F4 recruitment of Sin3/HDAC activities, knockdown of E2F4 resulted in increased acetylation at the NUSAP1 promoter, but not at the control genomic region not bound by E2F4 (Fig. 4H). In contrast, no effect on acetylation was observed in KDM5A knockdown (Fig. 4H). Therefore, our studies suggest that KDM5A-mediated repression is independent of histone deacetylation activity associated with E2F4 and Sin3-containing complexes, and involves the demethylation of H3K4me3 positioned close to the TSS. However, the combined regulation by both histone H3K4 demethylation and histone deacetylation results in a cumulative effect that may be critical to achieving a dramatic drop in gene expression level.

DISCUSSION

Accumulating evidence suggests that deep transcriptional repression is executed not by a single repressor but in a multi-step process, with multiple transcription repressors, histone modifying enzymes and chromatin remodeling machinery involved. We show that the function of KDM5A as a H3K4me3 demethylase is critical for transcriptional repression of cell cycle genes in terminally differentiated cells. It works in parallel and cooperates with E2F-mediated repression. We found that KDM5A regulates methylation levels at the majority of its target genes, and its knockdown results in an increase in H3K4 trimethylation and to a lesser extent H3K4 dimethylation, without significantly changing nucleosomal occupancy. KDM5A function in removing histone H3K4 methylation is not redundant with other H3K4 demethylases as lack of KDM5A results in an increase in the global level of histone methylation.
Decreasing the level of KDM5A, either by shRNA in a human cancer cell line or using homo- and heterozygous mice, increased the global level of H3K4me3. The majority of histone H3 in proliferating cells is part of chromatin (37), and the global changes can be explained by an increase in H3K4 methylation at the majority of KDM5A targets, which we observed in ChIP assays. Cells are unable to maintain normal histone methylation when lacking KDM5A, despite the enhanced recruitment of the highly homologous protein KDM5B to KDM5A targets (27). This is in sharp contrast to the effect of deletion of any one of the MLL family members, which had only a minimal effect on the global levels of H3K4 methylation. MLL1 overexpression increases levels of histone H3K4me2/3 at specific genes, such as HOXA9, yet does not change the global level of H3K4me2/3 (39). During differentiation KDM5A is differentially bound to cell cycle genes, genes for proteins involved in mitochondrial function, and to developmental genes such as the HOX genes (15, 36, 58). Our previous microarray study showed that correlation between recruitment of KDM5A and decreased activity was characteristic for the whole group of genes specifically bound by KDM5A in the differentiated condition (36). Here we confirmed that KDM5A is required for gene regulation specifically during differentiation, because in differentiating Kdm5a-/- ES cells KDM5A TSS targets have higher expression compared to wild-type ES cells (Fig. 3B). Recent RNA-seq experiments, which are more quantitative than microarrays, showed that cell cycle genes are the gene group experiencing the greatest decrease during the onset of differentiation (11). Previous genome-wide in silico computational analyses of promoters identified key regulators of human cell cycle-regulated genes, with significant enrichment of the E2F, NRF1, NF-Y, and CREB binding motifs in their promoters (18).
Our ChIP-on-chip and ChIP-seq analysis showed that KDM5A co-occupies a substantial portion of gene targets of E2F4. The monocyte/macrophage differentiation system recapitulated the dynamics of transcription factor binding and histone modifications at such promoters: coincident with the silencing of cell cycle genes, E2F4/p130 and KDM5A were recruited, and the promoter lost H3K4 methylation. Significantly, our study showed that KDM5A and the DREAM complex not only bind independently to the promoters of cell cycle-regulated genes in the repressed state but also independently and actively repress these genes. The overlap between E2F4 and p130 targets was found not to be cell type specific (12), and our experiments in mouse ES cells and in human U937 cells suggest that likewise, the overlap with KDM5A can be generalized to many different cell lineages. Sin3 was shown to correlate with histone H3 acetylation, methylation and nucleosomal positioning at E2F targets. In particular, depletion of Sin3 in differentiated C2C12 myotubes resulted in robust localized increases not only in histone acetylation but also in H3K4me3 levels (53). Studies in C. elegans, Drosophila and mammalian cells showed that the E2F4 protein is involved in the formation of several chromatin complexes. In association with p130 in the DREAM complex, E2F4 lacks the Sin3 component (35). Here we determined that methylation at E2F4 target genes can be sensitive to the level of KDM5A independent of Sin3. E2F4-mediated repression at such genes still involves histone deacetylation, which is likely to occur through Sin3-independent association of E2F4 with HDAC activity. This places KDM5A gene regulation in the context of collective regulation by other transcription factors recruited to the same genomic regions. Surprisingly, loss of E2F4, KDM5A, or even the two factors simultaneously still allows gene repression to proceed to a large extent.
Achieving a sustainable level of repression at cell cycle genes due to multiple contributing repressive factors may explain the good viability of Kdm5a-/- and E2f4-/- mice (25, 30). In human retinoblastoma many chromosomal alterations include regions of genes involved in cell cycle regulation (14), suggesting that they are important for tumorigenesis. In cancer cells, expression of NUSAP1 and OIP5 genes is representative of upregulated E2F activity. NUSAP1 has a specific role in proliferating cells. mRNA and protein levels of NUSAP1 peaked at the G2-M cell-cycle transition, associated with microtubule formation, while they abruptly declined after cell division, localizing to nucleoli during interphase (45). Interestingly, the NUSAP1 promoter is also a c-Myc target (26). NUSAP1 was reported as a novel tumor-associated antigen in AML (57). Consistent with these and other data on derepression of E2F targets in leukemia, Rb family genes play a critical role in early hematopoietic progenitors where Rb loss was
sufficient to initiate a preleukemic disorder (56). OIP5/MIS18beta mRNA is overexpressed in colorectal and gastric cancer tissues (16), where the RB/E2F pathway was shown to be critical for induction of senescence. We propose that the pocket proteins play a central role in cell growth and proliferation because they are able to regulate genes involved in cell cycle progression using several distinct mechanisms (e.g., KDM5A and E2F). While our study shows that multiple KDM5A targets contain nucleosomes differentially methylated at H3K4, it remains to be determined whether at non-E2F KDM5A targets, H3K4 methylation accompanies acetylation and other histone modifications during differentiation. HOX genes are also dramatically changed during terminal differentiation. In particular, HOXA9 is expressed more than 100-fold higher in stem and progenitor cells than in differentiated cell populations (40). H3K4me3 in conjunction with H3K27me3 constitute two histone marks epitomizing the epigenetic state of HOX genes and other developmental genes in stem and progenitor cells. H3K27 methylation is accomplished by recruitment of PRC2, which contains the histone H3K9 and H3K27 methyltransferase EZH2 as the catalytic component, and three other subunits, SUZ12, EED and RBBP4 (13). Nearly all SUZ12 targets overlap with bivalent histone marks in ES cells (31). SUZ12 occupies one third of LSD1 bound promoters (52). Activation of the HOXA9 gene is accomplished through binding of MLL1 and subsequent elimination of the repressive marks H3K9me3 and H3K27me3 (39). Similarly, a KDM5A oncogenic fusion protein decreased H3K27me3 and prevented EZH2 recruitment at the HOXA9 gene (58). Previously we showed that KDM5A targets overlap significantly with H3K4me3 marks, but do not display H3K27me3 or bivalent marks (36).
This was in contrast with a study in mouse ES cells, showing that the PRC2 complex recruits KDM5A to a large number of genes, which is required for repression of PRC2 target genes during ES cell differentiation (41). However, we were unable to see a significant overlap between ChIP-seq KDM5A peaks determined in this study and the PRC2 location (data not shown). This result was also consistent with another study, and its conclusion that KDM5A is able to form complexes with the PRC2 component but fails to interact on chromatin (43). In addition, when we compared our KDM5A ChIP-on-chip data to the custom tiling array data GPL8754 (42) that included 105 regions with EZH2 and bivalent marks, we did not find any overlap with bivalent marks or EZH2 (data not shown). Therefore, the location of KDM5A does not overlap with that of EZH2 at common targets. These data show that KDM5A is not an obligatory component of PRC2 complexes. Recruitment of E2F4 or KDM5A occurs at times when the NUSAP1 promoter is only partially repressed. However, the repression still progresses later during differentiation, indicating both E2F4- or KDM5A-dependent and independent events. Methylation at H3K4 was shown to provide a landing platform for multiple chromatin remodeling and histone modifying enzymes (47). Once recognized by a histone “reader”, it can serve as a bridging molecule to other transcription regulatory complexes. Indeed, KDM5A is capable of transiently associating with both H3K4me3 and multiple co-repressor complexes (55). By comparing the expression levels of KDM5A target genes in cells with KDM5A knockdown or knockout to normal cells, we established a direct functional link between KDM5A binding at the TSS and relevant gene expression. Our observations suggest that KDM5A is functionally important not only because of its regulation of developmental genes but also because of its link to control of cell cycle genes regulated by RB/E2F family members. In particular, we found that KDM5A cooperates in repression with the E2F4/p130 component of the DREAM complex. These data indicate that genes bound by KDM5A at the TSS region are direct targets of KDM5A-mediated demethylation and suggest its functionally independent role from E2F4 in the coordinated regulation of gene expression of cell cycle genes during cell differentiation.

ACKNOWLEDGMENTS

We thank Dr. Larisa Litovchick for communicating her results and for suggestions on improving the manuscript, and Ms. Aleksandra Vilkova for technical assistance. The project was supported by R01CA138631 (E.V.B.) and R01CA076120 (W.G.K.) from the National Institutes of Health. W.G.K. is a Howard Hughes Medical Institute Investigator and a Doris Duke Distinguished Scientist. N.G. holds a Chercheur boursier (junior 1) award from the FRSQ. N.L.-B. acknowledges funding from the Spanish Ministry of Science and Education grant number SAF2009-06954. Q.Y. acknowledges V Scholar Award, the Alexander and Margaret Stewart Trust Fellowship, and CTSA Scholar Award from Yale Center for Clinical Investigation (CTSA Grant UL1 RR024139). A.B.M.M.K.I. is supported by a fellowship from AGAUR of the Catalonian Government, Spain.

REFERENCES

1. Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25-29. 2. Ayton, P. M., and M. L. Cleary. 2001. Molecular mechanisms of leukemogenesis mediated by MLL fusion proteins. Oncogene 20:5695-5707. 3. Balciunaite, E., A. Spektor, N. H. Lents, H. Cam, H. Te Riele, A. Scime, M. A. Rudnicki, R. Young, and B. D. Dynlacht. 2005.
Pocket protein complexes are recruited to distinct targets in quiescent and proliferating cells. Mol Cell Biol 25:8166-8178. 4. Barski, A., S. Cuddapah, K. Cui, T. Y. Roh, D. E. Schones, Z. Wang, G. Wei, I. Chepelev, and K. Zhao. 2007. High-resolution profiling of histone methylations in the human genome. Cell 129:823-837. 5. Benevolenskaya, E. V. 2007. Histone H3K4 demethylases are essential in development and differentiation. Biochem Cell Biol 85:435-443. 6. Benevolenskaya, E. V., H. L. Murray, P. Branton, R. A. Young, and W. G. Kaelin, Jr. 2005. Binding of pRB to the PHD protein RBP2 promotes cellular differentiation. Mol Cell 18:623-635. 7. Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57:289-300. 8. Bernstein, B. E., T. S. Mikkelsen, X. Xie, M. Kamal, D. J. Huebert, J. Cuff, B. Fry, A. Meissner, M. Wernig, K. Plath, R. Jaenisch, A. Wagschal, R. Feil, S. L. Schreiber, and E. S. Lander. 2006. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125:315-326. 9. Beshiri, M. L., A. Islam, D. C. DeWaal, W. F. Richter, J. Love, N. Lopez-Bigas, and E. V. Benevolenskaya. 2010. Genome-wide analysis using ChIP to identify isoform-specific gene targets. J Vis Exp. 10. Blais, A., C. J. van Oevelen, R. Margueron, D. Acosta-Alvear, and B. D. Dynlacht. 2007. Retinoblastoma tumor suppressor protein-dependent methylation of histone H3 lysine 27 is associated with irreversible cell cycle exit. J Cell Biol 179:1399-1412. 11. Brunskill, E. W., H. L. Lai, D. C. Jamison, S. S. Potter, and L. T. Patterson. 2011. Microarrays and RNA-Seq identify molecular mechanisms driving the end of nephron production. BMC Dev Biol 11:15. 12. Cam, H., E. Balciunaite, A. Blais, A. Spektor, R. C. Scarpulla, R. Young, Y. Kluger, and B. D. Dynlacht. 2004. A common set of gene regulatory networks links metabolism and growth inhibition. Mol Cell 16:399-411.
13. Cao, R., and Y. Zhang. 2004. The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr Opin Genet Dev 14:155-164. 14. Chen, D., B. L. Gallie, and J. A. Squire. 2001. Minimal regions of chromosomal imbalance in retinoblastoma detected by comparative genomic hybridization. Cancer Genet Cytogenet 129:57-63. 15. Christensen, J., K. Agger, P. A. Cloos, D. Pasini, S. Rose, L. Sennels, J. Rappsilber, K. H. Hansen, A. E. Salcini, and K. Helin. 2007. RBP2 belongs to a family of demethylases, specific for tri- and dimethylated lysine 4 on histone 3. Cell 128:1063-1076. 16. Chun, H. K., K. S. Chung, H. C. Kim, J. E. Kang, M. A. Kang, J. T. Kim, E. H. Choi, K. E. Jung, M. H. Kim, E. Y. Song, S. Y. Kim, M. Won, and H. G. Lee. 2010. OIP5 is a highly expressed potential therapeutic target for colorectal and gastric cancers. BMB Rep 43:349-354. 17. Di Stefano, L., J. A. Walker, G. Burgio, D. F. Corona, P. Mulligan, A. M. Naar, and N. J. Dyson. 2011. Functional antagonism between histone H3K4 demethylases in vivo. Genes Dev 25:17-28. 18. Elkon, R., C. Linhart, R. Sharan, R. Shamir, and Y. Shiloh. 2003. Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res 13:773-780. 19. Frolov, M. V., and N. J. Dyson. 2004. Molecular mechanisms of E2F-dependent activation and pRB-mediated repression. J Cell Sci 117:2173-2181. 20. Gentleman, R. C., V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Y. Yang, and J. Zhang. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5. 21. Ghisletti, S., I. Barozzi, F. Mietton, S. Polletti, F. De Santa, E. Venturini, L. Gregory, L. Lonie, A. Chew, C. L. Wei, J. Ragoussis, and G. Natoli. 2010.
Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity 32:317-328. 22. Hayakawa, T., Y. Ohtani, N. Hayakawa, K. Shinmyozu, M. Saito, F. Ishikawa, and J. Nakayama. 2007. RBP2 is an MRG15 complex component and down-regulates intragenic histone H3 lysine 4 methylation. Genes Cells 12:811-826. 23. Hess, J. L. 2004. MLL: a histone methyltransferase disrupted in leukemia. Trends Mol Med 10:500-507. 24. Hubbard, T. J., B. L. Aken, K. Beal, B. Ballester, M. Caccamo, Y. Chen, L. Clarke, G. Coates, F.
 Cunningham, T. Cutts, T. Down, S. C. Dyer, S. Fitzgerald, J. Fernandez-Banet, S. Graf, S. Haider, M. Hammond, J. Herrero, R. Holland, K. Howe, K. Howe, N. Johnson, A. Kahari, D. Keefe, F. Kokocinski, E. Kulesha, D. Lawson, I. Longden, C. Melsopp, K. Megy, P. Meidl, B. Ouverdin, A. Parker, A. Prlic, S. Rice, D. Rios, M. Schuster, I. Sealy, J. Severin, G. Slater, D. Smedley, G. Spudich, S. Trevanion, A. Vilella, J. Vogel, S. White, M. Wood, T. Cox, V. Curwen, R. Durbin, X. M. Fernandez-Suarez, P. Flicek, A. Kasprzyk, G. Proctor, S. Searle, J. Smith, A. Ureta-Vidal, and E. Birney. 2007. Ensembl 2007. Nucleic Acids Res 35:D610-617. 25. Humbert, P. O., C. Rogers, S. Ganiatsas, R. L. Landsberg, J. M. Trimarchi, S. Dandapani, C. Brugnara, S. Erdman, M. Schrenzel, R. T. Bronson, and J. A. Lees. 2000. E2F4 is essential for normal erythrocyte maturation and neonatal viability. Molecular cell 6:281-291. 26. Hussain, S., S. B. Benavente, E. Nascimento, I. Dragoni, A. Kurowski, A. Gillich, P. Humphreys, and M. Frye. 2009. The nucleolar RNA methyltransferase Misu (NSun2) is required for mitotic spindle stability. The Journal of cell biology 186:27-40. 27. Islam, A. B., W. F. Richter, N. Lopez-Bigas, and E. V. Benevolenskaya. 2011. Selective targeting of histone methylation. Cell Cycle 10:413-424. 28. Kanehisa, M., M. Araki, S. Goto, M. Hattori, M. Hirakawa, M. Itoh, T. Katayama, S. Kawashima, S. Okuda, T. Tokimatsu, and Y. Yamanishi. 2008. KEGG for linking genomes to life and the environment. Nucleic Acids Res 36:D480-484. 29. Kim, J., A. J. Woo, J. Chu, J. W. Snow, Y. Fujiwara, C. G. Kim, A. B. Cantor, and S. H. Orkin. 2010. A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs. Cell 143:313-324. 30. Klose, R. J., Q. Yan, Z. Tothova, K. Yamane, H. Erdjument-Bromage, P. Tempst, D. G. Gilliland, Y. Zhang, and W. G. Kaelin, Jr. 2007. The retinoblastoma binding protein RBP2 is an H3K4 demethylase. Cell 128:889-900. 31. Ku, M., R. P. 
Koche, E. Rheinbay, E. M. Mendenhall, M. Endoh, T. S. Mikkelsen, A. Presser, C. Nusbaum, X. Xie, A. S. Chi, M. Adli, S. Kasif, L. M. Ptaszek, C. A. Cowan, E. S. Lander, H. Koseki, and B. E. Bernstein. 2008. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet 4:e1000242. 32. Lander, E. S., L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, J. Meldrim, J. P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, N. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bentley, J. Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A. Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L. Matthews, S. Mercer, S. Milne, J. C. Mullikin, A. Mungall, R. Plumb, M. Ross, R. Shownkeen, S. Sims, R. H. Waterston, R. K. Wilson, L. W. Hillier, J. D. McPherson, M. A. Marra, E. R. Mardis, L. A. Fulton, A. T. Chinwalla, K. H. Pepin, W. R. Gish, S. L. Chissoe, M. C. Wendl, K. D. Delehaunty, T. L. Miner, A. Delehaunty, J. B. Kramer, L. L. Cook, R. S. Fulton, D. L. Johnson, P. J. Minx, S. W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J. F. Cheng, A. Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921. 33. Langmead, B., C. Trapnell, M. Pop, and S. L. Salzberg. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. 34. Lin, W., J. Cao, J. Liu, M. L. Beshiri, Y. Fujiwara, J. Francis, A. D. Cherniack, C. Geisen, L. P. Blair, M. R. Zou, X. Shen, D. Kawamori, Z. Liu, C. Grisanzio, H. Watanabe, Y. A. 
Minamishima, Q. Zhang, R. N. Kulkarni, S. Signoretti, S. J. Rodig, R. T. Bronson, S. H. Orkin, D. P. Tuck, E. V. Benevolenskaya, M. Meyerson, W. G. Kaelin, Jr., and Q. Yan. 2011. Loss of the retinoblastoma binding protein 2 (RBP2) histone demethylase suppresses tumorigenesis in mice lacking Rb1 or Men1. Proceedings of the National Academy of Sciences of the United States of America 108:13379-13386. 35. Litovchick, L., S. Sadasivam, L. Florens, X. Zhu, S. K. Swanson, S. Velmurugan, R. Chen, M. P. Washburn, X. S. Liu, and J. A. DeCaprio. 2007. Evolutionarily conserved multisubunit RBL2/p130 and E2F4 protein complex represses human cell cycle-dependent genes in quiescence. Mol Cell 26:539-551. 36. Lopez-Bigas, N., T. A. Kisiel, D. C. Dewaal, K. B. Holmes, T. L. Volkert, S. Gupta, J. Love, H. L. Murray, R. A. Young, and E. V. Benevolenskaya. 2008. Genome-wide analysis of the H3K4 histone demethylase RBP2 reveals a transcriptional program controlling differentiation. Mol Cell 31:520-530. 37. Loyola, A., T. Bonaldi, D. Roche, A. Imhof, and G. Almouzni. 2006. PTMs on H3 variants before chromatin assembly potentiate their final epigenetic state. Molecular cell 24:309-316. 38. Mikkelsen, T. S., M. Ku, D. B. Jaffe, B. Issac, E. Lieberman, G. Giannoukos, P. Alvarez, W. Brockman, T. K. Kim, R. P. Koche, W. Lee, E. Mendenhall, A. O'Donovan, A. Presser, C. Russ, X. Xie, A. Meissner, M. Wernig, R. Jaenisch, C. Nusbaum, E. S. Lander, and B. E. Bernstein. 2007. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448:553-560. 39. Milne, T. A., J. Kim, G. G. Wang, S. C. Stadler, V. Basrur, S. J. Whitcomb, Z. Wang, A. J. Ruthenburg, K. S. Elenitoba-Johnson, R. G. Roeder, and C. D. Allis. 2010. Multiple interactions recruit MLL1 and MLL1 fusion proteins to the HOXA9 locus in leukemogenesis. Mol Cell 38:853-863. 40. Novershtern, N., A. Subramanian, L. N. Lawton, R. H. Mak, W. N. Haining, M. E.
McConkey, N. Habib, N. Yosef, C. Y. Chang, T. Shay, G. M. Frampton, A. C. Drake, I. Leskov, B. Nilsson, F. Preffer, D. Dombkowski, J. W. Evans, T. Liefeld, J. S. Smutko, J. Chen, N. Friedman, R. A. Young, T. R. Golub, A. Regev, and B. L. Ebert. 2011. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144:296-309. 41. Pasini, D., K. H. Hansen, J. Christensen, K. Agger, P. A. Cloos, and K. Helin. 2008. Coordinated regulation of transcriptional repression by the RBP2 H3K4 demethylase and Polycomb-Repressive Complex 2. Genes Dev 22:1345-1355. 42. Paul, T. A., J. Bies, D. Small, and L. Wolff. 2010. Signatures of polycomb repression and reduced H3K4 trimethylation are associated with p15INK4b DNA methylation in AML. Blood 115:3098-3108. 43. Peng, J. C., A. Valouev, T. Swigut, J. Zhang, Y. Zhao, A. Sidow, and J. Wysocka. 2009. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139:1290-1302. 44. Perez-Llamas, C., and N. Lopez-Bigas. 2011. Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One 6:e19541. 45. Raemaekers, T., K. Ribbeck, J. Beaudouin, W. Annaert, M. Van Camp, I. Stockmans, N. Smets, R. Bouillon, J. Ellenberg, and G. Carmeliet. 2003. NuSAP, a novel microtubule-associated protein involved in mitotic spindle organization. The Journal of cell biology 162:1017-1029. 46. Rayman, J. B., Y. Takahashi, V. B. Indjeian, J. H. Dannenberg, S. Catchpole, R. J. Watson, H. te Riele, and B. D. Dynlacht. 2002. E2F mediates cell cycle-dependent transcriptional repression in vivo by recruitment of an HDAC1/mSin3B corepressor complex. Genes & Development 16:933-947. 47. Shilatifard, A. 2006. Chromatin modifications by methylation and ubiquitination: implications in the regulation of gene expression. Annu Rev Biochem 75:243-269. 48. Shin, H., T. Liu, A. K. Manrai, and X. S. Liu. 2009. CEAS: cis-regulatory element annotation system. 
Bioinformatics 25:2605-2606. 49. Silverstein, R. A., and K. Ekwall. 2005. Sin3: a flexible regulator of global gene expression and genome stability. Curr Genet 47:1-17. 50. Smith, E. J., G. Leone, J. DeGregori, L. Jakoi, and J. R. Nevins. 1996. The accumulation of an E2F-p130 transcriptional repressor distinguishes a G0 cell state from a G1 cell state. Mol Cell Biol 16:6965-6976. 51. Trinklein, N. D., S. F. Aldred, S. J. Hartman, D. I. Schroeder, R. P. Otillar, and R. M. Myers. 2004. An abundance of bidirectional promoters in the human genome. Genome Res 14:62-66. 52. Tsai, M. C., O. Manor, Y. Wan, N. Mosammaparast, J. K. Wang, F. Lan, Y. Shi, E. Segal, and H. Y. Chang. 2010. Long noncoding RNA as modular scaffold of histone modification complexes. Science 329:689-693. 53. van Oevelen, C., J. Wang, P. Asp, Q. Yan, W. G. Kaelin, Jr., Y. Kluger, and B. D. Dynlacht. 2008. A role for mammalian Sin3 in permanent gene silencing. Mol Cell 32:359-370. 54. van Zutven, L. J., E. Onen, S. C. Velthuizen, E. van Drunen, A. R. von Bergh, M. M. van den Heuvel-Eibrink, A. Veronese, C. Mecucci, M. Negrini, G. E. de Greef, and H. B. Beverloo. 2006. Identification of NUP98 abnormalities in acute leukemia: JARID1A (12p13) as a new partner gene. Genes Chromosomes Cancer 45:437-446. 55. Vermeulen, M., H. C. Eberl, F. Matarese, H. Marks, S. Denissov, F. Butter, K. K. Lee, J. V. Olsen, A. A. Hyman, H. G. Stunnenberg, and M. Mann. 2010. Quantitative interaction proteomics and genomewide profiling of epigenetic histone marks and their readers. Cell 142:967-980. 56. Viatour, P., T. C. Somervaille, S. Venkatasubrahmanyam, S. Kogan, M. E. McLaughlin, I. L. Weissman, A. J. Butte, E. Passegue, and J. Sage. 2008. Hematopoietic stem cell quiescence is maintained by compound contributions of the retinoblastoma gene family. Cell Stem Cell 3:416-428. 57. Wadia, P. P., M. Coram, R. J. Armstrong, M. Mindrinos, A. J. Butte, and D. B. Miklos. 2010.
Antibodies specifically target AML antigen NuSAP1 after allogeneic bone marrow transplantation. Blood 115:2077-2087. 58. Wang, G. G., J. Song, Z. Wang, H. L. Dormann, F. Casadio, H. Li, J. L. Luo, D. J. Patel, and C. D. Allis. 2009. Haematopoietic malignancies caused by dysregulation of a chromatin-binding PHD finger. Nature 459:847-851. 59. Zhang, Y., T. Liu, C. A. Meyer, J. Eeckhoute, D. S. Johnson, B. E. Bernstein, C. Nussbaum, R. M. Myers, M. Brown, W. Li, and X. S. Liu. 2008. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9:R137. 60. Zhu, L. J., C. Gazin, N. D. Lawson, H. Pages, S. M. Lin, D. S. Lapointe, and M. R. Green. 2010. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11:237.

SUPPLEMENTAL FIGURE

Figure S1. H3K4me3 is increased in KDM5A knockout cells. (A) Quantitation of H3K4 methylation in cells deficient in KDM5A, Kdm5a-/-. H3K4me3 data were normalized to α-tubulin as a loading control using ImageJ software. (B) MEFs of the indicated genotypes were analyzed by immunoblotting. The cells were grown and processed independently of cells in Fig. 2E.

Figure S2. Effect of gene knockdowns on gene expression. (A) NUSAP1 and OIP5 are both downregulated during the course of TPA treatment. (B) In differentiating cells, the effect of LIN9 siRNA on NUSAP1 expression is similar to the effect of E2F4 siRNA. (C) Relative transcript level of KDM5A, E2F4 and LIN9 genes in control cells and cells treated with siRNAs to E2F4, LIN9, and shRNA to KDM5A. (D) Sin3B is not significantly bound to the NUSAP1 promoter at the time of E2F4 and KDM5A binding. (E) KDM5A knockdown results in increased methylation at both the PCNA and NUSAP1 promoters. Analysis in A-C was performed at timed intervals after TPA addition, normalized to the level of B2M. In D and E, results are shown as a percent of input. All values are means ± STR (n=3).
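The figure legends above report two kinds of qPCR-derived values: transcript levels normalized to B2M, and ChIP signals expressed as a percent of input. The text does not spell out the arithmetic, so the following is only a minimal sketch of the conventional calculations, assuming near-100% amplification efficiency (a doubling per cycle). The function names, parameter names and Ct values are hypothetical and are not taken from the study.

```python
import math


def percent_of_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """ChIP-qPCR signal as a percent of input chromatin.

    ct_ip          -- Ct of the immunoprecipitated sample (hypothetical value)
    ct_input       -- Ct of the input sample (hypothetical value)
    input_fraction -- fraction of chromatin saved as input, e.g. 0.01 for 1%
    Assumes perfect doubling per PCR cycle.
    """
    # Shift the input Ct to what it would be if 100% of the chromatin were measured.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def relative_transcript_level(ct_target: float, ct_reference: float) -> float:
    """Transcript level of a target gene relative to a reference gene
    (for example B2M), using the 2^-(delta Ct) convention."""
    return 2.0 ** (ct_reference - ct_target)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(percent_of_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01))
    print(relative_transcript_level(ct_target=24.0, ct_reference=20.0))
```

If primer efficiencies differ noticeably from a doubling per cycle, the base 2 would need to be replaced by the measured efficiency for each primer pair.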
SUPPLEMENTAL TABLE LEGENDS

Table S1. List of differentially bound KDM5A target genes in U937 cells. 100, 010, and 001 indicate genes bound by KDM5A only at 0 hr, 27 hr, or 96 hr after cell treatment with TPA, respectively. 110 means that KDM5A is bound to these genes at 0 hr and 27 hr, and 011 means that KDM5A is bound to these genes at 27 hr and 96 hr. No genes were reproducibly detected as bound at 0 hr and 96 hr but not at 27 hr.
Table S2. List of genes displaying enrichment of KDM5A in both undifferentiated (0 hr) and differentiated (96 hr) cells.
Table S3. List of KDM5A targets determined by ChIP-seq in mouse ES cells.
Table S4. Comparison of our ChIP-seq data to KDM5A/JARID1A ChIP-seq data in the previous study (Peng et al., 2009).
Table S5. List of overlapping ChIP-seq KDM5A/JARID1A peaks with the previous study (Peng et al., 2009). The data from the Peng et al. study are highlighted in yellow.
Table S6. Classification of KDM5A peaks based on expression in Kdm5a knockout cells.
Table S7. Wilcoxon-Mann-Whitney U test for gene expression changes in Kdm5a knockout cells during differentiation. Comparison with the gene expression changes of non-KDM5A targets is shown for each set of KDM5A targets.
Table S8. Chi-square test on overlap between KDM5A and E2F4: 2x2 contingency table.
Table S9. Primers used in qPCR.

** Supplemental tables are available online.

Chapter 9

KDM5A REGULATES DIFFERENTIATION THROUGH TWO DISTINCT MECHANISMS ON CELL CYCLE AND TISSUE-SPECIFIC GENES

Authors: Michael L. Beshiri*, Abul B.M.M.K. Islam*, Nuria Lopez-Bigas and Elizaveta V. Benevolenskaya. *equal contribution

Published in: (manuscript in preparation) April 2012. Full text: http://bg.upf.edu/~khademul/myPaper/

Summary

pRB function is known to be abrogated in most tumor cases. pRB and RBP2 (KDM5A) physically interact, and pRB mutants that do not form complexes with KDM5A are unable to induce differentiation.
Early in differentiation, KDM5A works primarily in opposition to pRB by maintaining the proliferative state. Our group has previously shown by transient KDM5A knockdown in Rb1−/− cells that lowering KDM5A protein levels increases transcription factor activity and gene expression associated with differentiation, and induces characteristic morphological changes reminiscent of the effects seen by reintroducing pRB. Later in differentiation KDM5A binds to the promoters of genes targeted by pRB and E2F. Additionally, in mice, loss of Kdm5a suppresses tumorigenesis caused by the deletion of Rb1. To further address the epistatic relationship between pRB and KDM5A during differentiation, and to eliminate confounding effects that may arise due to altered function of low levels of KDM5A in knockdown models, our team generated knockout mice. We can induce differentiation in mouse embryonic fibroblasts (MEFs) down the myogenic lineage into myotubes by overexpressing MyoD. Therefore, in this study we sought to determine KDM5A function in the absence of pRB. Here I analyzed KDM5A ChIP-seq data in mouse ES cells, as well as H3K4me3 occupancy data (ChIP-seq) in the presence and absence of Kdm5a. In addition to our ChIP-seq data, I analyzed several publicly available ChIP-seq datasets and compared binding locations among them. To determine genes that are differentially expressed in Rb1 and Kdm5a−/− MEFs compared to their normal counterparts, I optimized and utilized a newly developed RNA-seq technique. To understand co-occurrence of putative transcription factors in KDM5A binding locations and KDM5A binding site conservation, I used a computational approach. I also performed genomic location overlap analysis, functional clustering and overrepresentation (enrichment) analysis of gene ontology and pathway modules in differentially expressed genes occupied by KDM5A. Interestingly, this analysis showed that KDM5A binds at distinct locations in distinct functional categories of genes.
These findings were then further supported through experimentation by our team.

ARTICLE

KDM5A regulates differentiation through two distinct mechanisms on cell cycle and tissue-specific genes
Michael L. Beshiri1,*, Abul B.M.M.K. Islam1,2,*, Nuria Lopez-Bigas2 and Elizaveta V. Benevolenskaya1,3

1 Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, 900 S. Ashland Ave, Chicago 60607, USA.
2 Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, PRBB, Universitat Pompeu Fabra, Doctor Aiguader 88, Barcelona 08003, Spain.
* Contributed equally; 3 Corresponding author, e-mail: evb@uic.edu

Differentiation involves activation of a cell type-specific transcription program through recruitment of program-specific transcription factors to enhancer regions. In a process parallel to the activation of developmental genes, differentiation requires initiation of the repression program on pRB/E2F target genes. Previously we identified the histone demethylase KDM5A, which demethylates histone H3 on Lys 4 (H3K4), as a pRB-interacting protein critical for pRB function during differentiation. Here we show that KDM5A occupies promoters of cell cycle genes and enhancers of developmental genes that are destined for repression during differentiation. The mechanism includes H3K4 demethylation, which is reversed by interaction with the program-specific transcription factor. In contrast, KDM5A dissociates from the enhancers of developmental genes that become methylated at H3K4 and are induced during differentiation. Induction of differentiation down the myogenic or neuronal lineage in mouse embryonic fibroblasts (MEFs) lacking pRB and KDM5A activity results in sustained expression of cell cycle genes, activation of transcription programs of multiple cell types, and an inability to repress in a timely manner the earlier successive gene activation events associated with this lineage. We propose that in proliferating cells KDM5A is required for maintaining the repressed state of developmental genes. In the presence of a differentiation signal, KDM5A is replaced by a transcription factor and recruited to cell cycle genes, which is essential for the high specificity and irreversibility of cell fate, and for permanent withdrawal from the cell cycle.

Methylation at H3K4 marks one of the earliest developmental decisions. The removal of histone methylation is accomplished by histone demethylases (KDMs). KDMs appear in two superfamilies: the amine oxidases, of which the best studied is LSD1, and the much more numerous jumonji domain (JmjC) containing proteins. The Jmj proteins specific for H3K4 are the KDM5 family. In mammals, histone H3K4 methylation can be removed by a single LSD1 protein, specific to the mono- and dimethylated states, and by four different proteins of the KDM5 family, which are capable of removing up to three methyl groups. Trimethylation of H3K4 marks transcription start sites of actively transcribed genes, whereas monomethylation of H3K4 marks enhancers of tissue-specific and stimulus-responsive genes (Barski et al., 2007; Bernstein et al., 2006; Ghisletti et al., 2010; Mikkelsen et al., 2007). Mutations in the homologs of the histone H3K4 demethylases of the LSD1 and KDM5 families result in developmental abnormalities in all organisms, from C. elegans and Drosophila to humans. Loss of LSD1 in mouse is lethal, and the embryonic stem cells (ESCs) from these mice fail to differentiate into embryoid bodies (Foster et al., 2010; Wang et al., 2009; Wang et al., 2007). While loss of KDM5A does not lead to embryonic lethality, ESCs lacking KDM5A have an increased global level of H3K4me3 (our submitted manuscript) (Klose et al., 2007). These results suggest that KDM5A contributes to chromatin changes that are critical for the transcription program during differentiation.
Strikingly, KDM5A function is connected to protecting cells from differentiation in the absence of a differentiation signal in an RB-dependent manner (Benevolenskaya et al., 2005b). pRB and RBP2 physically interact, and pRB mutants that do not form complexes with RBP2 are unable to induce differentiation. Our group has previously shown that lowering RBP2 protein levels in Rb1-/- cells by transient RBP2 knockdown increased transcription factor activity, restored gene expression associated with differentiation, and induced characteristic morphological changes reminiscent of the effects seen upon reintroducing pRB (Benevolenskaya et al., 2005b).

We adopted differentiation assays in MEFs in order to study the requirement for KDM5A during cell fate transitions. MEFs can be induced to differentiate down the myogenic lineage into myotubes by ectopic expression of MyoD in low serum media (Figure 1A). Using a size fractionation procedure, we were able to obtain a pure population of myotubes, containing multiple nuclei and expressing myogenic differentiation markers (Figure 1B). We isolated polyadenylated (poly(A)) RNA and performed an RNA-seq experiment. Our analysis revealed that most myogenic genes show detectable RNA signal. Compared to the narrow range of detection of the myogenic transcription program by microarrays, RNA-seq showed that RNA species for the majority of myogenic genes are induced at a high level (Table S1). Specifically, 16,000 reads were obtained for myogenin (MyoG) and 70,000 reads for troponin I (TNNI1). Consistent with withdrawal of these cells from the cell cycle, genes encoding the activator E2F1-3 proteins reached only 100-200 reads, whereas the cyclin-dependent kinase inhibitors reached 26,000 reads for CDKN1A (p21) and 24,000 for CDKN1C (p57). MEFs that are null for the Rb1 gene are deficient in differentiation.
They are unable to fully upregulate the markers of myogenic differentiation, do not exit the cell cycle, and do not properly fuse to form multinucleated myotubes (Novitch et al., 1996). Inability of cells to execute their full differentiation potential is a hallmark of cancer; thus Rb1 null cells represent a model for studying differentiation regulators with oncogenic properties that act downstream of or parallel to pRB. In particular, loss of Kdm5a by knockdown or knockout rescues the expression of myosin heavy chain (MyHC), fusion and multinucleation in Rb1 null cells, and also suppresses tumor progression and extends survival of Rb1+/- mice (Benevolenskaya et al., 2005a; Lin et al., 2011b). To better understand the rescue effects of a decreased KDM5A level, we generated myotubes from Kdm5a-/-Rb1-/- MEFs (DKO) and performed genome-wide analysis by RNA-seq (Figure 1A).

We previously determined the genomic regions bound by KDM5A by using chromatin immunoprecipitation coupled with massively parallel DNA sequencing, ChIP-seq (our submitted manuscript). Analysis of the distance from the regions bound by KDM5A to the closest annotated gene showed that KDM5A is generally associated with transcription start site (TSS) regions (Figure 1C). Half of the KDM5A peaks were closer than 1 kb to the TSS (core promoter), 30% of KDM5A peaks were between 1 kb and 2 kb from the TSS, and 20% of peaks were positioned further away, up to 50 kb from the nearest TSS. The analysis of gene expression revealed that KDM5A peaks are annotated to a substantial portion of the genes that change in expression in the Kdm5a knockout (KO). We previously showed that during differentiation KDM5A is recruited to and represses cell cycle genes (Lopez-Bigas et al., 2008b). The differentiation-dependent binding sites for KDM5A, which identified cell cycle gene regions, were determined on promoter-based ChIP-on-chip arrays.
As our ChIP-seq analysis showed that many more distant KDM5A-bound regions exist, we asked what biological processes these genes may be involved in. Gene ontology (GO) enrichment analysis showed that the core promoter group of KDM5A targets that are differentially expressed in DKO myotubes is enriched in genes involved in DNA synthesis and the cell cycle (Figure 1D, Figure S3-5). In contrast, the distal KDM5A targets are enriched in developmental gene ontology categories. To confirm KDM5A binding to these regions in MEFs, we performed ChIP experiments in wild-type and Kdm5a-/- MEFs. We found that this surprising observation holds in MEFs as well: KDM5A showed 5-fold enrichment at the cell cycle genes CENPE and NUSAP1 and 10-15-fold enrichment at the developmental genes MSTN, GLIS2 and COL3A1 (Figure 1E). The density of KDM5A reads around the TSS was higher than at distal regions (Figure 1F). We previously showed that KDM5A peaks at the TSS corresponded to highly expressed genes in ESCs. Consistent with these data, KDM5A peaks showed high frequency at the TSS of cell cycle genes (Figure 1G). However, KDM5A peaks at developmental gene regions were present at a lower frequency in distal regions. Thus, we determined which genes require KDM5A for transcriptional changes during differentiation. Strikingly, we found that the genes bound by KDM5A at the core promoters are enriched for cell cycle and apoptosis. In contrast, genes that have KDM5A binding located distally belong to different developmental and differentiation categories.

Because we found many cell cycle genes among the KDM5A targets, we asked how many of the genes occupied by E2F transcription factors are deregulated in DKO. A surprisingly large number (2130 out of 4562) of the genes bound by E2F1 at ≤ 1 kb from the TSS were differentially expressed in DKO cells compared to wild-type cells (Figure 2A).

Figure 1. KDM5A binds its target genes near the TSS and at regions distal to the TSS.
(A) Purification of myotubes from undifferentiated MEFs. MEFs were transduced with adenoviral MyoD and induced for differentiation for 72 hours, after which myotubes were purified from a mixed population that contained undifferentiated and differentiated cells. Following trypsinization small cells stay in cell suspension and can be removed, while the cells that underwent fusion events and formed myotubes are big enough to promptly sediment.
(B) Microscopic images of cells were taken before and after fractionation.
(C) The distribution of KDM5A binding peaks in relation to the TSS.
(D) DE/KDM5A targets are enriched for GOBP terms associated with cell cycle and development. Enrichment analysis was done for biological process using three gene lists as input. Results for each gene list are shown in 3 columns. Rows represent enriched GO terms. Far left column: DE(≥ 4 fold)/KDM5A targets. Center left column: DE(≥ 4 fold)/KDM5A targets where KDM5A binds at ≤ 1kb from the TSS. Center right column: DE(≥4 fold)/KDM5A targets where KDM5A binds at > 1kb from the TSS. The degree of significance is indicated by color: highest significance (red) to least (orange).
(E) Enrichment of KDM5A at cell cycle and developmental genes. ChIP experiments performed in MEFs with KDM5A antibodies. The enrichment is presented as fold-change values relative to Kdm5a-/- MEFs. Igr4070 is an intergenic control region not bound by KDM5A.
(F) All KDM5A peaks were separated into 12 groups based on their location within defined windows of distance from the TSS. Each group was plotted as the number of peaks within the given window of distance.
(G) Chart showing the distribution of KDM5A peaks associated with cell cycle genes compared to KDM5A peaks associated with developmental genes (genes are ≥4 fold DE). Peaks at genes associated with cell cycle or development were separated and plotted as in F.
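The distance-based grouping of peaks behind panels C and F can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the pipeline used in the study: the coordinates are invented, the function names `nearest_tss_distance` and `classify_peak` are hypothetical, and the three classes simply reuse the 1 kb and 2 kb cut-offs quoted in the text for the peak distribution.

```python
from bisect import bisect_left

def nearest_tss_distance(peak_summit, sorted_tss):
    """Absolute distance (bp) from a peak summit to the closest TSS.

    `sorted_tss` is an ascending list of TSS positions on one chromosome.
    """
    i = bisect_left(sorted_tss, peak_summit)
    # Only the TSS just left and just right of the summit can be closest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(peak_summit - tss) for tss in candidates)

def classify_peak(distance_bp):
    """Assign a peak to a distance class (cut-offs assumed from the text)."""
    if distance_bp <= 1000:
        return "core_promoter"   # within 1 kb of the TSS
    if distance_bp <= 2000:
        return "proximal"        # between 1 kb and 2 kb
    return "distal"              # further than 2 kb

# Hypothetical coordinates on a single chromosome, for illustration only.
tss_positions = sorted([10_000, 55_000, 120_000])
peak_summits = [10_400, 53_500, 90_000, 119_200]

labels = [classify_peak(nearest_tss_distance(p, tss_positions))
          for p in peak_summits]
print(labels)
```

A binary search over sorted TSS positions keeps the lookup fast when many peaks are annotated; a real analysis would also handle multiple chromosomes and strand.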
Therefore, the direct E2F targets are deregulated in cells lacking pRB and KDM5A, consistent with the role of both proteins in decreasing the expression of cell cycle genes in terminally differentiated cells (our submitted manuscript). We analyzed KDM5A targets for binding sites of other transcription factors and found that GADP, SAP1A, GABP, ELK1, and YY1 were enriched besides E2F (Figure 2B, Table S5, Figures S6 and S8). Strikingly, when only KDM5A regions > 1kb were taken into the analysis, a distinct set of transcription factors was identified, including HNF3-4, EVI1 and FOXJ2 (Figure 2C). Comparison with published MyoD data showed that many KDM5A peaks associate with the same genes (Figure S2). These results indicate that KDM5A peak regions at the TSS and at distant sites are functionally different.

Given the KDM5A function as a histone demethylase, we analyzed KDM5A regions for overlap with H3K4me3 and H3K4me1. For differentially expressed genes, 253 out of 302 KDM5A ChIP-seq peaks were within an H3K4me3 peak (Figure 2D, Figure S9 and S10), and 49 out of these 302 had H3K4me1. Inspection of the distal regions for well-characterized marks of enhancer regions, such as the location of H3K4me1, H3K27ac and p300, showed that KDM5A occupies the enhancers of a significant number of DE genes (Figure 2E, Figure S7). Comparison of individual gene tracks showed that KDM5A peaks overlap or are in close proximity to the peaks of H3K4me1, H3K27ac and p300, but not to the H3K4me3 peaks (Figure 2F). These regions are highly conserved in the placental animals and mammals and are DNase hypersensitive, which is characteristic of enhancer regions.

Despite the ability of Kdm5a-/-Rb1-/- cells to form myotubes, their morphology was different from wild-type myotubes.
The gene expression levels for myogenin, troponin I and genes associated with GO "muscle cell differentiation", or E2F1-3, p21, p57, and other cell cycle genes (Table S1 and S4) differed significantly more between the replicate samples of DKO and wild-type than between samples of the same genotype. To further investigate the phenotype of "rescued" Kdm5a-/-Rb1-/- myotubes, we analyzed them for protein expression markers of terminally differentiated cells. Myogenin is a tissue-specific transcription factor that is upregulated early during myogenesis and is required for terminal differentiation of committed myoblasts (Charlotte A. Berkes, 2005). We induced wild-type (WT), Kdm5a-/-, Rb1-/- and Kdm5a-/-;Rb1-/- MEFs to differentiate over a course of 2 days and determined the dynamics of myogenin expression by immunostaining at 0, 6, 12, 18, 24, 36, 48 and 144-hour time points (Figure 3A). WT cells reached a high expression level of myogenin at 18 hr. Rb1-/- cells were mostly defective for myogenin expression, and the additional loss of Kdm5a rescued this defect. In the absence of KDM5A, either in KO or in DKO cells, robust expression of myogenin first appeared by the 12-hour time point and was significantly higher than that of the WT cells (Figure 3B and C). In addition to early upregulation of myogenin, we also observed that the Kdm5a-/- and Kdm5a-/-;Rb1-/- cells initiated cell fusion events sooner than the WT cells (data not shown). Multinucleation (defined here as a cell containing at least 3 nuclei) first appeared 24 hours after induction of differentiation. At this time point we saw several multinucleated KO and DKO cells but very few multinucleated WT cells. Very similar results were obtained in cells of the same genotypes derived from another litter. These results suggest that KDM5A may be required during differentiation to silence the genes for the complete shutdown.
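The comparison of myogenin-positive cells at a single time point, as described in the Figure 3 legend (triplicate measurements, SEM error bars, Student's t test), can be sketched as follows. This is an illustration only: the percentages are invented placeholders rather than data from the study, the helper names `sem` and `student_t` are hypothetical, and equal variances are assumed for the pooled t statistic. The sketch stops at the t statistic and degrees of freedom; a p-value would be looked up from the t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom); assumes equal variances in both groups.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Placeholder percentages of myogenin-positive cells in triplicate wells.
wt_percent = [10.0, 12.0, 14.0]
dko_percent = [30.0, 33.0, 36.0]

t_stat, dof = student_t(dko_percent, wt_percent)
print(f"WT  mean = {mean(wt_percent):.1f} +/- {sem(wt_percent):.2f} (SEM)")
print(f"DKO mean = {mean(dko_percent):.1f} +/- {sem(dko_percent):.2f} (SEM)")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```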
Differentiation requires intact processes of DNA damage response and apoptosis, and a number of molecules have been identified that prevent apoptosis in differentiated cells. We found that, compared to wild-type and Rb1-/- cells, DKO cells have a higher level of apoptosis (Figure 4A and 4B). Permanent withdrawal from the cell cycle is another feature of terminally differentiated cells. It has been shown that differentiated cells in which Rb1 expression is deficient, due to genetic ablation or RNAi-mediated knockdown, are able to re-enter S phase of the cell cycle (Lin et al., 2011b; Sage et al., 2003). Cells derived from MEFs of different genotypes were subjected first to our differentiation protocol and then restimulated with serum. Cells were labeled with EdU as an indicator of S phase entry, stained for histone H3 phosphorylated at serine 10 (H3pS10) as a marker of G2/M phase, and stained for MyHC to identify differentiated cells (Figure 4C). As expected, while WT cells were EdU- and H3pS10-negative, Rb1-/- cells, including the rare myotubes, were mitotic. Previously we and others have established a connection between KDM5A localization at the promoter and repression of E2F-target genes during terminal differentiation (Lopez-Bigas et al., 2008a; Oevelen et al., 2008). The loss of Kdm5a by itself did not result in a detectable cell cycle defect, suggesting that pRB acts in parallel and is able to compensate for the loss of KDM5A. Strikingly, the combined loss of Kdm5a and Rb1 caused an increase in cell cycle entry compared to the loss of Rb1 alone (Figure 4D), indicating a high degree of cooperation between KDM5A and RB/E2F complexes. These data suggest that KDM5A may be working in cooperation with pocket protein complexes during differentiation to drive permanent exit from the cell cycle. Even deletion of a single copy of Kdm5a resulted in the phenotype associated with increased expression of myogenic markers and cell cycle reentry (Figure 4E).
Figure 2. KDM5A binding regions near the TSS and distal to the TSS have distinct features.
(A) Venn diagram comparing DE genes, KDM5A target genes, and E2F1 target genes. Red circle indicates the number of differentially expressed genes (≥2 fold). Blue indicates the number of E2F1 target genes bound at ≤ 1kb from the TSS. Green indicates the number of KDM5A target genes bound at ≤ 1kb from the TSS. Overlapping areas indicate overlapping genes.
(B) KDM5A binding regions near the TSS and distal to the TSS are enriched for distinct types of transcription factor binding sites (TFBS). 249 genes are DE (≥2 fold) and bound by KDM5A at ≤ 1kb from the TSS. 776 TFBS were identified in the promoter regions (defined here as 600 bp upstream to 200 bp downstream of the TSS) of these genes. Analysis was performed on the list of 249 genes to determine which TFBS are enriched in these regions. Each row represents a TF for which binding sites are enriched. The degree of significance is indicated by color: highest significance (red) to least (orange).
(C) 370 KDM5A binding peaks located ≥ 1.5kb from the TSS of 337 differentially expressed (≥2 fold) target genes. Z-score analysis was done to identify enriched TFBS located within 500 bp upstream and 500 bp downstream of each KDM5A peak summit. Enrichment was determined by comparing the TFBS in these regions to 1000 random regions of the same size located at best 10 kb away from each peak throughout the genome. Z-score is indicated by color.
(D) Venn diagram showing DE/KDM5A target genes that have H3K4me3 at the promoter. A relatively smaller fraction has H3K4me1 at the promoter. Red circle: genes that are DE (≥4 fold) and KDM5A (≤1 kb from TSS) targets. Blue circle: the number of all DE (≥4 fold)/KDM5A target genes that have H3K4me3 in the promoter region +/- 2kb from the TSS. Overlapping areas indicate overlapping genes.
(E) Distal regions bound by KDM5A display enhancer marks. H3K27me3 data and data for the enhancer marks p300, H3K27ac, H3K4me1 and H3K4me3 were analyzed for the KDM5A-bound regions. Filled boxes indicate presence of the feature in close proximity to a KDM5A peak; open boxes indicate absence of the mark. Data for tissues were taken from the ENCODE project.
(F) A representative KDM5A-bound region with representation of enhancer marks. The mark is tissue specific, as H3K4me1 and DNase sensitivity patterns show distinct association with this region in cerebellum, kidney and blood (from darker brown to light brown).

Figure 3. Loss of KDM5A rescues low myogenin expression in an Rb1-/- background but results in its derepression.
(A) Fluorescent microscopy of myogenin expression during the differentiation time course. WT, Kdm5a-/-, Rb1-/-, and Kdm5a-/-;Rb1-/- MEFs were induced to differentiate for 0, 6, 12, 18, 24, 36, and 48 hours. Cells were stained with anti-myogenin antibody and counterstained with the nuclear stain DAPI.
(B) Quantitation of myogenin-positive cells shown specifically at the 12-hour time point. In the case of multinucleated cells, if multiple nuclei in the same cell were positive for myogenin, the cell was counted only once. Error bars represent the standard error of the mean (SEM) of three replicates. Student's t test was done to indicate the significant difference between samples at the 12-hour time point. p-values are indicated.
(C) For the 0-48 hr time points, myogenin-positive cells were quantified and are represented as the percentage of myogenin-positive cells out of the total cell number.

Figure 4. Loss of Kdm5a in an Rb1-/- background increases expression of late differentiation markers, apoptosis and cell cycle entry in cells induced for differentiation, which can be reversed by overexpression of KDM5A but not of the KDM5A demethylation mutant.
(A) TUNEL assays. WT, Rb1-/-, and Kdm5a-/-;Rb1-/- MEFs were induced to differentiate. TUNEL assays were performed along with immunofluorescent staining for MyHC, a late marker of myogenic differentiation, and nuclear DAPI staining. Representative images of the cells at day 6 are shown.
(B) Quantification of TUNEL-positive cells for each genotype. TUNEL assays were performed at 0, 2, 4, and 6 days as differentiation time points. Data are shown as percent TUNEL-positive cells out of the total cell number. In the case of multinucleated cells, if multiple nuclei in the same cell were positive for TUNEL, the cell was counted only once. Error bars represent the standard error of the mean (SEM) of three replicates.
(C) Assays for cell cycle reentry. WT, Kdm5a-/-, Rb1-/-, and Kdm5a-/-;Rb1-/- MEFs were induced to differentiate for 3 days in media containing 2% horse serum. Cells were then restimulated in high-serum media (20% FBS) for 12 hours. After 12 hours EdU was added to the media and cells were fixed after an additional 12 hours. EdU incorporation indicates S-phase entry; histone 3 serine 10 phosphorylation (H3S10p) is a marker of G2/M phase; these stains are used in conjunction with MyHC and DAPI stain.
(D) Quantitation of differentiated cells, marked by MyHC, that were positive for cell cycle reentry, marked by EdU or H3S10p, for each of the indicated genotypes. In the case of multinucleated cells, if multiple nuclei in the same cell were positive for EdU or H3S10p, the cell was counted only once. Data are shown as the percentage of EdU-positive myotubes and H3S10p-positive myotubes out of the total cell number. Error bars represent the standard error of the mean (SEM) of three replicates.
(E) Loss of even a single copy of Kdm5a rescues expression of myogenic genes but is unable to rescue the persistence of cells in the cell cycle and their sensitivity to apoptosis. Stainings for MyHC and MyoG as well as for EdU and H3S10p are shown.
(F) Rescue experiment in Kdm5a-/-;Rb1-/- MEFs for the restoration of MyHC expression. MEFs were infected with lentiviruses encoding either the histone demethylation mutant of KDM5A or unmodified KDM5A, induced for differentiation and stained for MyHC and with DAPI.
(G) Quantification of MyHC-positive cells in rescue experiments.

To confirm that the lack of KDM5A is required for the rescue of defects in Rb1-/- cells, we expressed KDM5A in DKO cells and analyzed their propensity to differentiate. The cells with reintroduced KDM5A lost their ability to differentiate and became similar to Rb1-/- cells (Figure 4F). Given the link of KDM5A function to demethylation, we asked if the KDM5A catalytic domain mutant would inhibit cellular differentiation similar to pRB. We found that cells expressing the KDM5A mutant expressed MyHC (Figure 4F and G) and were multinucleated, similar to the original DKO cells. These experiments directly linked the inhibitory KDM5A function in differentiation to its demethylase activity.

Because pRB is a general regulator of differentiation and it represses KDM5A, we asked if their combined loss results in activation of genes belonging to different lineages. Surprisingly, irrespective of the induction condition in vitro, the Kdm5a loss resulted in expression of numerous differentiation markers in Rb1-/- cells.
Single-cell level analysis showed simultaneous expression of both MyHC and the neuronal marker DLX2 in myotubes (Figure 5). The DLX2 expression was detected in multiple nuclei of myotubes, suggesting that these are not random fusion events between developing myoblasts and neural cells in culture (Figure 5). The multilineage potential of the induced myoblasts upon KDM5A loss suggests that KDM5A functions not only in proliferation but also in lineage specification.

We conclude that KDM5A has two separate categories of target genes during differentiation, each with a distinct means of regulation by KDM5A. Cell cycle genes are regulated by KDM5A from the promoter region, while developmental genes are bound at distal regulatory regions, many of which possess characteristics of enhancers. These results are consistent with the model that KDM5A demethylates H3K4 at the enhancers of cell-type specific genes during differentiation and that this activity is essential to fully repress these genes. Enhancers of active genes contain nucleosomes with H3K4me1 and H3K4me3. If KDM5-dependent H3K4me1/me3 demethylase activity is involved in enhancer silencing during cell lineage decisions, then lack of KDM5A would cause the retention of H3K4me1/me3 levels at dependent genes. Increased levels of H3K4me1, p300 and H3K27ac were observed at the enhancer of the RCAN1 gene and other KDM5A regions in specific tissues.

In order to assess to what extent differentiation is rescued by the Kdm5a loss in cells that are otherwise unable to differentiate, we analyzed myogenic markers in Kdm5a-/-Rb1-/- cells. We found that loss of Kdm5a fully rescues expression of early markers and at least partially rescues expression of late markers of differentiation. Kdm5a null myotubes are larger and more multinucleated than WT. Later during differentiation KDM5A binds to promoters of genes targeted by pRB and E2F. Loss of Kdm5a in an Rb1 null background enhances cell cycle reentry of terminally differentiated myotubes. Our data suggest that loss of Rbp2 suppresses growth of Rb1-/- tumors by promoting a more differentiated state despite an exacerbation of cell cycle reentry. Additionally, in mice, loss of Kdm5a suppresses tumorigenesis caused by deletion of Rb1.

A global analysis confirmed that genes associated with KDM5A-occupied enhancers are derepressed in differentiating Kdm5a null cells. When Kdm5a-/- cells are induced to differentiate down the myogenic lineage, this includes enhanced expression of not only myogenic markers but also neuronal markers. As neuronal differentiation can be induced in cells lacking Kdm5a function, these results indicate that multiple programs fail to be silenced in these cells.

Differentiation of ESCs includes regulation of HOX genes, which become de-repressed upon differentiation. De-repression of HOX genes also occurs in a KDM5A-deficient background (Boyer et al., 2006; Christensen et al., 2007). In contrast, the persistent expression of HOX genes driven by MLL1 or KDM5A fusion oncoproteins induces leukemic transformation (Hess, 2004). HOXA9 is normally expressed in haematopoietic stem/progenitor cells, while it is repressed as differentiation progresses. However, HOXA9 is continuously expressed in MLL1- or KDM5A-rearranged human leukemias. It is imperative then that the methylated state of H3K4 at promoters and enhancers determines the level of gene activity. Our data show that it is critical for activation of the transcription programs associated with different cellular functions, including cell cycle and differentiation, through direct binding and demethylation by KDM5A.

Materials and Methods

Cell culture, differentiation and myotube isolation. Mouse embryonic fibroblasts (MEFs) were generated as described previously (Lin et al., 2011a). MEFs were maintained in DMEM (CellGro) containing 10% FBS (HyClone). MEFs were seeded on 6-well dishes coated with fibronectin at 2.2 x 10^5 cells/well. The following day the cells were induced to differentiate by transduction with Adeno-MyoD (2.2 x 10^8 PFU) and switching to differentiation media (DM): DMEM (CellGro), 2% horse serum (Gibco), 10 µg/mL insulin (Sigma), and incubated overnight. After 24 hours, the DM containing Ad-MyoD was replaced with fresh DM. At 72 hours myotubes were purified as follows. The media was removed from the cells and kept aside. Cells were trypsinized, resuspended and combined in 2 mL of growth media (DMEM and 10% FBS) into one well. The plate was left to sit undisturbed for 1 minute. When in suspension, myotubes lose their tube-shape morphology and adopt a more spherical shape. Multinucleated myotubes are larger than undifferentiated MEFs, and this property can be used to distinguish them from the undifferentiated cells. Heavier myotubes sank to the bottom of the tissue culture dish faster than the undifferentiated cells. Then ¾ (1.5 mL) of the media, containing mostly undifferentiated MEFs, was carefully removed by pipette, leaving differentiated myotubes on the bottom of the well. 1.5 mL of growth media was added back to the well and the cells were resuspended. This process was repeated 2 more times to generate a purified population of myotubes. Cells were allowed to recover and reattach to the dish for 3 hours in growth media. The media originally removed from the well was then used as conditioned media: after 3 hours the growth media was removed and replaced with the conditioned media that had been set aside. Cells were then allowed to recover for 24 hours.

Cell cycle analysis, TUNEL and immunostaining. MEFs were induced to differentiate as described previously (Lin et al., 2011a). Detection of myogenin was done with anti-myogenin antibody (Thermo Scientific, Clone F5D) at a 1:50 dilution. Cy3-labeled anti-mouse was used as the secondary antibody.
Nuclei were stained with DAPI. Quantitation of myogenin-positive cells was performed by counting 4 microscope fields (40X power) per well, in triplicate wells, and determining the percentage that stained positive for myogenin. In the case of multinucleated cells, if multiple nuclei in the same cell were positive for myogenin, the cell was counted only once. For cell cycle analysis, on the 3rd day the cells were removed from differentiation media and put into DMEM/20% FBS for 12 hours to promote cell cycle reentry. After 12 hours EdU was added at 10 µM concentration to the culture media and cells were incubated for an additional 12 hours. Cells were then fixed for 15 minutes in 10% formalin followed by three washes with PBS. Immunofluorescent staining was done with anti-MyHC and anti-H3S10p. Staining for EdU was done according to the manufacturer's protocol using reagents included in the EdU kit. Images were taken with a Zeiss confocal microscope. The experiment was performed in triplicate. Cells were counted from three fields in each well (3 fields/well x 3 wells) for each genotype, for a total of 9 fields per genotype. TUNEL staining was performed with the In Situ Cell Death Detection Kit, Fluorescein (Roche) according to the manufacturer's protocol. Quantification was done by counting the number of TUNEL-positive cells and determining the percentage that were TUNEL-positive out of the total cell number. Assays were done in triplicate. Images were taken using a 10x objective. >100 cells per replicate were counted and averaged. In the case of multinucleated cells, if multiple nuclei in the same cell were positive for TUNEL, the cell was counted only once.

Figure 5. Loss of Kdm5a in an Rb1-/- background results in multi-lineage differentiation. MEFs induced for myogenic differentiation were stained for the neuronal stem cell marker DLX2 along with MyHC staining. The multinucleation and myotube formation in DKO correlates not only with MyHC expression but also with DLX2 expression.

RNA-seq library preparation. Eight wells of MEFs were seeded for each genotype. RNA from purified wild-type and double knockout myotubes was extracted with 400 µl Trizol (Invitrogen) but not precipitated. The aqueous phase containing the RNA was removed and one volume of 70% ethanol was added, followed by purification on a Qiagen RNeasy column according to the Qiagen protocol. 1.5 µg of RNA from each sample was purified by two rounds of oligo-dT purification with Dynal oligo-dT beads (Invitrogen). The purified RNA was fragmented for 5 minutes at 94 °C using 10X fragmentation buffer (NEB). First strand cDNA synthesis was performed with random primers (NEB) and Superscript II reverse transcriptase (Invitrogen). Second strand synthesis was done with 10X Second Strand Synthesis Reaction Buffer (NEB) and Second Strand Synthesis Enzyme Mix (NEB). Double-stranded cDNA was purified with 1.8 volumes of AMPure XP Beads (Beckman Coulter) followed by end repair using NEBNext End Repair Enzyme Mix and Reaction Buffer (NEB). End-repaired DNA was cleaned up with 1.8 volumes of AMPure XP Beads followed by dA-tailing using 10X NEBNext dA-Tailing Reaction Buffer and Klenow Fragment (3'→5' exo-). dA-tailed DNA was cleaned up with 1.6 volumes of AMPure XP Beads followed by adapter ligation using PE adapters designed for Illumina sequencing. Adapter-ligated DNA was cleaned up and size selected for a range of 250-300 bp using 1 volume of AMPure XP Beads. cDNA was enriched by 15 cycles of PCR using Phusion DNA Polymerase (NEB) and PE primers designed for Illumina sequencing. Enriched DNA was cleaned up with 1 volume of AMPure XP Beads. Single-read sequencing for 36 bases was done on an Illumina Genome Analyzer II.

RNA-seq data analysis

(i) Read quality assessment and filtering.
Basic quality of the Illumina output reads (FastQ), including GC bias, was assessed with the FastQC program (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) (data not shown). Poor-quality reads were eliminated before mapping based on the default quality flag set by the Illumina pipeline in the FastQ file.

(ii) Mapping of reads. Mapping of reads was done with TopHat (version 1.2.0) (Trapnell et al., 2009). The 36 bp short reads were uniquely aligned, allowing at most two mismatches, to the UCSC (Lander et al., 2001) mouse reference genome (mm9). Reads that matched more than one location with equal quality were discarded to avoid bias. Reads that did not map to the genome were then mapped against the transcriptome (junction mapping), using the EnsEMBL 59 (Hubbard et al., 2007) gene model. Any residual mapping bias was checked using RSEQtools (Habegger et al., 2010) (data not shown). Consistency between the two replicates of each condition (KO and WT) was checked with the Pearson correlation coefficient (PCC) computed on read counts in 500 bp bins. The PCC was over 0.98 in both cases (data not shown).

(iii) Differential expression analysis. After mapping, we used BedTools (version 2.10) (Quinlan and Hall, 2010) to calculate read abundance (read count, rc) for each transcript associated with EnsEMBL (v59) genes. Since Illumina library preparation uses random hexamer priming, which can bias the nucleotide composition at the beginning of transcriptome sequencing reads, we used a read count reweighting scheme (Hansen et al., 2010) to calculate weighted counts (wc) with the Bioconductor (Gentleman et al., 2004) package Genominator (version 1.2.4). The wc data were used for differential expression (DE) analysis (DKO vs. WT) with the Bioconductor package DESeq (version 1.2.1) (Anders and Huber, 2010), which uses a model based on the negative binomial distribution.
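The replicate consistency check described above (Pearson correlation of read counts in 500 bp bins) can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the representation of reads as (chromosome, start position) pairs, and the choice to include bins that are empty in one replicate are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

BIN_SIZE = 500  # bin width in bp, as stated in the text


def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count reads per (chromosome, bin index).

    read_starts: iterable of (chrom, start) pairs (hypothetical input format).
    """
    return Counter((chrom, pos // bin_size) for chrom, pos in read_starts)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def replicate_correlation(reads_a, reads_b, bin_size=BIN_SIZE):
    """PCC between two replicates over the union of occupied bins."""
    counts_a = bin_counts(reads_a, bin_size)
    counts_b = bin_counts(reads_b, bin_size)
    bins = sorted(set(counts_a) | set(counts_b))
    # Counter returns 0 for bins that are absent from one replicate.
    return pearson([counts_a[b] for b in bins], [counts_b[b] for b in bins])
```

Whether bins that are empty in both replicates should also enter the correlation is a design choice that the text does not specify; this sketch leaves them out.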
We normalized the data based on the total mapped reads in the four samples and, using the package's built-in functions, checked the fit of the model (Figure S1) and produced an MA plot for an overall view of the differential expression pattern. As a check of our approach, DE analysis was also done using another popular R package, DEGseq (MARS algorithm) (Wang et al.) (data not shown). The two methods showed a very high overlap of DE genes (86.4% of the genes identified by DESeq were also identified as DE by DEGseq), and we used the results from DESeq for further analysis.

Functional and pathway enrichment analysis. Functional annotation of target genes is based on Gene Ontology (GO) (Lander et al., 2001) (http://www.geneontology.org) as extracted from EnsEMBL (Hubbard et al., 2007) and on the KEGG pathway database (Kanehisa et al., 2008). Genes were classified using the Biological Process (BP) ontology. We used the R package GOseq (Young et al.) for enrichment analysis; it was developed specifically for RNA-seq data and accounts for the selection bias inherent to this platform. The resulting p-values were adjusted for multiple testing using the Benjamini and Hochberg False Discovery Rate (FDR) method (Benjamini and Hochberg, 1995), and a heatmap was generated in GiTools (Perez-Llamas and Lopez-Bigas) using the corrected p-values.

Enrichment of transcription factor binding. Occurrences of transcription factor (TF) motifs in the promoter regions (600 bp upstream to 200 bp downstream of the TSS) of differentially expressed KDM5A targets were predicted using the STORM algorithm (Schones et al., 2007) with a p-value cutoff of 0.0000125 and position frequency matrices (PFM) from the TRANSFAC database (professional version release 2009.4) (Matys et al., 2003).
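For readers unfamiliar with the Benjamini and Hochberg adjustment used for the enrichment p-values above, the step-up procedure can be sketched as follows. This is a generic illustration of the published method, not the GiTools or GOseq implementation; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be called enriched when its adjusted p-value falls below the chosen FDR threshold.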
Using binomial statistics in GiTools (Perez-Llamas and Lopez-Bigas, 2011), we identified enriched TF binding by comparison with TF motif occurrence in the promoters of non-differentially expressed genes. FDR-corrected p-values (Benjamini and Hochberg, 1995) were used for the heatmap representation of enriched TFs. Occurrences of TF motifs in the KDM5A peak regions of differentially expressed targets (1000 bp around the peak summit) were predicted with the STORM algorithm (Schones et al., 2007) with a p-value cutoff of 0.00001. We estimated the random occurrence of each TF motif by scanning 1000 random regions of equal size for each enriched peak in the KDM5A ChIP-seq data, within a maximum window of 10 kb around each input KDM5A peak region, excluding any random region that fell in a UCSC 'gap' region. We then calculated a Z-score from the number of TF motifs found in the ChIP region and the mean and standard deviation of the number of motifs found in the 1000 random regions. We calculated a two-tailed p-value for each Z-score, and the FDR-adjusted p-value (Benjamini and Hochberg, 1995) was used to assess statistical significance. A heatmap of positive Z-scores with significant p-values was generated using GiTools (Perez-Llamas and Lopez-Bigas, 2011).

Overlap analysis. Overlap of genomic position ranges was computed using BedTools (Quinlan and Hall, 2010), and the significance of overlap was analyzed with the hypergeometric test. Venn diagram generation and overlap analysis of target genes were performed using an in-house R script, and the significance of overlap was calculated with the chi-square test. Unless stated otherwise, peak overlap is defined as an overlap of at least one bp.

Public ChIP-seq data analysis. In the present study we used several publicly available genome-wide datasets (Table S3) as well as our ChIP-seq data on KDM5A in mouse ES cells (our submitted manuscript).
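As a side note on the motif enrichment statistic described above, the Z-score and its two-tailed p-value can be sketched as follows. Two details are assumptions, since the text does not state them: the sample standard deviation is used for the random regions, and the p-value is taken from a standard normal distribution. The function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def motif_zscore(observed, random_counts):
    """Z-score of the motif count in a ChIP region against random regions.

    observed: number of motif hits in the KDM5A peak region.
    random_counts: motif hits in each matched random region (1000 in the text).
    """
    mu = mean(random_counts)
    sigma = stdev(random_counts)  # sample SD; an assumption, see lead-in
    if sigma == 0:
        raise ValueError("Z-score is undefined when random counts do not vary")
    return (observed - mu) / sigma


def two_tailed_p(z):
    """Two-tailed p-value of a Z-score under a standard normal distribution."""
    return erfc(abs(z) / sqrt(2))
```

The resulting p-values, one per TF, would then be passed through the FDR adjustment before selecting the positive Z-scores to display.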
ChIP-seq data were processed by us from the raw source data, except for the ChIP-seq data on differential MyoD binding in skeletal muscle (Cao et al.), for which we used the authors' analyzed data. Short reads were uniquely aligned, allowing at most two mismatches, to the UCSC (Lander et al., 2001) mouse reference genome (mm9) using the program BOWTIE (Langmead et al., 2009). The peak caller MACS (version 1.3.7.1) (Zhang et al., 2008) was used to determine enriched peak regions. Enriched peaks were annotated to the nearest EnsEMBL (Hubbard et al., 2007) gene using the Bioconductor package ChIPpeakAnno (Zhu et al., 2010). For ChIP-on-chip data analysis, we used the program Chipper (Gibbons et al., 2005) with an FDR cutoff of 0.05.

References

1. Anders, S., and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol 11, R106. 2. Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837. 3. Benevolenskaya, E.V., Murray, H.L., Branton, P., Young, R.A., and Kaelin, W.G. (2005a). Binding of pRB to the PHD protein RBP2 promotes cellular differentiation. Molecular Cell 18, 623-635. 4. Benevolenskaya, E.V., Murray, H.L., Branton, P., Young, R.A., and Kaelin, W.G., Jr. (2005b). Binding of pRB to the PHD protein RBP2 promotes cellular differentiation. Mol Cell 18, 623-635. 5. Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B 57, 289-300. 6. Bernstein, B.E., Mikkelsen, T.S., Xie, X., Kamal, M., Huebert, D.J., Cuff, J., Fry, B., Meissner, A., Wernig, M., Plath, K., et al. (2006). A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315-326. 7.
Boyer, L.A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L.A., Lee, T.I., Levine, S.S., Wernig, M., Tajonar, A., Ray, M.K., et al. (2006). Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441, 349-353. 8. Cao, Y., Yao, Z., Sarkar, D., Lawrence, M., Sanchez, G.J., Parker, M.H., MacQuarrie, K.L., Davison, J., Morgan, M.T., Ruzzo, W.L., et al. Genome-wide MyoD binding in skeletal muscle cells: a potential for broad cellular reprogramming. Dev Cell 18, 662-674. 9. Charlotte A. Berkes, S.J.T. (2005). MyoD and the transcriptional control of myogenesis. 1-11. 10. Christensen, J., Agger, K., Cloos, P.A., Pasini, D., Rose, S., Sennels, L., Rappsilber, J., Hansen, K.H., Salcini, A.E., and Helin, K. (2007). RBP2 belongs to a family of demethylases, specific for tri-and dimethylated lysine 4 on histone 3. Cell 128, 10631076. 11. Foster, C.T., Dovey, O.M., Lezina, L., Luo, J.L., Gant, T.W., Barlev, N., Bradley, A., and Cowley, S.M. (2010). Lysine-specific demethylase 1 regulates the embryonic transcriptome and CoREST stability. Mol Cell Biol 30, 4851-4863. 12. Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80. 13. Ghisletti, S., Barozzi, I., Mietton, F., Polletti, S., De Santa, F., Venturini, E., Gregory, L., Lonie, L., Chew, A., Wei, C.L., et al. (2010). Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity 32, 317-328. 14. Gibbons, F.D., Proft, M., Struhl, K., and Roth, F.P. (2005). Chipper: discovering transcription-factor targets from chromatin immunoprecipitation microarrays using variance stabilization. Genome Biol 6, R96. 15. Habegger, L., Sboner, A., Gianoulis, T.A., Rozowsky, J., Agarwal, A., Snyder, M., and Gerstein, M. (2010). 
RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries. Bioinformatics 27, 281-283. 16. Hess, J.L. (2004). MLL: a histone methyltransferase disrupted in leukemia. Trends Mol Med 10, 500-507. 17. Hubbard, T.J., Aken, B.L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cunningham, F., Cutts, T., et al. (2007). Ensembl 2007. Nucleic Acids Res 35, D610-617. 18. Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., et al. (2008). KEGG for linking genomes to life and the environment. Nucleic Acids Res 36, D480-484. 19. Klose, R.J., Yan, Q., Tothova, Z., Yamane, K., Erdjument-Bromage, H., Tempst, P., Gilliland, D.G., Zhang, Y., and Kaelin, W.G., Jr. (2007). The retinoblastoma binding protein RBP2 is an H3K4 demethylase. Cell 128, 889-900. 20. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921. 21. Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25. 22. Lin, W., Cao, J., Liu, J., Beshiri, M.L., Fujiwara, Y., Francis, J., Cherniack, A.D., Geisen, C., Blair, L.P., Zou, M.R., et al. (2011a). Loss of the retinoblastoma binding protein 2 (RBP2) histone demethylase suppresses tumorigenesis in mice lacking Rb1 or Men1. Proceedings of the National Academy of Sciences of the United States of America 108, 13379-13386. 23. Lin, W., Cao, J., Liu, J., Beshiri, M.L., Fujiwara, Y., Francis, J., Cherniack, A.D., Geisen, C., Blair, L.P., Zou, M.R., et al. (2011b). Inaugural Article: Loss of the retinoblastoma binding protein 2 (RBP2) histone demethylase suppresses tumorigenesis in mice lacking Rb1 or Men1.
Proceedings of the National Academy of Sciences, 1-8. 24. Lopez-Bigas, N., Kisiel, T.A., Dewaal, D.C., Holmes, K.B., Volkert, T.L., Gupta, S., Love, J., Murray, H.L., Young, R.A., and Benevolenskaya, E.V. (2008a). Genome-wide analysis of the H3K4 histone demethylase RBP2 reveals a transcriptional program controlling differentiation. Molecular Cell 31, 520-530. 25. Lopez-Bigas, N., Kisiel, T.A., Dewaal, D.C., Holmes, K.B., Volkert, T.L., Gupta, S., Love, J., Murray, H.L., Young, R.A., and Benevolenskaya, E.V. (2008b). Genome-wide analysis of the H3K4 histone demethylase RBP2 reveals a transcriptional program controlling differentiation. Mol Cell 31, 520-530. 26. Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., et al. (2003). TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31, 374-378. 27. Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560. 28. Novitch, B.G., Mulligan, G.J., Jacks, T., and Lassar, A.B. (1996). Skeletal Muscle Cells Lacking the Retinoblastoma Protein Display Defects in Muscle Gene Expression and Accumulate in S and G2 Phases of the Cell Cycle. The Journal of Cell Biology 135, 441-456. 31. Oevelen, C.v., Wang, J., Asp, P., Yan, Q., Jr, W.G.K., Kluger, Y., and Dynlacht, B.D. (2008). A Role for Mammalian Sin3 in Permanent Gene Silencing. Molecular Cell 32, 359-370. 32. Perez-Llamas, C., and Lopez-Bigas, N. Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One 6, e19541. 33. Perez-Llamas, C., and Lopez-Bigas, N. (2011). Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One 6, e19541. 34.
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842. 35. Sage, J., Miller, A.L., rez-Mancera, P.A.P., and Jacks, J.M.W.T. (2003). Acute mutation of retinoblastoma gene function is sufficient for cell cycle re-entry. Nature 424, 223-228. 36. Schones, D.E., Smith, A.D., and Zhang, M.Q. (2007). Statistical significance of cis-regulatory modules. BMC Bioinformatics 8, 19. 37. Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111. 38. Wang, J., Hevi, S., Kurash, J.K., Lei, H., Gay, F., Bajko, J., Su, H., Sun, W., Chang, H., Xu, G., et al. (2009). The lysine demethylase LSD1 (KDM1) is required for maintenance of global DNA methylation. Nat Genet 41, 125-129. 39. Wang, J., Scully, K., Zhu, X., Cai, L., Zhang, J., Prefontaine, G.G., Krones, A., Ohgi, K.A., Zhu, P., Garcia-Bassets, I., et al. (2007). Opposing LSD1 complexes function in developmental gene activation and repression programmes. Nature 446, 882-887. 40. Wang, L., Feng, Z., Wang, X., and Zhang, X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 26, 136-138. 41. Young, M.D., Wakefield, M.J., Smyth, G.K., and Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol 11, R14. 42. Young, M.D., Wakefield, M.J., Smyth, G.K., and Oshlack, A. (2010). Gene ontology analysis for RNAseq: accounting for selection bias. Genome Biol 11, R14. 43. Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nussbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137. 44. Zhu, L.J., Gazin, C., Lawson, N.D., Pages, H., Lin, S.M., Lapointe, D.S., and Green, M.R. (2010). ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11, 237. 
Supplementary Tables:

Table S1: Total reads, mapped reads and number of genes with at least one mapped read. Two biological replicates for each condition, knockout (KO) and wild type (WT).

Samples    Total Reads    Mapped Reads    % Reads Mapped    Total Genes Mapped
S1 (KO)    12948758       9855293         76.11             20,648
S2 (KO)    21837581       17513218        80.2              20,884
S3 (WT)    14512226       10848471        74.75             20,193
S4 (WT)    21139208       15953682        75.47             20,907

Table S2: Differential expression (DE) analysis, DESeq output in Excel file.

Table S3: Public genome-wide data used in this study
Data Type: ChIPseq; ChIPseq; ChIP-on-chip; ChIPseq; ChIPseq; ChIPseq
Molecule/Antibody: H3K4me3; MyoD; E2f1; E2f4; H3K27me3; H3K36me3; Pol-II; H3K4me1; H3K27ac; p300; KDM5A
Organism/Cell Line/Cells: MEF transduced with control pCLBABE retrovirus; MEF transduced with pCLBABE-MyoD retrovirus; Mouse C2C12 cells; Undifferentiated mouse ES (E14); Mouse NIH 3T3; Mouse ES; Mouse ES; Mouse ES
Reference: [1]; [2]; [3]; [4]; [5]; GEO: GSE28343

[1]. Cao Y, Yao Z, Sarkar D, Lawrence M, Sanchez GJ, Parker MH, MacQuarrie KL, Davison J, Morgan MT, Ruzzo WL, Gentleman RC, Tapscott SJ. Genome-wide MyoD binding in skeletal muscle cells: a potential for broad cellular reprogramming. Dev Cell. 2010 Apr 20;18(4):662-74.
[2]. Chen X, Xu H, Yuan P, Fang F et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 2008 Jun 13;133(6):1106-17.
[3]. Rabinovich A, Jin VX, Rabinovich R, Xu X et al. E2F in vivo binding specificity: comparison of consensus versus nonconsensus binding sites. Genome Res 2008 Nov;18(11):1763-77.
[4]. Goldberg AD, Banaszynski LA, Noh KM, Lewis PW et al. Distinct factors control histone variant H3.3 localization at specific genomic regions. Cell 2010 Mar 5;140(5):678-91.
[5]. Creyghton MP, Cheng AW, Welstead GG, Kooistra T et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 2010 Dec 14;107(50):21931-6.

Table S4. KDM5A targets that are DE and have the GO term “Muscle Cell Differentiation”. Sixty-six genes are associated with the GO term “muscle cell differentiation” (GO:0042692). 6 of these 66 are KDM5A targets in mES cells. 4 of these 6 are also DE.

Ensembl ID            Chromosome    Fold change    Log2 fold change    p-value      Adjusted p-value    Gene
ENSMUSG00000026836    2             3.47           1.79                3.42E-026    4.34E-025           Acvr1
ENSMUSG00000026812    2             0.69           -0.53               0            0.01                Tsc1
ENSMUSG00000021217    7             5.26           2.4                 4.42E-045    1.08E-043           Tshz3
ENSMUSG00000018604    5             0.07           -3.76               6.01E-019    5.52E-018           Tbx3

Table S5. TFBS in promoter of KDM5A targets (DE ≥4 fold and ≤1kb from the TSS) that are associated with GO terms for development

Gene (n=11)    Fold Change    # of TFs that bind    % of TFs that bind
Cenpe          739.6776745    7                     50.0
Cdc20          104.4603086    1                     7.1
Trip13         53.47290343    1                     7.1
Tyms           29.75258545    5                     35.7
Brca2          24.20265991    9                     64.3
Birc5          23.58009116    5                     35.7
Aspm           21.03839114    8                     57.1
Rttn           8.282567196    7                     50.0
Ankrd54        4.517085834    7                     50.0
Mre11a         4.354349088    4                     28.6
Dact1          0.217337694    3                     21.4

TF (n=14) **    # of genes bound    % of genes bound
FOX             6                   54.5
IRF             5                   45.5
NFY             5                   45.5
PAX             5                   45.5
SP1             5                   45.5
CREB            4                   36.4
HNF             4                   36.4
ALPHACP1        3                   27.3
ATF             3                   27.3
CAAT            3                   27.3
GATA            3                   27.3
HOX             3                   27.3
VJUN            3                   27.3
XFD             3                   27.3

Most (8 of 11) of these genes are related to mitosis and/or DNA damage repair but not development. Two genes (Rttn, Dact1) appear to have more direct developmental functions: (i) RTTN: Required for the early developmental processes of left-right (L-R) specification and axial rotation and may play a role in notochord development (Faisst et al., 2002). Involved in the genetic cascade that governs left-right specification. Required for correct asymmetric expression of NODAL, LEFTY and PITX2. (ii) DACT1: Positively regulates DVL2-mediated signaling pathways during development. Binds to DVL2 and impedes the degradation of CTNNB1/beta-catenin, thereby enhancing the transcriptional activation of target genes of the Wnt signaling pathway.
(Ref: GeneCards, http://www.genecards.org/). Ankrd54 may be involved in muscle differentiation or muscle physiology, based on information for other ankyrin repeat domain proteins, but there is limited information about this specific gene. FOX family transcription factors are the most represented. Additionally, HNF transcription factors are part of the FOX family, and HNF3 is alternatively known as FOXA. The XFD transcription factors are Forkhead Domain orthologs in Xenopus.

Reference: Faisst AM, Alvarez-Bolado G, Treichel D, Gruss P. Rotatin is a novel gene required for axial rotation and left-right specification in mouse embryos. Mech Dev. 2002, 13(1):15-28.

** There are 11 KDM5A targets with DE ≥4 fold, located ≤1kb from the TSS, that are associated with GO terms for development (first column). TFBS analysis predicted binding sites for 163 different TFs in the promoters of these genes. Selecting only the TFs that have a binding motif in at least 3 of the 11 genes left 14 TFs (including paralogs).

Supplementary Figures:

Figure S1(a): Diagnostic plot to check the fit of the variance function. The plot shows per-gene estimates of the base variance against the base mean levels, with the red line showing the fit from the local regression. The fit (red line) follows the single-gene estimates well, even though the spread of the latter is considerable given that only two replicates were used as input.

Figure S1(b): Density of residual variance ratios. The density of the residual variance quotients in KO and WT (see Supplementary Excel file, Table S2), i.e. the ratio of the variance calculated only from the counts for the gene under consideration to the fitted variance, is shown in the graph. The densities of the quotients for conditions KO and WT are plotted in red and blue.
The figure shows that the sample densities agree with the theoretical density based on a chi-square distribution with 1 degree of freedom (we have two replicates for each condition, and the number of degrees of freedom is one less than the number of replicates). The theoretical density function is shown in grey.

Figure S1(c): MA plot for the contrast KO vs. WT. Plot of the log2 fold changes against the base means, with genes that are significant at 5% FDR colored in red.

Figure S2. Common target genes of KDM5A and MYOD. Overlap analysis between KDM5A target genes and MYOD target genes, where KDM5A and MYOD bind the targets at >1kb from the TSS. (Upper) A total of 337 KDM5A targets (DE ≥ 2 fold, and ≥1 kb from TSS) were compared with a total of 2997 MYOD targets (≥1 kb from TSS) that are differentially bound (myotube/myoblast) by MYOD during myogenesis. Among these, 92 genes are targeted by both KDM5A and MYOD (27% of KDM5A targets). 61 (66.3%) of the 92 overlapping genes are upregulated in Kdm5a-/-;Rb1-/- compared to WT, whereas 31 (33.6%) are downregulated. 1878 of the 2997 differentially bound MYOD targets show increased MYOD binding in myotubes compared to myoblasts. Of these 1878 genes, 51 are KDM5A targets (DE ≥ 2 fold, and ≥1 kb from TSS). Of these 51 genes, 33 (64.7%) are upregulated and 18 (35.3%) are downregulated in Kdm5a-/-;Rb1-/- vs WT myotubes. 1119 of the 2997 differentially bound MYOD targets show decreased MYOD binding in myotubes compared to myoblasts. 41 of these 1119 genes are KDM5A targets (DE ≥ 2 fold, and ≥1 kb from TSS). Of these, 28 (68.2%) are upregulated and 13 (31.7%) are downregulated in Kdm5a-/-;Rb1-/- vs WT myotubes. No significant correlation was observed between increased or decreased MYOD binding in myotubes and differential expression in Kdm5a-/-;Rb1-/- myotubes compared to WT.
65% of KDM5A targets (DE ≥ 2 fold, and ≥1 kb from TSS) are upregulated and 35% are downregulated in Kdm5a-/-;Rb1-/- vs WT myotubes. This pattern does not change when we look at the KDM5A targets that are also MYOD targets: 66.3% are upregulated and 33.6% are downregulated. It also remains the same when we look at the KDM5A targets that show increased or decreased MYOD binding in myotubes vs myoblasts. 64.7% of the genes that are KDM5A targets and MYOD (increased binding) targets are upregulated and 35.3% are downregulated in Kdm5a-/-;Rb1-/- vs WT. 68.2% of the genes that are KDM5A targets and MYOD (decreased binding) targets are upregulated and 31.7% are downregulated. This is true even though the genes that showed increased MYOD binding in C2C12 myotubes tended to be upregulated in myotubes vs myoblasts and the genes that showed decreased binding tended to be downregulated.

Figure S3. Distance between MYOD binding peaks and KDM5A binding peaks associated with the same genes. Distances were compared between MYOD peaks and KDM5A peaks associated with the same genes (150 pairs of peaks at 92 genes), measured between the centers of the two peaks in each pair. The analysis used KDM5A targets (>1kb from TSS) that are DE (≥2 fold) and MYOD peaks that are >1kb from the TSS. Each slice of the pie represents the number of peak pairs (MYOD:KDM5A) that lie within a given range of distance from each other.
Figure S4. Enrichment of KEGG pathways among DE genes. DE genes (cut off 4 fold, p-value 0.001, FDR 1%) in the comparison of Kdm5a-/-;Rb1-/- vs WT myotubes (2650 genes in total) were used for pathway enrichment analysis. Up-regulated genes are in the left column and down-regulated genes are in the right column.
Figure S5a. Enrichment of gene ontology biological process (GOBP) among DE genes. Enriched GOBP terms are present in rows. All DE genes (cut off ≥2 fold) (left column); DE genes with lower expression in Kdm5a-/-;Rb1-/- vs WT myotubes (right column); DE genes with higher expression in Kdm5a-/-;Rb1-/- vs WT myotubes (center column). Only the top 43 most enriched terms are shown.
Figure S5b. Enrichment of gene ontology biological process (GOBP) among DE genes (cut off ≥4 fold) that are KDM5A targets. All DE genes and KDM5A targets (left column); higher expressed DE genes and KDM5A targets in Kdm5a-/-;Rb1-/- vs WT myotubes (center column); lower expressed DE genes and KDM5A targets in Kdm5a-/-;Rb1-/- vs WT myotubes (right column).

Figure S6. Venn diagrams for genes with TFBS for TFs whose binding sites were found to be highly enriched at promoters of differentially expressed KDM5A targets (bound by KDM5A ≤1kb from TSS). A. Overlap of genes with binding sites for 3 TFs within the AP-1 family: CREB, Jun, ATF. B. Overlap of genes with binding sites for 3 TFs within the ETS family: GABP, SAP1A, ELK1. C. Overlap of genes with binding sites for the combined TFs from A. and B. compared with genes with binding sites for E2F transcription factors. D. Overlap of genes with binding sites for the combined TFs from A. and B. plus binding sites for the SP1 transcription factor compared with genes with binding sites for E2F transcription factors.
 
 
 
Figure S7. Venn diagrams for genes with overlap between KDM5A peaks and marks of enhancer regions. A. Genes that are DE (≥2 fold) and KDM5A targets (located >1kb from TSS) compared with all genes that have overlapping binding sites of KDM5A and H3K4me1. B. Differentially expressed KDM5A targets (DE ≥2 fold and peaks >1kb from TSS) associated with enriched ‘developmental’ related gene ontology terms compared with all genes that have overlapping binding sites of KDM5A and H3K4me1. C. Differentially expressed KDM5A targets (DE ≥2 fold and peaks >1kb from TSS) compared with all KDM5A target genes where the conservation score of the DNA sequence bound by KDM5A is ≥ 0.5. D. Comparison of the common genes found in A. and C. with differentially expressed KDM5A targets (DE ≥2 fold and peaks >1kb from TSS) associated with enriched ‘developmental’ related gene ontology terms.

Figure S8. Venn diagram comparing DE (≥2 fold) genes, E2F4 target genes and KDM5A (≤1kb from TSS) target genes.

Figure S9. Venn diagram comparing DE (≥4 fold) genes to KDM5A target genes.

Figure S10. Venn diagram comparing DE (≥4 fold) and KDM5A target genes to all KDM5A target genes that are also H3K4me3 targets.
Figure S11. Enrichment of gene ontology biological process (GOBP) of DE genes (cut off ≥2 fold) that are also KDM5A targets. DE and KDM5A targets ≤1kb from TSS (far left column). DE and KDM5A targets >1kb from TSS (center left column). DE and KDM5A targets >1kb from TSS that have lower expression in Kdm5a-/-;Rb1-/- myotubes compared to WT (center right column). DE and KDM5A targets >1kb from TSS that have higher expression in Kdm5a-/-;Rb1-/- myotubes compared to WT (far right column).
Part IV
Discussion

Chapter 10
DISCUSSION

10.1 Background

Every cell is programmed to acquire and maintain its fate during cell division. This cellular programming is determined by epigenetics. Experimental data in mice and humans suggest that epigenetic changes occur very early during neoplastic transformation, thus supporting a causal role for epigenetic changes in cancer [312]. This point of view is reflected in the “epigenetic progenitor model”, which proposes that cancer originates in stem or progenitor cells. However, despite the overwhelming evidence that the “epigenetic landscape” of cancer is different from that of normal tissues, the underlying mechanisms are unknown, and substantial work remains to be done before epigenetic cancer diagnosis will start to have a measurable positive effect on disease burden in the population. In contrast to genetically inherited mutations, epigenetic aberrations are pharmacologically reversible. Moreover, epigenetics dictates gene expression signatures, or modules, that predict pathway activation in cancer as phenotypes for genetic discovery and drug sensitivity [313]. The inactivation of the retinoblastoma (RB1) gene is viewed as a necessary step in the development of human cancers [314]. Dysregulation of the RB1 protein product, pRB, as a result of mutations in genes upstream in the RB pathway is common to most tumors [315]. Multiple proteins have been described as pRB-interacting proteins, and a few of these protein-protein interactions have been implicated in the regulation of several processes in cancer [316]. The major role of pRB in tumor suppression has been linked to its role in the negative regulation of cell cycle progression. This pRB function has been explained through its inactivation of the E2F family of transcription factors that control the expression of genes during the G1- to S-phase transition [314]. The E2F gene expression signature is now viewed as critical in many types of cancer [317].
The role of E2F may be even more prevalent. Despite the lack of common genes across gene expression signatures characterizing cancer survival, there is, in fact, a common biology, represented in the form of cell cycle control. The mechanisms of E2F control by pRB have been under intense investigation in the past decade and include active repression by direct binding to the E2F activation domain [318], inhibition of the assembly of pre-initiation complexes [319], and the recruitment of various chromatin-modifying enzymes [320]. Although most ChIP studies were unable to demonstrate direct binding of pRB to the promoters of E2F-dependent genes, they readily revealed that another pRB-related protein, either p107 or p130, is recruited to these promoters at a specific stage of the cell cycle, and this distinguishes target gene regulation [321, 322]. In early G1 cells and in quiescent cells, promoters of cell-cycle-regulated genes are occupied by E2F4 and p130, which in some instances also recruit the Sin3 co-repressor/histone deacetylase HDAC complex required for the repression of these genes [320] (Figure 10.1).

Figure 10.1: Target gene regulation in the RB/E2F pathway. Co-factors recruited to E2F target genes are shown at different stages of the cell cycle and in differentiating cells. During cell cycle progression, there is periodical and reversible recruitment of RB/E2F complexes. The H3K4 HMT, MLL, is recruited in S phase. The DREAM complex associates with E2F4-p130 target promoters in quiescent cells. KDM5A and the DREAM complex are recruited in differentiating cells.

Terminal silencing of cell cycle genes, which is observed during differentiation, was thought to be achieved through pRB association with the PcG repressor complexes, PRC1 and PRC2, and marking promoters with H3K27me3 [320]. Recent work has shown that KDM5A is part of a Sin3 complex in mouse skeletal muscle cells [78].
It has also been shown that KDM5A interacts with PRC2 in ES cells and that this interaction is required for repression of previously defined PRC2 target genes [93]. As cells progress into S phase, E2F4 and Sin3 dissociate from genes, leading to increased histone acetylation and gene expression. The MLL and Set-1 histone H3K4 methyltransferases (H3K4 HMTs) are also recruited to E2F-responsive promoters and induce histone methylation [323, 324]. Therefore, several protein complexes have been linked to the pRB-mediated repression of cell cycle progression, including enzymes that modify histone methyl marks.

However, there is another pRB function, related to the regulation of differentiation, that is now considered to prevail in tumor development. Analysis of tissue-specific Rb gene knockouts showed that most tissues exhibited significantly diminished or even absent markers of terminal differentiation, defects in apoptosis and unscheduled proliferation [325]. Cells deficient in RB function appeared to cycle normally but were unable to establish a permanent cell cycle arrest [326]. The direct role of pRB in differentiation was further supported by studies in a variety of cell culture-based models [327, 328, 329]. For example, depletion of pRB inhibits monocytic differentiation in U937 cells [330]. These works have established the role of pRB in the initiation and maintenance of permanent cell cycle arrest and in the concomitant induction of genes required for terminal differentiation.

Probably the most compelling evidence that the mechanisms operating in differentiation are distinct from those operating in cell cycle control is the identification of pRB derivatives, both synthetic and natural, that fail to interact with E2F1 but are still able to promote differentiation [97]. These mutants are able to activate transcription mediated by tissue-specific factors, but fail to repress E2F-dependent transcription. Consistent with these data, knock-in mice that carry a similar mutation demonstrate improved cellular differentiation in some tissues [331]. However, inactivation of pRB activates the histone demethylase KDM5A, which exerts inhibitory activity. Previous studies have discovered histone demethylase activities specific to trimethylated H3K4 (H3K4me3) in the JARID1/KDM5 proteins [91]. Several lines of evidence from our group’s previous work suggest that KDM5A is a decision-maker in the transition from proliferation to differentiation in mammalian cells. It has also been shown that KDM5A target genes are preferentially active in cells with an immature, proliferative state: stem cells and leukemia cells.

Recent microarray studies have demonstrated that “proliferative” gene expression modules are associated with a poor disease outcome. While the wealth of data surrounding the importance of the proliferative module is satisfying, the principal question of how it is established and maintained remains open. Strikingly, virtually all of the genes in the proliferative module are regulated by pRB/E2F. Thus, achieving an understanding of normal and cancer epigenetic mechanisms regulated by pRB/E2F will lead to targeted intervention and targeted therapies. In contrast to genetic lesions, epigenetic alterations are potentially reversible, thus holding a promise of therapeutic intervention. Epigenetic modifications are currently at the frontier of pharmaceutical development and drug discovery.

10.2 Epigenetically regulated gene module

In this study we sought to elucidate epigenetically regulated gene modules. In particular, we studied the recruitment of enzymes involved in histone methylation and the expression of genes located in the regions experiencing histone methylation. While I focused my study on KDM5A in relation to the RB/E2F pathway, I was also interested to know where other histone-modifying proteins, especially HMTs and HDMs, are bound in the genome in relation to KDM5A, and where they are preferentially expressed and repressed. This knowledge is important for understanding the functions of the epigenetic landscape.

10.2.1 Mode of HMT/HDM recruitment elucidates a broader picture

Firstly, we characterized the modes of recruitment of histone-modifying enzymes using our data as well as publicly available data [332]. Through the analysis of all available ChIP-on-chip and ChIP-seq data, we described the distribution of binding sites of different histone-modifying proteins. Studying the recruitment of histone-modifying enzymes is important because it allows us to identify epigenetically regulated genes. To our knowledge, this was the first report in which several HMTs and HDMs were analyzed and compared in the same study using the same algorithm and computational tools. This coordinated comparison allowed me to conclude that recruitment of HMTs and HDMs can be coordinated, opposed or alternating. The results of such studies will aid in testing the effects of targeted epigenetic therapies, which would be more or less disruptive depending on whether cooperating or opposing epigenetic activities are targeted.

10.2.2 A distinct co-expression and coregulation pattern of HMTs/HDMs exists in multiple cancer types

One form of regulation of gene expression is largely accomplished by enzymes that modify histone proteins. For each histone protein, multiple histone-modifying enzymes have been discovered in mammals. The activities of these enzymes are important, since mutations in individual genes encoding these enzymes result in developmental abnormalities or even lethality. I used bioinformatics and integrative analysis in large-scale studies to explore the relationship between the expression of genes displaying histone modifications and the expression of genes encoding the corresponding enzymes [333].
Surprisingly, I found that the expression of genes with histone modifications changes in a coordinated manner in the context of all genes present in the human genome. Moreover, I found that their expression was highly predictable from the level of expression of the corresponding histone-modifying enzyme and of several other co-regulated enzymes. I concluded that, due to the co-regulation of histone-modifying enzymes, studies of hundreds of genes subject to their activity can be reduced to studies of simple modules. This study enabled the identification of sets of co-regulated HDMs and HMTs, comprising an HDM/HMT gene expression signature. The correlations that we observed in normal tissues were different from the correlations identified in cancer cells. Considering the multiple alterations in HDM and HMT genes in cancer cells, it seems that this approach could yield new functional connections between histone-modifying enzymes, which might be helpful in designing combination cancer therapies. With this signature identified, we can ask how cancer patient metadata, such as patient survival and treatment response, correlate with expression. This could help to identify the best “druggable” common and cancer-type-dependent modules.

10.2.3 Utilization of the ChIP-seq technique in combination with other bioinformatic approaches yields insightful knowledge from genome-wide data

The complexity of genome location analysis (e.g. ChIP-on-chip, ChIP-seq, microarrays) is not easy to convey in paper publications. While we described these applications in our articles ([332, 333, 334], and manuscripts under review), I think the analysis was most successfully described through a published video of the procedure performed by several of our laboratory members. It was critically important to make this highly optimized technique, developed in collaboration with two world-class groups, available to any interested researcher.
We recorded a movie for the Journal of Visualized Experiments (JoVE) that shows the procedure step by step, in order to make it more accessible. With over 15,500 viewers at present, we disseminated this knowledge to both the research community and the general public.

10.2.4 Relative requirement of KDM5A and other enzymes for H3K4 methylation

Chromatin controls access to the genome and is involved in essentially all DNA-mediated processes. Methylation at histone H3K4 is required for transcriptional activation and many other associated events. High-resolution studies of histone methylation in mammalian genomes showed that H3K4 methylation is highly elevated at the TSS. However, it is not known which enzymes, synthesizing and degrading histone methylation, are bound to H3K4-methylated genomic regions. As a part of my study, I performed a systematic analysis of genomic regions enriched in H3K4me3 and KDM5A. My data suggest that KDM5A is at the top of the hierarchy of regulation of histone methylation. First, KDM5A highly overlaps with the majority of TSS regions carrying H3K4me3. These genes belong to specific gene ontology categories such as RNA processing and protein transport. However, KDM5A is excluded from many regions with low levels of H3K4 methylation. These regions are located more distantly from known genes and are associated with developmental functions. Notably, we are the first to report isoform-specific genomic localization of KDM5A and the specific biological functions of these isoforms. In another project (Beshiri et al., submitted), our lab found that decreasing the level of KDM5A, either by shRNA in a human cancer cell line or using homo- and heterozygous mice, increased the global level of H3K4me3. Cells were unable to maintain normal histone methylation when lacking KDM5A, despite the enhanced recruitment of the highly homologous protein PLU1/KDM5B to KDM5A targets [332].
This is in sharp contrast to the effect of deletion of any one of the MLL family members, which had only a minimal effect on the global levels of H3K4 methylation. MLL1 overexpression increases levels of histone H3K4me2/3 at specific genes, such as HOXA9, yet does not change the global level of H3K4me2/3 [335]. The majority of histone H3 in proliferating cells is part of chromatin, suggesting that the global changes can be explained by an increase in H3K4 methylation at the majority of KDM5A targets. Indeed, this result was observed in ChIP assays. Second, KDM5A directly regulates transcription of histone H3.3 and of histone H3-modifying enzymes by binding to their promoter regions. Analysis of Kdm5a knockout cells showed that methylation also changes in regions that are not bound by KDM5A but are likely to be regulated by a demethylase that is normally repressed by KDM5A. These results suggest that the regulatory outcome is determined not only by where KDM5A binds but also by its direct regulation of other histone-modifying enzymes, which act on genes involved in distinct biological processes.

10.3 Functions of the pRB/E2F tumor suppressor pathway and its link to epigenetic regulation

Epigenetic regulation underlies the robust changes in gene expression that occur during development. We show that KDM5A has a unique and non-redundant role in histone demethylation. Knockout of KDM5A resulted in derepression of its direct targets during ES cell differentiation, denoting a direct regulatory function. Previously, our lab showed that KDM5A is intimately linked to the effects of the RB/E2F tumor suppressor pathway on cell growth and differentiation [62]. E2F is well known for its profound function in cell cycle activity through E2F-dependent gene expression, which ultimately pushes cells from G1 to S phase. Surprisingly, however, it has been shown that cells lacking functional E2F are still capable of S-phase entry [336, 337].
This paradox raised the questions of when, where and at what point E2F is most important. It also calls for the study of parallel tumor suppressor pathways, such as Hippo. Because I aimed to learn more about the growth-suppression function of RB/E2F, I worked with Dr. Maxim Frolov (University of Illinois at Chicago, USA) on the global analysis of cells deficient in the RB/E2F pathway. In this work, I used Drosophila, which is an advantageous model system to address this question since the RB/E2F network is highly conserved yet simpler in flies. In the field of cancer biology, loss of pRB is viewed as one of the earliest events in tumor initiation. It is therefore important to identify alterations in other tumor suppressor pathways that cooperate with pRB inactivation in promoting cell proliferation.

The Drosophila genome encodes a single activator, dE2F1, a single repressor, dE2F2, and two pRB-related proteins, RBF1 and RBF2 [178]. When pRB function is compromised, unrestrained E2F activity is sufficient to inappropriately drive quiescent cells into S phase. Inactivation of de2f1 results in a strong block of cell proliferation [338]. However, this is mostly due to the unchecked activity of the dE2F2/RBF repressor in these cells, as the de2f1 mutant phenotype is suppressed by inactivation of de2f2 or rbf [339, 340]. Dr. Frolov’s lab uses genetic approaches to identify genes that are required for, or regulate, the growth-suppressive function of dE2F2/RBF. Inactivation of growth repressors results in strong overproliferation. One can predict that these mutants would rescue cell proliferation in de2f1 mutants. To test this, Dr. Frolov’s lab investigated the requirement for E2F in oncogenic proliferation, specifically following inactivation of the Hippo tumor-suppressor pathway [341].

10.3.1 The RB/E2F pathway cooperates with another critical tumor suppressor pathway in growth control

Here we showed that the RB and Hippo pathways synergize to control cell cycle exit upon differentiation. Using microarray profiling and a computational approach, I identified a distinct group of cell cycle-related genes that were upregulated only when both the RB and Hippo pathways were inactivated. These genes were subsequently tested in transcriptional reporter assays and by ChIP. The results of these assays showed that dE2F1 and Yki, a transcriptional co-activator and effector of the Hippo signaling pathway [342], synergistically and directly activated these target genes. The synergy between Yki and dE2F1 may explain why Yki is so dependent on dE2F1 during inappropriate proliferation [343]. We developed a model in which Yki requires the assistance of dE2F1 to upregulate the dE2F1-Yki common target genes to a level sufficient to bypass cell cycle exit. Conversely, unrestrained Yki also directly contributes to deregulated E2F-dependent transcription. This is an important point, since E2F-dependent transcription is deregulated in tumor cells [344]. We propose that, depending on the identity of other cooperating mutations in Rb-deficient tumor cells, E2F can potentially synergize with a distinct set of transcription factors to engage in transcriptional programs unique to tumor cells of different origins. These findings were reported in ‘Genes and Development’ [345]. Finally, we uncovered a novel function of the RB pathway, which, together with the Hippo pathway, maintains the differentiated state of a cell. Notably, this is independent of the role of the Hippo and RB pathways in cell proliferation [346].

Perhaps the most significant outcome of the above-mentioned findings is that our work in flies had a positive and stimulating impact on the mammalian RB field.
We have actively collaborated with the laboratories of Nick Dyson (Mass General Hospital, Boston) and Jim DeCaprio (Dana-Farber Cancer Institute, Boston) during their studies of the Hippo pathway in pRB-induced senescence in mammalian cells. The cooperation between our groups led to back-to-back publications [346, 347, 348] demonstrating that the interaction between the pRB and Hippo pathways is conserved. For a review of these publications, please see the ‘perspective’ written by Dick and Mymryk [349]. Considering these works, and given that in mammalian cells pRB is thought to restrict proliferation and the Hippo pathway is linked with organ size homeostasis, we could speculate that dual inactivation of these pathways might lead to highly aggressive tumor formation. Moreover, since in humans KDM5A is directly linked with pRB and recruited to E2F targets, is it conceivable that the KDM5A homolog Lid is involved in the crosstalk between RBF1 and Hippo/Yki?

10.3.2 RB/E2F pathway function is controlled by miRNA

While we were studying how another tumor suppressor pathway affects specific E2F target genes, we explored whether a subgroup of E2F targets can be regulated by changes in the level of E2F itself. We noticed that the de2f1 genomic locus contains a gene encoding the microRNA mir-11. More recently, a second microRNA gene in the de2f1 genomic locus was identified: mir-998 [350]. Both microRNAs are embedded in the last intron of the de2f1 gene. Because such intronic microRNAs are usually co-transcribed with their host gene, miR-11 and miR-998 are likely to be present in cells that express dE2F1. However, little was known about the relationship between mir-11, mir-998 and their host gene, de2f1. We began by investigating the role of mir-11 in the context of dE2F regulation. First, Dr. Frolov’s lab confirmed that the expression of the mature miR-11 parallels that of de2f1.
Next, they showed that, in transgenic animals, co-expression of miR-11 strongly suppressed dE2F1-induced apoptosis, while having no effect on dE2F1-driven cell proliferation. Early studies demonstrated that, in addition to driving cells into S phase, overexpression of dE2F1 potently induces apoptosis in mammalian cells [351, 352]. To explain this result, I analyzed gene expression microarrays with a bioinformatic functional clustering approach and showed that co-expression of miR-11 specifically blocked the dE2F1-dependent pro-apoptotic transcriptional signature, but had no effect on the dE2F1-dependent cell proliferation signature. Using these data, we further confirmed by 3′-UTR sensor assays and ChIP that both are direct targets of miR-11 and dE2F1. I followed up on this result and used a bioinformatics approach to identify a set of genes with roles associated with cell death that are regulated by both dE2F1 and miR-11. We recently published these results in ‘Genes and Development’ [353].

From these findings we developed a model in which the embedded microRNA, mir-11, establishes a partial negative feedback loop to specifically inhibit the expression of dE2F1-induced cell death targets upon irradiation. Thus, miR-11 permits the activation of the cell proliferation transcriptional signature by dE2F1, while limiting the dE2F1 pro-apoptotic transcriptional program in response to irradiation. This is a significant finding since it illustrates a novel relationship between an embedded microRNA and its host gene. To our knowledge, this is the first example of a microRNA directly regulating direct targets of its transcription factor host. Based on this phenomenon, we further aim to determine the relationship between mir-998, located within the same intron of the de2f1 gene, and the dE2F family of transcription factors.
These findings also lead me to speculate that embedded miRNAs might have additional functions on shared targets of the host, and that such microRNAs and/or their hosts might have different structural properties. Additionally, the host and miRNA might be under different epigenetic regulation, which would drive host and miRNA expression based on physiological or environmental stimuli. Wherever data are available, we are testing this hypothesis computationally, and we aim to validate the outcome experimentally.

10.4 Epigenetic factors in the regulation of pRB/E2F-dependent genes

The histone demethylase KDM5A has been shown to interact with pRB, and both pRB and E2F are well known for their roles in cell cycle control. Because of its connection to RB/E2F, at this stage I studied KDM5A functions in the regulation of cell cycle genes and developmental genes.

10.4.1 KDM5A regulates RB/E2F targets during differentiation

During differentiation, KDM5A is differentially bound to cell cycle genes, genes involved in mitochondrial function, and developmental genes such as the HOX genes [62, 83, 265]. Our previous microarray study showed that a correlation between recruitment of KDM5A and decreased activity was characteristic of the whole group of genes specifically bound by KDM5A in the differentiated condition [62]. First, I asked whether KDM5A is required for gene regulation specifically during differentiation. I was able to show that in differentiating Kdm5a−/− ES cells, but not in undifferentiated cells, KDM5A targets at the TSS have higher expression compared to wild-type ES cells. One possibility is that these changes occur because of an increase in histone methylation. This was tested in ChIP experiments, and it was found that KDM5A regulates methylation levels at the majority of its target genes, and that its knockdown results in an increase in H3K4 trimethylation and, to a lesser extent, H3K4 dimethylation, without significantly changing nucleosomal occupancy.
We concluded that KDM5A’s function in histone H3K4 demethylation, unlike that of other H3K4 demethylases, is not redundant, as lack of KDM5A results in an increase in the global level of histone methylation.

Given the functional role of KDM5A in the regulation of H3K4 methylation, we next asked whether it is essential for the transcriptional silencing of cell cycle genes during differentiation. We used a monocyte/macrophage differentiation system to recapitulate the dynamics of transcription factor binding and histone modifications at such promoters; coincident with the silencing of cell cycle genes, E2F4/p130 and KDM5A were recruited, and the promoter lost H3K4 methylation. To address the question of whether KDM5A is recruited to the genes occupied by RB/E2F complexes, I performed a global location analysis of KDM5A and E2F4. The comparison of genes occupied by both factors showed that KDM5A co-localizes with E2F4 at many target genes. This result prompted us to perform functional assays. These studies showed that KDM5A works in parallel and cooperates with E2F-mediated repression. KDM5A was recruited to the transcription start site (TSS) region independently of E2F4; however, it cooperated with E2F4 and the associated histone deacetylation to promote a state of deepened repression at cell cycle genes during differentiation. Significantly, our study showed that KDM5A and the DREAM complex not only bind independently to the promoters of cell cycle-regulated genes in the repressed state but also independently and actively repress these genes. This work is currently under review at ‘Molecular and Cellular Biology’.

10.4.2 During differentiation KDM5A binds the TSS of cell cycle genes and enhancers of developmental genes

Previous work from our laboratory showed that depletion of KDM5A significantly rescues the differentiation defects in cells with compromised RB function.
This finding presented KDM5A as an attractive anti-cancer drug target that can be exploited in RB-deficient tumors. At least 90% of human cancers have the RB pathway inactivated; thus, successfully targeting RB-negative cells would represent a significant advance in the treatment of practically any cancer. We set out to determine which changes occur in cells with impaired RB and KDM5A function compared to RB-deficient cells and normal cells. I explored KDM5A ChIP-seq analysis, as well as analyses of different histone modifications, including H3K4me3 ChIP-seq in normal and Kdm5a null cells, and RNA-seq analysis. As a result of these studies, I have found that KDM5A modulates two separate categories of target genes during differentiation, each with a distinct means of regulation by KDM5A. The first category is cell cycle genes and the other is developmental genes. The cell cycle genes are regulated by KDM5A from the promoter regions, while the developmental genes are bound at distal regulatory regions, many of which possess characteristics of enhancers.

Our team tested these findings by analysis of individual myotubes derived from cells of different genotypes. Loss of KDM5A in an Rb1 null background enhanced cell cycle re-entry of terminally differentiated myotubes. At these genes, KDM5A was bound at the TSS only during differentiation, resulting in histone H3K4 demethylation and transcriptional repression. Moreover, when Rb1 null cells were prompted to differentiate, the loss of KDM5A rescued expression of markers of the relevant cell lineage. In particular, KDM5A was directly bound to enhancers regulating myogenic genes, which were induced during differentiation after the release of KDM5A by myogenic transcription factors. However, the loss of KDM5A also induced markers of unrelated lineages, such as neuronal genes in myotubes.
These data imply that KDM5A binding to enhancers of developmental genes is a general requirement for their silencing in the absence of lineage-specific transcription factors. In this work (manuscript in preparation), we showed for the first time that a single epigenetic factor can regulate two functionally distinct groups of genes through different mechanisms.

When we looked carefully at the developmental genes that are KDM5A targets and are differentially expressed in Kdm5a null cells, we found that the genes with the largest changes in expression belonged to developmental modules of various lineages, including myogenesis and especially neuronal development. Based on this observation, we propose a broader, speculative model in which KDM5A is needed for repression of most, if not all, developmental genes in undifferentiated cells by binding their enhancers. When a particular differentiation program is activated, the master transcription factor (TF) binds to the enhancer and displaces KDM5A, and the associated histone H3K4 methylation mark changes. The differentiation genes of other lineages remain unchanged because the corresponding TF is unavailable. However, in the absence of KDM5A, all differentiation programs will be induced even in the absence of the corresponding TFs. So, in the comparison of Kdm5a−/− to wild-type (wt) cells, programs unrelated to myogenesis will be induced. Neurogenesis is probably a “default” program, which would explain why it is more strongly represented in the functional clustering of differentially expressed genes, whereas myogenesis itself is not much different.

10.5 Implications of our studies for general knowledge and human health

(1) Cancer evolution is a hallmark of the progression to metastatic disease. Cancer cells have remarkable plasticity. They quickly respond to changes in their environment by switching phenotypes, for example from more differentiated to less differentiated states.
Switching phenotypes may be due to epigenetic changes, which would mimic lineage commitment in stem cells during normal development. The most recent studies suggest that epigenetic mechanisms are involved. Studying KDM5A and its cross-talk with equivalent pathways will help to uncover new mechanisms driving tumorigenesis.

(2) Our finding that PLU1/KDM5B is recruited to KDM5A targets in the absence of KDM5A indicates that rationally designed drugs that can inhibit both KDM5A and KDM5B would be more effective than inhibitors of KDM5A alone.

(3) I have identified the expression signatures of HDMs/HMTs in a large number of cancer types and showed that they differ from those in normal cells. Also, multiple correlations occur between HDMs and HMTs. This phenomenon can be used for diagnostic purposes and for designing combination therapies.

(4) Differentiation is controlled by the RB/E2F pathway, which is also a tumor suppressor. We have shown that the “switching” comes into play in cancer cells because epigenetic factors (such as KDM5A) interact with tumor suppressors (such as pRB). We studied a cell culture-based model in which pRB is deleted. We have shown that deletion of KDM5A prevents expansion of these cells, through initiation of the differentiation program and apoptosis, when they are prompted to differentiate. As pRB is deregulated in cancer, my studies suggest that the use of KDM5A inhibitors could selectively kill cancer cells while leaving normal Rb-positive cells intact.

(5) Certain key cancer genes are likely to remain relatively difficult to drug [354]. This is a notable problem for tumor suppressor genes, for which the gene product is often completely absent. We previously showed that pRB directly interacts with KDM5A. Here I have reported that they are functionally linked in the regulation of cell cycle genes, whose deregulation is a hallmark of cancer. I have also found that, in cells that completely lack pRB, deletion of KDM5A changes cancer cell properties.
The implication of these results is that, in the absence of pRB, KDM5A inhibitors can be designed that target the pRB-dependent process in cancer. In my present study I have identified direct genomic targets of KDM5A. We have also shown isoform-specific targets and their specific functions, which would help in the identification of more specific targets.

(6) “Re-treatment response” after a “drug holiday” is poor in many patients across many cancer types. Recently, a direct link between overexpression of the chromatin modifier KDM5A in cancer subpopulations and drug tolerance has been established [173]. However, the mechanism of this resistance is yet to be explained. In our study we have identified KDM5A-dependent regulatory modules, expression signatures and correlation patterns among various HDMs and HMTs, which would help in understanding this process. Although they could not explain the underlying mechanism of drug tolerance induced by KDM5A overexpression, the authors of that study acknowledged that several other chromatin-modifying enzymes might contribute to drug tolerance in various tumor contexts. The findings of my HDM/HMT coregulation analysis provide supporting evidence for this speculation. Also, the biological effects that I observed upon knockout of Kdm5a indicate that, by restoring KDM5A to its normal suppressed level, we may better manage resistant populations. Moreover, since KDM5A has been shown to associate with HDAC, it is probable that combination therapy would be more effective against most drug-refractory populations.

(7) There are implications for therapeutic development in non-oncologic diseases such as sleep disorders. Recently, a role of KDM5A for which its demethylase activity is dispensable has been reported. It was found that depletion or over-expression of KDM5A can alter the circadian rhythm [80]. KDM5A might form a complex at the target gene and facilitate demethylation, or it might recruit another protein which performs that function.
In our research, we have elucidated genome-wide targets that might also help in understanding the role of KDM5A in other diseases and aid in better drug design.

Part V Conclusion

In this study, my main interest was to better understand epigenetic factors, especially histone-modifying enzymes, in relation to the cell cycle, differentiation and cancer. In particular, I have focused on the RB/E2F tumor suppressor pathway and its interactor, the histone demethylase RBP2/KDM5A. However, over time, and as my interest grew, my work extended to the Hippo tumor suppressor pathway as well. I used human, mouse and fly models to determine the genetic and epigenetic regulation of the cell cycle and differentiation and their relation to cancer. In this work I applied bioinformatic techniques extensively and also utilized experimental molecular biology procedures, in collaboration, to provide further evidence as proof of concept. The key findings of this work can be summarized as follows:

1. I have delineated a comparative distribution of several HMT and HDM targets in the genome and found that recruitment of HMTs and HDMs can be coordinated, opposed or alternating. I have also shown that expression of the KDM5A paralog PLU1 increases, and that PLU1 is recruited to KDM5A targets, in the absence of KDM5A.

2. I have identified epigenetically regulated gene modules that are prevalent in cancer and are distinct from their normal counterparts. The distinct co-expression and coregulation pattern that exists in multiple cancer types can be predicted from the level of expression of the corresponding enzyme.

3. I have depicted genomic regions where KDM5A and H3K4me3 are effective, and established a relationship between them, as well as their functional effects. I have also elucidated isoform-specific locations and functions of KDM5A.

4. Through this study, a novel cross-talk relationship between the RB and Hippo tumor suppressor pathways, which synergistically control cell cycle exit upon differentiation, has been established.

5. A novel gene regulatory relationship in which host E2F is regulated by its embedded miRNA has been elucidated.

6. I have confirmed that KDM5A’s demethylation role is non-redundant. KDM5A plays an important role in differentiation by silencing cell cycle genes in collaboration with E2F and HDAC.

7. I have illustrated two ways of target gene regulation by KDM5A. Cell cycle-related genes are regulated by binding at the TSS, and developmental gene modules through binding at enhancer regions.

8. Through my work I explained the demethylation mechanism of KDM5A, and I present it as a potent drug target candidate in cancer and probably in other diseases where KDM5A is deregulated.

Part VI Appendix

Appendix A: Other Contributed Projects

Chapter 11 INTOGEN: INTEGRATION AND DATA MINING OF MULTIDIMENSIONAL ONCOGENOMIC DATA

Gundem G, Perez-Llamas C, Jene-Sanz A, Kedzierska A, Islam A, Deu-Pons J, et al. IntOGen: integration and data mining of multidimensional oncogenomic data. Nat Methods. 2010 Feb;7(2):92-93.

Chapter 12 WHOLE GENOME ANALYSIS OF P38 SAPK-MEDIATED GENE EXPRESSION UPON STRESS

Ferreiro I, Joaquin M, Islam A, Gomez-Lopez G, Barragan M, Lombardia L, et al. Whole genome analysis of p38 SAPK-mediated gene expression upon stress. BMC Genomics. 2010 Mar 1;11:144.

Chapter 13 BIOLOGICAL REPROGRAMMING IN ACQUIRED RESISTANCE TO ENDOCRINE THERAPY OF BREAST CANCER

Aguilar H, Sole X, Bonifaci N, Serra-Musach J, Islam A, Lopez-Bigas N, et al. Biological reprogramming in acquired resistance to endocrine therapy of breast cancer. Oncogene. 2010 Nov 11;29(45):6071-6083.

Aguilar H, Sole X, Bonifaci N, Serra-Musach J, Islam A, Lopez-Bigas N, et al. Biological reprogramming in acquired resistance to endocrine therapy of breast cancer.
Supplementary Materials and methods. Oncogene. 2010 Nov 11;29(45):6071-6083.

Part VII Bibliography

[1] J. Fuellgrabe, E. Kavanagh, and B. Joseph. Histone onco-modifications. Oncogene, 30(31):3391–3403, Aug 2011. URL: http://dx.doi.org/10.1038/onc.2011.121, doi:10.1038/onc.2011.121. 5, 33
[2] Minjia Tan, Hao Luo, Sangkyu Lee, Fulai Jin, Jeong Soo Yang, Emilie Montellier, Thierry Buchou, Zhongyi Cheng, Sophie Rousseaux, Nisha Rajagopal, Zhike Lu, Zhen Ye, Qin Zhu, Joanna Wysocka, Yang Ye, Saadi Khochbin, Bing Ren, and Yingming Zhao. Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Cell, 146(6):1016–1028, Sep 2011. URL: http://dx.doi.org/10.1016/j.cell.2011.08.008, doi:10.1016/j.cell.2011.08.008. 5
[3] Ronen Marmorstein and Raymond C. Trievel. Histone modifying enzymes: structures, mechanisms, and specificities. Biochim Biophys Acta, 1789(1):58–68, Jan 2009. URL: http://dx.doi.org/10.1016/j.bbagrm.2008.07.009, doi:10.1016/j.bbagrm.2008.07.009. 5, 16
[4] Dan Levy and Or Gozani. Decoding chromatin goes high tech. Cell, 142(6):844–846, Sep 2010. URL: http://dx.doi.org/10.1016/j.cell.2010.08.032, doi:10.1016/j.cell.2010.08.032. 6
[5] Sean D. Taverna, Haitao Li, Alexander J. Ruthenburg, C David Allis, and Dinshaw J. Patel. How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers. Nat Struct Mol Biol, 14(11):1025–1040, Nov 2007. URL: http://dx.doi.org/10.1038/nsmb1338, doi:10.1038/nsmb1338. 7, 19
[6] Michiel Vermeulen, H Christian Eberl, Filomena Matarese, Hendrik Marks, Sergei Denissov, Falk Butter, Kenneth K. Lee, Jesper V. Olsen, Anthony A. Hyman, Henk G. Stunnenberg, and Matthias Mann. Quantitative interaction proteomics and genome-wide profiling of epigenetic histone marks and their readers. Cell, 142(6):967–980, Sep 2010. URL: http://dx.doi.org/10.1016/j.cell.2010.08.020, doi:10.1016/j.cell.2010.08.020.
7, 8
[7] Francois Roudier, Ikhlak Ahmed, Caroline Berard, Alexis Sarazin, Tristan Mary-Huard, Sandra Cortijo, Daniel Bouyer, Erwann Caillieux, Evelyne Duvernois-Berthet, Liza Al-Shikhley, Laurene Giraut, Barbara Despres, Stephanie Drevensek, Fredy Barneche, Sandra Derozier, Veronique Brunaud, Sebastien Aubourg, Arp Schnittger, Chris Bowler, Marie-Laure Martin-Magniette, Stephane Robin, Michel Caboche, and Vincent Colot. Integrative epigenomic mapping defines four main chromatin states in arabidopsis. EMBO J, 30(10):1928–1938, May 2011. URL: http://dx.doi.org/10.1038/emboj.2011.103, doi:10.1038/emboj.2011.103. 7
[8] Jason Ernst and Manolis Kellis. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol, 28(8):817–825, Aug 2010. URL: http://dx.doi.org/10.1038/nbt.1662, doi:10.1038/nbt.1662. 9
[9] B. D. Strahl and C. D. Allis. The language of covalent histone modifications. Nature, 403(6765):41–45, Jan 2000. URL: http://dx.doi.org/10.1038/47412, doi:10.1038/47412. 9
[10] Yujiang Shi, Fei Lan, Caitlin Matson, Peter Mulligan, Johnathan R. Whetstine, Philip A. Cole, Robert A. Casero, and Yang Shi. Histone demethylation mediated by the nuclear amine oxidase homolog lsd1. Cell, 119(7):941–953, Dec 2004. URL: http://dx.doi.org/10.1016/j.cell.2004.12.012, doi:10.1016/j.cell.2004.12.012. 9, 14
[11] Yang Shi and Johnathan R. Whetstine. Dynamic regulation of histone lysine methylation by demethylases. Mol Cell, 25(1):1–14, Jan 2007. URL: http://dx.doi.org/10.1016/j.molcel.2006.12.010, doi:10.1016/j.molcel.2006.12.010. 9, 16, 17
[12] Jinrong Min, Qin Feng, Zhizhong Li, Yi Zhang, and Rui-Ming Xu. Structure of the catalytic domain of human dot1l, a non-set domain nucleosomal histone methyltransferase. Cell, 112(5):711–723, Mar 2003. 10
[13] Ken Sawada, Zhe Yang, John R. Horton, Robert E. Collins, Xing Zhang, and Xiaodong Cheng.
Structure of the conserved core of the yeast dot1p, a nucleosomal histone h3 lysine 79 methyltransferase. J Biol Chem, 279(41):43296–43306, Oct 2004. URL: http://dx.doi.org/10.1074/jbc.M405902200, doi:10.1074/jbc.M405902200. 10
[14] S. Rea, F. Eisenhaber, D. O’Carroll, B. D. Strahl, Z. W. Sun, M. Schmid, S. Opravil, K. Mechtler, C. P. Ponting, C. D. Allis, and T. Jenuwein. Regulation of chromatin structure by site-specific histone h3 methyltransferases. Nature, 406(6796):593–599, Aug 2000. URL: http://dx.doi.org/10.1038/35020506, doi:10.1038/35020506. 10
[15] David C. Schultz, Kasirajan Ayyanathan, Dmitri Negorev, Gerd G. Maul, and Frank J Rauscher, 3rd. Setdb1: a novel kap-1-associated histone h3, lysine 9-specific methyltransferase that contributes to hp1-mediated silencing of euchromatic genes by krab zinc-finger proteins. Genes Dev, 16(8):919–932, Apr 2002. URL: http://dx.doi.org/10.1101/gad.973302, doi:10.1101/gad.973302. 10
[16] Irina Velichutina, Rita Shaknovich, Huimin Geng, Nathalie A. Johnson, Randy D. Gascoyne, Ari M. Melnick, and Olivier Elemento. Ezh2-mediated epigenetic silencing in germinal center b cells contributes to proliferation and lymphomagenesis. Blood, 116(24):5247–5255, Dec 2010. URL: http://dx.doi.org/10.1182/blood-2010-04-280149, doi:10.1182/blood-2010-04-280149. 10
[17] Craig J. Ceol, Yariv Houvras, Judit Jane-Valbuena, Steve Bilodeau, David A. Orlando, Valentine Battisti, Lauriane Fritsch, William M. Lin, Travis J. Hollmann, Fabrizio Ferre, Caitlin Bourque, Christopher J. Burke, Laura Turner, Audrey Uong, Laura A. Johnson, Rameen Beroukhim, Craig H. Mermel, Massimo Loda, Slimane Ait-Si-Ali, Levi A. Garraway, Richard A. Young, and Leonard I. Zon. The histone methyltransferase setdb1 is recurrently amplified in melanoma and accelerates its onset. Nature, 471(7339):513–517, Mar 2011. URL: http://dx.doi.org/10.1038/nature09806, doi:10.1038/nature09806. 10
[18] Bing Xiao, Jonathan R. Wilson, and Steven J. Gamblin.
Set domains and histone methylation. Curr Opin Struct Biol, 13(6):699–705, Dec 2003. 10
[19] Ronen Marmorstein. Structure of set domain proteins: a new twist on histone methylation. Trends Biochem Sci, 28(2):59–62, Feb 2003. 10
[20] Anup K. Upadhyay and Xiaodong Cheng. Dynamics of histone lysine methylation: structures of methyl writers and erasers. Prog Drug Res, 67:107–124, 2011. 10
[21] Raymond C. Trievel, E Megan Flynn, Robert L. Houtz, and James H. Hurley. Mechanism of multiple lysine methylation by the set domain enzyme rubisco lsmt. Nat Struct Biol, 10(7):545–552, Jul 2003. URL: http://dx.doi.org/10.1038/nsb946, doi:10.1038/nsb946. 10
[22] Xiaodong Cheng and Xing Zhang. Structural dynamics of protein lysine methylation and demethylation. Mutat Res, 618(1-2):102–115, May 2007. URL: http://dx.doi.org/10.1016/j.mrfmmm.2006.05.041, doi:10.1016/j.mrfmmm.2006.05.041. 10
[23] Mareike Albert and Kristian Helin. Histone methyltransferases in cancer. Semin Cell Dev Biol, 21(2):209–220, Apr 2010. URL: http://dx.doi.org/10.1016/j.semcdb.2009.10.007, doi:10.1016/j.semcdb.2009.10.007. 11, 12, 34, 35
[24] Stacey M. Southall, Poon-Sheng Wong, Zain Odho, S Mark Roe, and Jon R. Wilson. Structural basis for the requirement of additional factors for mll1 set domain activity and recognition of epigenetic marks. Mol Cell, 33(2):181–191, Jan 2009. URL: http://dx.doi.org/10.1016/j.molcel.2008.12.029, doi:10.1016/j.molcel.2008.12.029. 11, 13
[25] Vanja Avdic, Pamela Zhang, Sylvain Lanouette, Adam Groulx, Veronique Tremblay, Joseph Brunzelle, and Jean-François Couture. Structural and biochemical insights into mll1 core complex assembly. Structure, 19(1):101–108, Jan 2011. URL: http://dx.doi.org/10.1016/j.str.2010.09.022, doi:10.1016/j.str.2010.09.022. 11
[26] Chao Xu, Chuanbing Bian, Robert Lam, Aiping Dong, and Jinrong Min. The structural basis for selective binding of non-methylated cpg islands by the cfp1 cxxc domain. Nat Commun, 2:227, 2011.
URL: http://dx.doi.org/10.1038/ncomms1237, doi:10.1038/ncomms1237. 11
[27] Thomas A. Milne, Yali Dou, Mary Ellen Martin, Hugh W. Brock, Robert G. Roeder, and Jay L. Hess. Mll associates specifically with a subset of transcriptionally active target genes. Proc Natl Acad Sci U S A, 102(41):14765–14770, Oct 2005. URL: http://dx.doi.org/10.1073/pnas.0503630102, doi:10.1073/pnas.0503630102. 11
[28] Natalie K. Goto, Tsaffrir Zor, Maria Martinez-Yamout, H Jane Dyson, and Peter E. Wright. Cooperativity in transcription factor binding to the coactivator creb-binding protein (cbp). the mixed lineage leukemia protein (mll) activation domain binds to an allosteric site on the kix domain. J Biol Chem, 277(45):43168–43174, Nov 2002. URL: http://dx.doi.org/10.1074/jbc.M207660200, doi:10.1074/jbc.M207660200. 11
[29] P. Ernst, J. Wang, M. Huang, R. H. Goodman, and S. J. Korsmeyer. Mll and creb bind cooperatively to the nuclear coactivator creb-binding protein. Mol Cell Biol, 21(7):2249–2258, Apr 2001. URL: http://dx.doi.org/10.1128/MCB.21.7.2249-2258.2001, doi:10.1128/MCB.21.7.2249-2258.2001. 11
[30] Anamika Patel, Valarie E. Vought, Venkatasubramanian Dharmarajan, and Michael S. Cosgrove. A conserved arginine-containing motif crucial for the assembly and enzymatic activity of the mixed lineage leukemia protein-1 core complex. J Biol Chem, 283(47):32162–32175, Nov 2008. URL: http://dx.doi.org/10.1074/jbc.M806317200, doi:10.1074/jbc.M806317200. 11
[31] Anamika Patel, Venkatasubramanian Dharmarajan, Valarie E. Vought, and Michael S. Cosgrove. On the mechanism of multiple lysine methylation by the human mixed lineage leukemia protein-1 (mll1) core complex. J Biol Chem, 284(36):24242–24256, Sep 2009. URL: http://dx.doi.org/10.1074/jbc.M109.014498, doi:10.1074/jbc.M109.014498. 11
[32] Michael S. Cosgrove and Anamika Patel. Mixed lineage leukemia: a structure-function perspective of the mll1 protein. FEBS J, 277(8):1832–1842, Apr 2010.
URL: http://dx.doi.org/10.1111/j.1742-4658.2010.07609.x, doi:10.1111/j.1742-4658.2010.07609.x. 13
[33] Angela H. Ting, Kelly M. McGarvey, and Stephen B. Baylin. The cancer epigenome–components and functional correlates. Genes Dev, 20(23):3215–3231, Dec 2006. URL: http://dx.doi.org/10.1101/gad.1464906, doi:10.1101/gad.1464906. 14
[34] Shinya Mimasu, Toru Sengoku, Seketsu Fukuzawa, Takashi Umehara, and Shigeyuki Yokoyama. Crystal structure of histone demethylase lsd1 and tranylcypromine at 2.25 a. Biochem Biophys Res Commun, 366(1):15–22, Feb 2008. URL: http://dx.doi.org/10.1016/j.bbrc.2007.11.066, doi:10.1016/j.bbrc.2007.11.066. 14, 16
[35] Maxim Nekrasov, Tetyana Klymenko, Sven Fraterman, Bernadett Papp, Katarzyna Oktaba, Thomas Kocher, Adrian Cohen, Hendrik G. Stunnenberg, Matthias Wilm, and Jurg Muller. Pcl-prc2 is needed to generate high levels of h3-k27 trimethylation at polycomb target genes. EMBO J, 26(18):4078–4088, Sep 2007. URL: http://dx.doi.org/10.1038/sj.emboj.7601837, doi:10.1038/sj.emboj.7601837. 14
[36] Jamy C. Peng, Anton Valouev, Tomek Swigut, Junmei Zhang, Yingming Zhao, Arend Sidow, and Joanna Wysocka. Jarid2/jumonji coordinates control of prc2 enzymatic activity and target gene occupancy in pluripotent cells. Cell, 139(7):1290–1302, Dec 2009. URL: http://dx.doi.org/10.1016/j.cell.2009.12.002, doi:10.1016/j.cell.2009.12.002. 14
[37] Cyrus Martin, Ru Cao, and Yi Zhang. Substrate preferences of the ezh2 histone methyltransferase complex. J Biol Chem, 281(13):8365–8370, Mar 2006. URL: http://dx.doi.org/10.1074/jbc.M513425200, doi:10.1074/jbc.M513425200. 14
[38] Tetyana Klymenko, Bernadett Papp, Wolfgang Fischle, Thomas Kocher, Malgorzata Schelder, Cornelia Fritsch, Brigitte Wild, Matthias Wilm, and Jurg Muller. A polycomb group protein complex with sequence-specific dna-binding and selective methyl-lysine-binding activities. Genes Dev, 20(9):1110–1122, May 2006. URL: http://dx.doi.org/10.1101/gad.377406, doi:10.1101/gad.377406. 14
[39] Johanna C. Scheuermann, Andres Gaytan de Ayala Alonso, Katarzyna Oktaba, Nga Ly-Hartig, Robert K. McGinty, Sven Fraterman, Matthias Wilm, Tom W. Muir, and Jurg Muller. Histone h2a deubiquitinase activity of the polycomb repressive complex pr-dub. Nature, 465(7295):243–247, May 2010. URL: http://dx.doi.org/10.1038/nature08966, doi:10.1038/nature08966. 14
[40] Julien Vandamme, Pamela Volkel, Claire Rosnoblet, Perrine Le Faou, and Pierre-Olivier Angrand. Interaction proteomics analysis of polycomb proteins defines distinct prc1 complexes in mammalian cells. Mol Cell Proteomics, 10(4):M110.002642, Apr 2011. URL: http://dx.doi.org/10.1074/mcp.M110.002642, doi:10.1074/mcp.M110.002642. 14
[41] Raphael Margueron and Danny Reinberg. The polycomb complex prc2 and its mark in life. Nature, 469(7330):343–349, Jan 2011. URL: http://dx.doi.org/10.1038/nature09784, doi:10.1038/nature09784. 15
[42] Ivan Garcia-Bassets, Young-Soo Kwon, Francesca Telese, Gratien G. Prefontaine, Kasey R. Hutt, Christine S. Cheng, Bong-Gun Ju, Kenneth A. Ohgi, Jianxun Wang, Laure Escoubet-Lozach, David W. Rose, Christopher K. Glass, Xiang-Dong Fu, and Michael G. Rosenfeld. Histone methylation-dependent mechanisms impose ligand dependency for gene activation by nuclear receptors. Cell, 128(3):505–518, Feb 2007. URL: http://dx.doi.org/10.1016/j.cell.2006.12.038, doi:10.1016/j.cell.2006.12.038. 14
[43] Pete Stavropoulos, Gunter Blobel, and Andre Hoelz. Crystal structure and mechanism of human lysine-specific demethylase-1. Nat Struct Mol Biol, 13(7):626–632, Jul 2006. URL: http://dx.doi.org/10.1038/nsmb1113, doi:10.1038/nsmb1113. 16
[44] Yong Chen, Yuting Yang, Feng Wang, Ke Wan, Kenichi Yamane, Yi Zhang, and Ming Lei. Crystal structure of human histone lysine-specific demethylase 1 (lsd1). Proc Natl Acad Sci U S A, 103(38):13956–13961, Sep 2006. URL: http://dx.doi.org/10.1073/pnas.0606381103, doi:10.1073/pnas.0606381103.
16
[45] Maojun Yang, Christian B. Gocke, Xuelian Luo, Dominika Borek, Diana R. Tomchick, Mischa Machius, Zbyszek Otwinowski, and Hongtao Yu. Structural basis for corest-dependent demethylation of nucleosomes by the human lsd1 histone demethylase. Mol Cell, 23(3):377–387, Aug 2006. URL: http://dx.doi.org/10.1016/j.molcel.2006.07.012, doi:10.1016/j.molcel.2006.07.012. 16
[46] Eric Metzger, Melanie Wissmann, Na Yin, Judith M. Muller, Robert Schneider, Antoine H F M. Peters, Thomas Gunther, Reinhard Buettner, and Roland Schule. Lsd1 demethylates repressive histone marks to promote androgen-receptor-dependent transcription. Nature, 437(7057):436–439, Sep 2005. URL: http://dx.doi.org/10.1038/nature04020, doi:10.1038/nature04020. 16
[47] Sarah C. Trewick, Paul J. McLaughlin, and Robin C. Allshire. Methylation: lost in hydroxylation? EMBO Rep, 6(4):315–320, Apr 2005. URL: http://dx.doi.org/10.1038/sj.embor.7400379, doi:10.1038/sj.embor.7400379. 16
[48] Yu-ichi Tsukada, Jia Fang, Hediye Erdjument-Bromage, Maria E. Warren, Christoph H. Borchers, Paul Tempst, and Yi Zhang. Histone demethylation by a family of jmjc domain-containing proteins. Nature, 439(7078):811–816, Feb 2006. URL: http://dx.doi.org/10.1038/nature04433, doi:10.1038/nature04433. 16
[49] Kenichi Yamane, Charalambos Toumazou, Yu-ichi Tsukada, Hediye Erdjument-Bromage, Paul Tempst, Jiemin Wong, and Yi Zhang. Jhdm2a, a jmjc-containing h3k9 demethylase, facilitates transcription activation by androgen receptor. Cell, 125(3):483–495, May 2006. URL: http://dx.doi.org/10.1016/j.cell.2006.03.027, doi:10.1016/j.cell.2006.03.027. 16
[50] Johnathan R. Whetstine, Amanda Nottke, Fei Lan, Maite Huarte, Sarit Smolikov, Zhongzhou Chen, Eric Spooner, En Li, Gongyi Zhang, Monica Colaiacovo, and Yang Shi. Reversal of histone lysine trimethylation by the jmjd2 family of histone demethylases. Cell, 125(3):467–481, May 2006.
URL: http://dx.doi.org/10.1016/j.cell.2006.03.028, doi:10.1016/j.cell.2006.03.028. 16
[51] Robert J. Klose, Eric M. Kallin, and Yi Zhang. Jmjc-domain-containing proteins and histone demethylation. Nat Rev Genet, 7(9):715–727, Sep 2006. URL: http://dx.doi.org/10.1038/nrg1945, doi:10.1038/nrg1945. 16
[52] Paul A C. Cloos, Jesper Christensen, Karl Agger, and Kristian Helin. Erasing the methyl mark: histone demethylases at the center of cellular differentiation and disease. Genes Dev, 22(9):1115–1140, May 2008. URL: http://dx.doi.org/10.1101/gad.1652908, doi:10.1101/gad.1652908. 18
[53] John R. Horton, Anup K. Upadhyay, Hank H. Qi, Xing Zhang, Yang Shi, and Xiaodong Cheng. Enzymatic and structural insights for substrate specificity of a family of jumonji histone lysine demethylases. Nat Struct Mol Biol, 17(1):38–43, Jan 2010. URL: http://dx.doi.org/10.1038/nsmb.1753, doi:10.1038/nsmb.1753. 16
[54] Ying Yang, Lulu Hu, Ping Wang, Haifeng Hou, Yan Lin, Yi Liu, Ze Li, Rui Gong, Xiang Feng, Lu Zhou, Wen Zhang, Yuhui Dong, Huirong Yang, Hanqing Lin, Yiqin Wang, Charlie Degui Chen, and Yanhui Xu. Structural insights into a dual-specificity histone demethylase cekdm7a from caenorhabditis elegans. Cell Res, 20(8):886–898, Aug 2010. URL: http://dx.doi.org/10.1038/cr.2010.86, doi:10.1038/cr.2010.86. 16
[55] Haifeng Hou and Hongtao Yu. Structural insights into histone lysine demethylation. Curr Opin Struct Biol, 20(6):739–748, Dec 2010. URL: http://dx.doi.org/10.1016/j.sbi.2010.09.006, doi:10.1016/j.sbi.2010.09.006. 16
[56] Shengjiang Tu, Yu-Ching Teng, Chunhua Yuan, Ying-Ta Wu, Meng-Yu Chan, An-Ning Cheng, Po-Hsun Lin, Li-Jung Juan, and Ming-Daw Tsai. The arid domain of the h3k4 demethylase rbp2 binds to a dna ccgccc motif. Nat Struct Mol Biol, 15(4):419–421, Apr 2008. URL: http://dx.doi.org/10.1038/nsmb.1400, doi:10.1038/nsmb.1400. 16, 29
[57] Barna D. Fodor, Stefan Kubicek, Masato Yonezawa, Roderick J.
O’Sullivan, Roopsha Sengupta, Laura Perez-Burgos, Susanne Opravil, Karl Mechtler, Gunnar Schotta, and Thomas Jenuwein. Jmjd2b antagonizes h3k9 trimethylation at pericentric heterochromatin in mammalian cells. Genes Dev, 20(12):1557–1562, Jun 2006. URL: http://dx.doi.org/10.1101/gad.388206, doi:10.1101/gad.388206. 16
[58] Sotirios C. Kampranis and Philip N. Tsichlis. Histone demethylases and cancer. Adv Cancer Res, 102:103–169, 2009. URL: http://dx.doi.org/10.1016/S0065-230X(09)02004-1, doi:10.1016/S0065-230X(09)02004-1. 19
[59] R. D. Kortschak, P. W. Tucker, and R. Saint. Arid proteins come in from the desert. Trends Biochem Sci, 25(6):294–299, Jun 2000. 19
[60] Ping Chi, C David Allis, and Gang Greg Wang. Covalent histone modifications–miswritten, misinterpreted and mis-erased in human cancers. Nat Rev Cancer, 10(7):457–469, Jul 2010. URL: http://dx.doi.org/10.1038/nrc2876, doi:10.1038/nrc2876. 19, 34
[61] Michiel Vermeulen, Klaas W. Mulder, Sergei Denissov, W W M Pim Pijnappel, Frederik M A. van Schaik, Radhika A. Varier, Marijke P A. Baltissen, Henk G. Stunnenberg, Matthias Mann, and H Th Marc Timmers. Selective anchoring of tfiid to nucleosomes by trimethylation of histone h3 lysine 4. Cell, 131(1):58–69, Oct 2007. URL: http://dx.doi.org/10.1016/j.cell.2007.08.016, doi:10.1016/j.cell.2007.08.016. 19
[62] Nuria Lopez-Bigas, Tomasz A. Kisiel, Dannielle C. Dewaal, Katie B. Holmes, Tom L. Volkert, Sumeet Gupta, Jennifer Love, Heather L. Murray, Richard A. Young, and Elizaveta V. Benevolenskaya. Genome-wide analysis of the h3k4 histone demethylase rbp2 reveals a transcriptional program controlling differentiation. Mol Cell, 31(4):520–530, Aug 2008. URL: http://dx.doi.org/10.1016/j.molcel.2008.08.004, doi:10.1016/j.molcel.2008.08.004. 19, 26, 29, 294, 297
[63] Joel C. Eissenberg, Min Gyu Lee, Jessica Schneider, Anne Ilvarsonn, Ramin Shiekhattar, and Ali Shilatifard.
The trithorax-group gene in drosophila little imaginal discs encodes a trimethylated histone h3 lys4 demethylase. Nat Struct Mol Biol, 14(4):344–346, Apr 2007. URL: http://dx.doi.org/10.1038/nsmb1217, doi:10.1038/nsmb1217. 20, 27
[64] Christopher R. Vakoc, Sean A. Mandat, Benjamin A. Olenchock, and Gerd A. Blobel. Histone h3 lysine 9 methylation and hp1gamma are associated with transcription elongation through mammalian chromatin. Mol Cell, 19(3):381–391, Aug 2005. URL: http://dx.doi.org/10.1016/j.molcel.2005.06.011, doi:10.1016/j.molcel.2005.06.011. 21
[65] Artem Barski, Suresh Cuddapah, Kairong Cui, Tae-Young Roh, Dustin E. Schones, Zhibin Wang, Gang Wei, Iouri Chepelev, and Keji Zhao. High-resolution profiling of histone methylations in the human genome. Cell, 129(4):823–837, May 2007. URL: http://dx.doi.org/10.1016/j.cell.2007.05.009, doi:10.1016/j.cell.2007.05.009. 21
[66] Andrew J. Bannister and Tony Kouzarides. Reversing histone methylation. Nature, 436(7054):1103–1106, Aug 2005. URL: http://dx.doi.org/10.1038/nature04048, doi:10.1038/nature04048. 21
[67] Lluís Morey and Kristian Helin. Polycomb group protein-mediated repression of transcription. Trends Biochem Sci, 35(6):323–332, Jun 2010. URL: http://dx.doi.org/10.1016/j.tibs.2010.02.009, doi:10.1016/j.tibs.2010.02.009. 21
[68] H. Richly, L. Aloia, and L. Di Croce. Roles of the polycomb group proteins in stem cells and cancer. Cell Death Dis, 2:e204, 2011. URL: http://dx.doi.org/10.1038/cddis.2011.84, doi:10.1038/cddis.2011.84. 22
[69] D. Defeo-Jones, P. S. Huang, R. E. Jones, K. M. Haskell, G. A. Vuocolo, M. G. Hanobik, H. E. Huber, and A. Oliff. Cloning of cdnas for cellular proteins that bind to the retinoblastoma gene product. Nature, 352(6332):251–254, Jul 1991. URL: http://dx.doi.org/10.1038/352251a0, doi:10.1038/352251a0. 22
[70] Elizaveta V. Benevolenskaya, Heather L. Murray, Philip Branton, Richard A. Young, and William G Kaelin, Jr.
Binding of prb to the phd protein rbp2 promotes cellular differentiation. Mol Cell, 18(6):623–635, Jun 2005. URL: http://dx.doi.org/10.1016/j.molcel.2005.05.012, doi:10.1016/j.molcel.2005.05.012. 22, 28, 29, 36
[71] Y. W. Kim, G. A. Otterson, R. A. Kratzke, A. B. Coxon, and F. J. Kaye. Differential specificity for binding of retinoblastoma binding protein 2 to rb, p107, and tata-binding protein. Mol Cell Biol, 14(11):7256–7264, Nov 1994. 22, 23
[72] S. Mao, G. A. Neale, and R. M. Goorha. T-cell oncogene rhombotin-2 interacts with retinoblastoma-binding protein 2. Oncogene, 14(13):1531–1539, Apr 1997. URL: http://dx.doi.org/10.1038/sj.onc.1200988, doi:10.1038/sj.onc.1200988. 22
[73] S. W. Chan and W. Hong. Retinoblastoma-binding protein 2 (rbp2) potentiates nuclear hormone receptor-mediated transcription. J Biol Chem, 276(30):28402–28412, Jul 2001. URL: http://dx.doi.org/10.1074/jbc.M100313200, doi:10.1074/jbc.M100313200. 22, 23
[74] Julie Secombe and Robert N. Eisenman. The function and regulation of the jarid1 family of histone h3 lysine 4 demethylases: the myc connection. Cell Cycle, 6(11):1324–1328, Jun 2007. 22, 23, 24
[75] Julie Secombe, Ling Li, Leni Carlos, and Robert N. Eisenman. The trithorax group protein lid is a trimethyl histone h3k4 demethylase required for dmyc-induced cell growth. Genes Dev, 21(5):537–551, Mar 2007. URL: http://dx.doi.org/10.1101/gad.1523007, doi:10.1101/gad.1523007. 22, 23, 27
[76] Zheng Ge, Wenjuan Li, Na Wang, Cheng Liu, Qingjun Zhu, Magnus Bjorkholm, Astrid Gruber, and Dawei Xu. Chromatin remodeling: recruitment of histone demethylase rbp2 by mad1 for transcriptional repression of a myc target gene, telomerase reverse transcriptase. FASEB J, 24(2):579–586, Feb 2010. URL: http://dx.doi.org/10.1096/fj.09-140087, doi:10.1096/fj.09-140087. 22, 23
[77] Robert Liefke, Franz Oswald, Cristobal Alvarado, Dolores Ferres-Marco, Gerhard Mittler, Patrick Rodriguez, Maria Dominguez, and Tilman Borggrefe.
Histone demethylase kdm5a is an integral part of the core notch-rbp-j repressor complex. Genes Dev, 24(6):590–601, Mar 2010. URL: http://dx.doi.org/10.1101/gad.563210, doi:10.1101/gad.563210. 22, 24, 25
[78] Chris Van Oevelen, Jinhua Wang, Patrik Asp, Qin Yan, William G Kaelin, Jr, Yuval Kluger, and Brian David Dynlacht. A role for mammalian sin3 in permanent gene silencing. Mol Cell, 32(3):359–370, Nov 2008. URL: http://dx.doi.org/10.1016/j.molcel.2008.10.015, doi:10.1016/j.molcel.2008.10.015. 22, 25, 26, 290
[79] Tomohiro Hayakawa, Yasuko Ohtani, Noriyo Hayakawa, Kaori Shinmyozu, Motoki Saito, Fuyuki Ishikawa, and Jun-ichi Nakayama. Rbp2 is an mrg15 complex component and down-regulates intragenic histone h3 lysine 4 methylation. Genes Cells, 12(6):811–826, Jun 2007. URL: http://dx.doi.org/10.1111/j.1365-2443.2007.01089.x, doi:10.1111/j.1365-2443.2007.01089.x. 22, 24, 25, 27
[80] Luciano DiTacchio, Hiep D. Le, Christopher Vollmers, Megumi Hatori, Michael Witcher, Julie Secombe, and Satchidananda Panda. Histone lysine demethylase jarid1a activates clock-bmal1 and influences the circadian clock. Science, 333(6051):1881–1885, Sep 2011. URL: http://dx.doi.org/10.1126/science.1206022, doi:10.1126/science.1206022. 22, 25, 301
[81] Michael D. Cole and Marie Henriksson. 25 years of the c-myc oncogene. Semin Cancer Biol, 16(4):241, Aug 2006. URL: http://dx.doi.org/10.1016/j.semcancer.2006.08.003, doi:10.1016/j.semcancer.2006.08.003. 23
[82] Rebecca A. Silverstein and Karl Ekwall. Sin3: a flexible regulator of global gene expression and genome stability. Curr Genet, 47(1):1–17, Jan 2005. URL: http://dx.doi.org/10.1007/s00294-004-0541-5, doi:10.1007/s00294-004-0541-5. 25
[83] Jesper Christensen, Karl Agger, Paul A C. Cloos, Diego Pasini, Simon Rose, Lau Sennels, Juri Rappsilber, Klaus H. Hansen, Anna Elisabetta Salcini, and Kristian Helin. Rbp2 belongs to a family of demethylases, specific for tri-and dimethylated lysine 4 on histone 3.
Cell, 128(6):1063–1076, Mar 2007. URL: http://dx.doi.org/10.1016/j.cell.2007.02.003, doi:10.1016/j.cell.2007.02.003. 26, 27, 297
[84] Robert J. Klose, Qin Yan, Zuzana Tothova, Kenichi Yamane, Hediye Erdjument-Bromage, Paul Tempst, D Gary Gilliland, Yi Zhang, and William G Kaelin, Jr. The retinoblastoma binding protein rbp2 is an h3k4 demethylase. Cell, 128(5):889–900, Mar 2007. URL: http://dx.doi.org/10.1016/j.cell.2007.02.013, doi:10.1016/j.cell.2007.02.013. 27
[85] David J. Seward, Gabrielle Cubberley, Soojin Kim, Matt Schonewald, Lian Zhang, Brian Tripet, and David L. Bentley. Demethylation of trimethylated histone h3 lys4 in vivo by jarid1 jmjc proteins. Nat Struct Mol Biol, 14(3):240–242, Mar 2007. URL: http://dx.doi.org/10.1038/nsmb1200, doi:10.1038/nsmb1200. 27, 28
[86] Gaoyang Liang, Robert J. Klose, Kathryn E. Gardner, and Yi Zhang. Yeast jhd2p is a histone h3 lys4 trimethyl demethylase. Nat Struct Mol Biol, 14(3):243–245, Mar 2007. URL: http://dx.doi.org/10.1038/nsmb1204, doi:10.1038/nsmb1204. 27
[87] Shengjiang Tu, Esther M M. Bulloch, Lanhao Yang, Chen Ren, Wei-Chieh Huang, Pang-Hung Hsu, Chein-Hung Chen, Chung-Lin Liao, Hui-Ming Yu, Wan-Sheng Lo, Michael A. Freitas, and Ming-Daw Tsai. Identification of histone demethylases in saccharomyces cerevisiae. J Biol Chem, 282(19):14262–14271, May 2007. URL: http://dx.doi.org/10.1074/jbc.M609900200, doi:10.1074/jbc.M609900200. 27
[88] Shigeki Iwase, Fei Lan, Peter Bayliss, Luis de la Torre-Ubieta, Maite Huarte, Hank H. Qi, Johnathan R. Whetstine, Azad Bonni, Thomas M. Roberts, and Yang Shi. The x-linked mental retardation gene smcx/jarid1c defines a family of histone h3 lysine 4 demethylases. Cell, 128(6):1077–1088, Mar 2007. URL: http://dx.doi.org/10.1016/j.cell.2007.02.017, doi:10.1016/j.cell.2007.02.017. 27, 28
[89] Min Gyu Lee, Jessica Norman, Ali Shilatifard, and Ramin Shiekhattar. Physical and functional association of a trimethyl h3k4 demethylase and ring6a/mblr, a polycomb-like protein.
Cell, 128(5):877–887, Mar 2007. URL: http://dx.doi.org/10.1016/j.cell.2007.02.004, doi:10.1016/j.cell.2007.02.004. 27
[90] Nara Lee, Junyu Zhang, Robert J. Klose, Hediye Erdjument-Bromage, Paul Tempst, Richard S. Jones, and Yi Zhang. The trithorax-group protein lid is a histone h3 trimethyl-lys4 demethylase. Nat Struct Mol Biol, 14(4):341–343, Apr 2007. URL: http://dx.doi.org/10.1038/nsmb1216, doi:10.1038/nsmb1216. 27
[91] Elizaveta V. Benevolenskaya. Histone h3k4 demethylases are essential in development and differentiation. Biochem Cell Biol, 85(4):435–443, Aug 2007. URL: http://dx.doi.org/10.1139/O07-057, doi:10.1139/O07-057. 27, 29, 291
[92] J. J. Gildea, R. Lopez, and A. Shearn. A screen for new trithorax group genes identified little imaginal discs, the drosophila melanogaster homologue of human retinoblastoma binding protein 2. Genetics, 156(2):645–663, Oct 2000. 27
[93] Diego Pasini, Klaus H. Hansen, Jesper Christensen, Karl Agger, Paul A C. Cloos, and Kristian Helin. Coordinated regulation of transcriptional repression by the rbp2 h3k4 demethylase and polycomb-repressive complex 2. Genes Dev, 22(10):1345–1355, May 2008. URL: http://dx.doi.org/10.1101/gad.470008, doi:10.1101/gad.470008. 27, 290
[94] Angela Barrett, Samantha Santangelo, Keith Tan, Steve Catchpole, Kevin Roberts, Bradley Spencer-Dene, Debbie Hall, Angelo Scibetta, Joy Burchell, Eric Verdin, Paul Freemont, and Joyce Taylor-Papadimitriou. Breast cancer associated transcriptional repressor plu-1/jarid1b interacts directly with histone deacetylases. Int J Cancer, 121(2):265–275, Jul 2007. URL: http://dx.doi.org/10.1002/ijc.22673, doi:10.1002/ijc.22673. 27
[95] C. Quadbeck-Seeger, G. Wanner, S. Huber, R. Kahmann, and J. Kamper. A protein with similarity to the human retinoblastoma binding protein 2 acts specifically as a repressor for genes regulated by the b mating type locus in ustilago maydis. Mol Microbiol, 38(1):154–166, Oct 2000.
28
[96] Fei Lan, Amanda Clair Nottke, and Yang Shi. Mechanisms involved in the regulation of histone lysine demethylases. Curr Opin Cell Biol, 20(3):316–325, Jun 2008. URL: http://dx.doi.org/10.1016/j.ceb.2008.03.004, doi:10.1016/j.ceb.2008.03.004. 28
[97] W. R. Sellers, B. G. Novitch, S. Miyake, A. Heith, G. A. Otterson, F. J. Kaye, A. B. Lassar, and WG Kaelin, Jr. Stable binding to e2f is not required for the retinoblastoma protein to activate transcription, promote differentiation, and suppress tumor cell growth. Genes Dev, 12(1):95–106, Jan 1998. 28, 29, 291
[98] D. M. Thomas, S. A. Carty, D. M. Piscopo, J. S. Lee, W. F. Wang, W. C. Forrester, and P. W. Hinds. The retinoblastoma protein acts as a transcriptional coactivator required for osteogenic differentiation. Mol Cell, 8(2):303–316, Aug 2001. 29
[99] S. X. Skapek, Y-R. Pan, and E Y-H P. Lee. Regulation of cell lineage specification by the retinoblastoma tumor suppressor. Oncogene, 25(38):5268–5276, Aug 2006. URL: http://dx.doi.org/10.1038/sj.onc.1209710, doi:10.1038/sj.onc.1209710. 29
[100] Gabriel M. Gutierrez, Elizabeth Kong, and Philip W. Hinds. Master or slave: the complex relationship of rbp2 and prb. Cancer Cell, 7(6):501–502, Jun 2005. URL: http://dx.doi.org/10.1016/j.ccr.2005.05.021, doi:10.1016/j.ccr.2005.05.021. 30
[101] Jose Ignacio Martin-Subero and Manel Esteller. Profiling epigenetic alterations in disease. Adv Exp Med Biol, 711:162–177, 2011. 29
[102] Manel Esteller. Cancer epigenomics: Dna methylomes and histone-modification maps. Nat Rev Genet, 8(4):286–298, Apr 2007. URL: http://dx.doi.org/10.1038/nrg2005, doi:10.1038/nrg2005. 29
[103] M Matabuena de Yzaguirre, J Santos Hernandez, P Fernandez Navarro, P Lopez Nieva, M. Herranz, M. F. Fraga, M. Esteller, A. Juarranz, and J. Fernandez-Piqueras. Epigenetic silencing of e- and n-cadherins in the stroma of mouse thymic lymphomas. Carcinogenesis, 27(5):1081–1089, May 2006.
URL: http://dx.doi.org/10.1093/carcin/ bgi331, doi:10.1093/carcin/bgi331. 30 [104] Mario F. Fraga and Manel Esteller. Towards the human cancer epigenome: a first draft of histone modifications. Cell Cycle, 4(10):1377–1381, Oct 2005. 30 [105] Mario F. Fraga, Esteban Ballestar, Ana Villar-Garea, Manuel Boix-Chornet, Jesus Espada, Gunnar Schotta, Tiziana Bonaldi, Claire Haydon, Santiago Ropero, Kevin Petrie, N Gopalakrishna Iyer, Alberto Perez-Rosado, Enrique Calvo, Juan A. Lopez, Amparo Cano, Maria J. Calasanz, Dolors Colomer, Miguel Angel Piris, Natalie Ahn, Axel Imhof, Carlos Caldas, Thomas Jenuwein, and Manel Esteller. Loss of acetylation at lys16 and trimethylation at lys20 of histone h4 is a common hallmark of human cancer. Nat Genet, 37(4):391–400, Apr 2005. URL: http: //dx.doi.org/10.1038/ng1531, doi:10.1038/ng1531. 30, 31, 37 367 VII. B IBLIOGRAPHY [106] Michael Weber, Jonathan J. Davies, David Wittig, Edward J. Oakeley, Michael Haase, Wan L. Lam, and Dirk Schubeler. Chromosome-wide and promoter-specific analyses identify sites of differential dna methylation in normal and transformed human cells. Nat Genet, 37(8):853–862, Aug 2005. URL: http://dx.doi.org/10.1038/ng1598, doi:10. 1038/ng1598. 30 [107] P. A. Jones and P. W. Laird. Cancer epigenetics comes of age. Nat Genet, 21(2):163–167, Feb 1999. URL: http://dx.doi.org/10.1038/ 5947, doi:10.1038/5947. 30 [108] James G. Herman and Stephen B. Baylin. Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med, 349(21):2042–2054, Nov 2003. URL: http://dx.doi.org/10. 1056/NEJMra023075, doi:10.1056/NEJMra023075. 31 [109] Kurtis E. Bachman, Ben Ho Park, Ina Rhee, Harith Rajagopalan, James G. Herman, Stephen B. Baylin, Kenneth W. Kinzler, and Bert Vogelstein. Histone modifications and silencing prior to dna methylation of a tumor suppressor gene. Cancer Cell, 3(1):89–95, Jan 2003. 31 [110] Michael Weber, Ines Hellmann, Michael B. 
Stadler, Liliana Ramos, Svante Paabo, Michael Rebhan, and Dirk Schuebeler. Distribution, silencing potential and evolutionary impact of promoter dna methylation in the human genome. Nat Genet, 39(4):457–466, Apr 2007. URL: http://dx.doi.org/10.1038/ng1990, doi:10.1038/ ng1990. 31 [111] J. A. Yoder, C. P. Walsh, and T. H. Bestor. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet, 13(8):335–340, Aug 1997. 31 [112] Manuel Rodriguez-Paredes and Manel Esteller. Cancer epigenetics reaches mainstream oncology. Nat Med, 17(3):330–339, Mar 2011. URL: http://dx.doi.org/10.1038/nm.2305, doi:10.1038/nm. 2305. 31, 32, 33, 39, 40 [113] Peter A. Jones and Stephen B. Baylin. The fundamental role of epigenetic events in cancer. Nat Rev Genet, 3(6):415–428, Jun 2002. URL: http: //dx.doi.org/10.1038/nrg816, doi:10.1038/nrg816. 31 [114] Manel Esteller. Cpg island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene, 21(35):5427–5440, Aug 2002. URL: http://dx.doi.org/10.1038/sj.onc.1205600, doi:10.1038/sj.onc.1205600. 31 368 [115] Manel Esteller. Epigenetic gene silencing in cancer: the dna hypermethylome. Hum Mol Genet, 16 Spec No 1:R50–R59, Apr 2007. URL: http://dx.doi.org/10.1093/hmg/ddm018, doi:10. 1093/hmg/ddm018. 31 [116] Manel Esteller, Mario F. Fraga, Maria F. Paz, Elias Campo, Dolors Colomer, Francisco J. Novo, Maria J. Calasanz, Oliver Galm, Mingzhou Guo, Javier Benitez, and James G. Herman. Cancer epigenetics and methylation. Science, 297(5588):1807–8; discussion 1807–8, Sep 2002. 31 [117] P. M. Watt, R. Kumar, and U. R. Kees. Promoter demethylation accompanies reactivation of the hox11 proto-oncogene in leukemia. Genes Chromosomes Cancer, 29(4):371–377, Dec 2000. 31 [118] Manel Esteller. Epigenetics in cancer. N Engl J Med, 358(11):1148–1159, Mar 2008. URL: http://dx.doi.org/10. 1056/NEJMra072067, doi:10.1056/NEJMra072067. 31 [119] Peter A. Jones and Stephen B. Baylin. The epigenomics of cancer. 
Cell, 128(4):683–692, Feb 2007. URL: http://dx.doi.org/10. 1016/j.cell.2007.01.029, doi:10.1016/j.cell.2007. 01.029. 31 [120] Tony Kouzarides. Chromatin modifications and their function. Cell, 128(4):693–705, Feb 2007. URL: http://dx.doi.org/10. 1016/j.cell.2007.02.005, doi:10.1016/j.cell.2007. 02.005. 31 [121] David B. Seligson, Steve Horvath, Tao Shi, Hong Yu, Sheila Tze, Michael Grunstein, and Siavash K. Kurdistani. Global histone modification patterns predict risk of prostate cancer recurrence. Nature, 435(7046):1262–1266, Jun 2005. URL: http://dx.doi.org/10. 1038/nature03672, doi:10.1038/nature03672. 31 [122] Adrian P. Bracken, Diego Pasini, Maria Capra, Elena Prosperini, Elena Colli, and Kristian Helin. Ezh2 is downstream of the prb-e2f pathway, essential for proliferation and amplified in cancer. EMBO J, 22(20):5323–5335, Oct 2003. URL: http://dx.doi.org/10. 1093/emboj/cdg542, doi:10.1093/emboj/cdg542. 34 [123] Adrian P. Bracken and Kristian Helin. Polycomb group proteins: navigators of lineage pathways led astray in cancer. Nat Rev Cancer, 9(11):773–784, Nov 2009. URL: http://dx.doi.org/10.1038/ nrc2736, doi:10.1038/nrc2736. 34 369 VII. B IBLIOGRAPHY [124] Chunhua Lu, Hee Dong Han, Lingegowda S. Mangala, Rouba Ali-Fehmi, Christopher S. Newton, Laurent Ozbun, Guillermo N. Armaiz-Pena, Wei Hu, Rebecca L. Stone, Adnan Munkarah, Murali K. Ravoori, Mian M K. Shahzad, Jeong-Won Lee, Edna Mora, Robert R. Langley, Amy R. Carroll, Koji Matsuo, Whitney A. Spannuth, Rosemarie Schmandt, Nicholas B. Jennings, Blake W. Goodman, Robert B. Jaffe, Alpa M. Nick, Hye Sun Kim, Eylem Ozturk Guven, Ya-Huey Chen, Long-Yuan Li, Ming-Chuan Hsu, Robert L. Coleman, George A. Calin, Emir B. Denkbas, Jae Yun Lim, Ju-Seog Lee, Vikas Kundra, Michael J. Birrer, Mien-Chie Hung, Gabriel Lopez-Berestein, and Anil K. Sood. Regulation of tumor angiogenesis by ezh2. Cancer Cell, 18(2):185–197, Aug 2010. URL: http://dx.doi.org/10.1016/j.ccr.2010.06. 016, doi:10.1016/j.ccr.2010.06.016. 
34
[125] Celina G. Kleer, Qi Cao, Sooryanarayana Varambally, Ronglai Shen, Ichiro Ota, Scott A. Tomlins, Debashis Ghosh, Richard G A B. Sewalt, Arie P. Otte, Daniel F. Hayes, Michael S. Sabel, Donna Livant, Stephen J. Weiss, Mark A. Rubin, and Arul M. Chinnaiyan. Ezh2 is a marker of aggressive breast cancer and promotes neoplastic transformation of breast epithelial cells. Proc Natl Acad Sci U S A, 100(20):11606–11611, Sep 2003. URL: http://dx.doi.org/10.1073/pnas.1933744100, doi:10.1073/pnas.1933744100. 34
[126] Karin Collett, Geir E. Eide, Jarle Arnes, Ingunn M. Stefansson, Johan Eide, Audun Braaten, Turid Aas, Ariel P. Otte, and Lars A. Akslen. Expression of enhancer of zeste homologue 2 is significantly associated with increased tumor cell proliferation and is a marker of aggressive breast cancer. Clin Cancer Res, 12(4):1168–1174, Feb 2006. URL: http://dx.doi.org/10.1158/1078-0432.CCR-05-1533, doi:10.1158/1078-0432.CCR-05-1533. 34
[127] Steffen Weikert, Frank Christoph, Jens Köllermann, Markus Müller, Mark Schrader, Kurt Miller, and Hans Krause. Expression levels of the ezh2 polycomb transcriptional repressor correlate with aggressiveness and invasive potential of bladder carcinomas. Int J Mol Med, 16(2):349–353, Aug 2005. 34
[128] Geert J L H. van Leenders, Danny Dukers, Daphne Hessels, Susan W M. van den Kieboom, Christina A. Hulsbergen, J Alfred Witjes, Arie P. Otte, Chris J. Meijer, and Frank M. Raaphorst. Polycomb-group oncogenes ezh2, bmi1, and ring1 are overexpressed in prostate cancer with adverse pathologic and clinical features. Eur Urol, 52(2):455–463, Aug 2007. URL: http://dx.doi.org/10.1016/j.eururo.2006.11.020, doi:10.1016/j.eururo.2006.11.020. 34
[129] Sooryanarayana Varambally, Saravana M. Dhanasekaran, Ming Zhou, Terrence R. Barrette, Chandan Kumar-Sinha, Martin G. Sanda, Debashis Ghosh, Kenneth J. Pienta, Richard G A B. Sewalt, Arie P. Otte, Mark A. Rubin, and Arul M. Chinnaiyan. The polycomb group protein ezh2 is involved in progression of prostate cancer. Nature, 419(6907):624–629, Oct 2002. URL: http://dx.doi.org/10.1038/nature01075, doi:10.1038/nature01075. 34
[130] Ryan D. Morin, Nathalie A. Johnson, Tesa M. Severson, Andrew J. Mungall, Jianghong An, Rodrigo Goya, Jessica E. Paul, Merrill Boyle, Bruce W. Woolcock, Florian Kuchenbauer, Damian Yap, R Keith Humphries, Obi L. Griffith, Sohrab Shah, Henry Zhu, Michelle Kimbara, Pavel Shashkin, Jean F. Charlot, Marianna Tcherpakov, Richard Corbett, Angela Tam, Richard Varhol, Duane Smailus, Michelle Moksa, Yongjun Zhao, Allen Delaney, Hong Qian, Inanc Birol, Jacqueline Schein, Richard Moore, Robert Holt, Doug E. Horsman, Joseph M. Connors, Steven Jones, Samuel Aparicio, Martin Hirst, Randy D. Gascoyne, and Marco A. Marra. Somatic mutations altering ezh2 (tyr641) in follicular and diffuse large b-cell lymphomas of germinal-center origin. Nat Genet, 42(2):181–185, Feb 2010. URL: http://dx.doi.org/10.1038/ng.518, doi:10.1038/ng.518. 34
[131] Ryuji Hamamoto, Fabio Pittella Silva, Masataka Tsuge, Toshihiko Nishidate, Toyomasa Katagiri, Yusuke Nakamura, and Yoichi Furukawa. Enhanced smyd3 expression is essential for the growth of breast cancer cells. Cancer Sci, 97(2):113–118, Feb 2006. URL: http://dx.doi.org/10.1111/j.1349-7006.2006.00146.x, doi:10.1111/j.1349-7006.2006.00146.x. 34
[132] Ryuji Hamamoto, Yoichi Furukawa, Masashi Morita, Yuko Iimura, Fabio Pittella Silva, Meihua Li, Ryuichiro Yagyu, and Yusuke Nakamura. Smyd3 encodes a histone methyltransferase involved in the proliferation of cancer cells. Nat Cell Biol, 6(8):731–740, Aug 2004. URL: http://dx.doi.org/10.1038/ncb1151, doi:10.1038/ncb1151. 34
[133] Andrei V. Krivtsov and Scott A. Armstrong. Mll translocations, histone modifications and leukaemia stem-cell development. Nat Rev Cancer, 7(11):823–833, Nov 2007. URL: http://dx.doi.org/10.1038/nrc2253, doi:10.1038/nrc2253. 34
[134] Zejuan Li, Roger T. Luo, Shuangli Mi, Miao Sun, Ping Chen, Jingyue Bao, Mary Beth Neilly, Nimanthi Jayathilaka, Deborah S. Johnson, Lili Wang, Catherine Lavau, Yanming Zhang, Charles Tseng, Xiuqing Zhang, Jian Wang, Jun Yu, Huanming Yang, San Ming Wang, Janet D. Rowley, Jianjun Chen, and Michael J. Thirman. Consistent deregulation of gene expression between human and murine mll rearrangement leukemias. Cancer Res, 69(3):1109–1116, Feb 2009. URL: http://dx.doi.org/10.1158/0008-5472.CAN-08-3381, doi:10.1158/0008-5472.CAN-08-3381. 34
[135] Jing Huang, Roopsha Sengupta, Alexsandra B. Espejo, Min Gyu Lee, Jean A. Dorsey, Mario Richter, Susanne Opravil, Ramin Shiekhattar, Mark T. Bedford, Thomas Jenuwein, and Shelley L. Berger. p53 is regulated by the lysine demethylase lsd1. Nature, 449(7158):105–108, Sep 2007. URL: http://dx.doi.org/10.1038/nature06092, doi:10.1038/nature06092. 36
[136] Philip Kahl, Lucia Gullotti, Lukas Carl Heukamp, Susanne Wolf, Nicolaus Friedrichs, Roland Vorreuther, Gerold Solleder, Patrick J. Bastian, Jorg Ellinger, Eric Metzger, Roland Schule, and Reinhard Buettner. Androgen receptor coactivators lysine-specific histone demethylase 1 and four and a half lim domain protein 2 predict risk of prostate cancer recurrence. Cancer Res, 66(23):11341–11347, Dec 2006. URL: http://dx.doi.org/10.1158/0008-5472.CAN-06-1570, doi:10.1158/0008-5472.CAN-06-1570. 36
[137] Johannes H. Schulte, Soyoung Lim, Alexander Schramm, Nicolaus Friedrichs, Jan Koster, Rogier Versteeg, Ingrid Ora, Kristian Pajtler, Ludger Klein-Hitpass, Steffi Kuhfittig-Kulle, Eric Metzger, Roland Schule, Angelika Eggert, Reinhard Buettner, and Jutta Kirfel. Lysine-specific demethylase 1 is strongly expressed in poorly differentiated neuroblastoma: implications for therapy. Cancer Res, 69(5):2065–2071, Mar 2009. URL: http://dx.doi.org/10.1158/0008-5472.CAN-08-1735, doi:10.1158/0008-5472.CAN-08-1735.
36
[138] Soyoung Lim, Andreas Janzer, Astrid Becker, Andreas Zimmer, Roland Schule, Reinhard Buettner, and Jutta Kirfel. Lysine-specific demethylase 1 (lsd1) is highly expressed in er-negative breast cancers and a biomarker predicting aggressive biology. Carcinogenesis, 31(3):512–520, Mar 2010. URL: http://dx.doi.org/10.1093/carcin/bgp324, doi:10.1093/carcin/bgp324. 36
[139] Laura J C M. van Zutven, Emine Onen, Sandra C J M. Velthuizen, Ellen van Drunen, Anne R M. von Bergh, Marry M. van den Heuvel-Eibrink, Angelo Veronese, Cristina Mecucci, Massimo Negrini, Georgine E. de Greef, and H Berna Beverloo. Identification of nup98 abnormalities in acute leukemia: Jarid1a (12p13) as a new partner gene. Genes Chromosomes Cancer, 45(5):437–446, May 2006. URL: http://dx.doi.org/10.1002/gcc.20308, doi:10.1002/gcc.20308. 36
[140] S. C. Raimondi. Current status of cytogenetic research in childhood acute lymphoblastic leukemia. Blood, 81(9):2237–2251, May 1993. 36
[141] R. Berger, A. Bernheim, M. Le Coniat, D. Vecchione, A. Pacot, M. T. Daniel, and G. Flandrin. Abnormalities of the short arm of chromosome 12 in acute nonlymphocytic leukemia and dysmyelopoietic syndrome. Cancer Genet Cytogenet, 19(3-4):281–289, Jan 1986. 36
[142] C-H. Pui, J. M. Chessells, B. Camitta, A. Baruchel, A. Biondi, J. M. Boyett, A. Carroll, O. B. Eden, W. E. Evans, H. Gadner, J. Harbott, D. O. Harms, C. J. Harrison, P. L. Harrison, N. Heerema, G. Janka-Schaub, W. Kamps, G. Masera, J. Pullen, S. C. Raimondi, S. Richards, H. Riehm, S. Sallan, H. Sather, J. Shuster, L. B. Silverman, M. G. Valsecchi, E. Vilmer, Y. Zhou, P. S. Gaynon, and M. Schrappe. Clinical heterogeneity in childhood acute lymphoblastic leukemia with 11q23 rearrangements. Leukemia, 17(4):700–706, Apr 2003. URL: http://dx.doi.org/10.1038/sj.leu.2402883, doi:10.1038/sj.leu.2402883. 36
[143] B. Streubel, C. Sauerland, G. Heil, M. Freund, H. Bartels, E. Lengfelder, H. Wandt, W. D. Ludwig, H. Nowotny, M. Baldus, B. Grothaus-Pinke, T. Buchner, and C. Fonatsch. Correlation of cytogenetic, molecular cytogenetic, and clinical findings in 59 patients with anll or mds and abnormalities of the short arm of chromosome 12. Br J Haematol, 100(3):521–533, Mar 1998. 36, 37
[144] Susana C. Raimondi, Yinmei Zhou, Susan Mathew, Sheila A. Shurtleff, John T. Sandlund, Gaston K. Rivera, Frederick G. Behm, and Ching-Hon Pui. Reassessment of the prognostic significance of hypodiploidy in pediatric patients with acute lymphoblastic leukemia. Cancer, 98(12):2715–2722, Dec 2003. URL: http://dx.doi.org/10.1002/cncr.11841, doi:10.1002/cncr.11841. 36
[145] M. Jarosova, K. Indrak, M. Holzerova, J. Hubacek, E. Faber, T. Papajik, L. Raida, T. Szotkowski, R. Knotkova, T. Hlusi, K. Jedlickova, Z. Pikalova, and I. Sulovska. [prognostic significance of cytogenetic changes in patients with acute myeloid leukemia (aml). (analysis of results in 105 patients treated at the hemato-oncology clinic of the university hospital in olomouc from 1997 to 2000]. Vnitr Lek, 47 Suppl 1:8–14, Sep 2001. 36
[146] M. L. Larramendy, T. Huhta, K. Heinonen, K. Vettenranta, E. Mahlamaki, P. Riikonen, U. M. Saarinen-Pihkala, and S. Knuutila. Dna copy number changes in childhood acute lymphoblastic leukemia. Haematologica, 83(10):890–895, Oct 1998. 36
[147] M. L. Larramendy, T. Huhta, K. Vettenranta, W. El-Rifai, J. Lundin, S. Pakkala, U. M. Saarinen-Pihkala, and S. Knuutila. Comparative genomic hybridization in childhood acute lymphoblastic leukemia. Leukemia, 12(10):1638–1644, Oct 1998. 36
[148] Soyoung Lim, Eric Metzger, Roland Schule, Jutta Kirfel, and Reinhard Buettner. Epigenetic regulation of cancer growth by histone demethylases. Int J Cancer, 127(9):1991–1998, Nov 2010. URL: http://dx.doi.org/10.1002/ijc.25538, doi:10.1002/ijc.25538. 37
[149] Angela Barrett, Bente Madsen, John Copier, Pei Juan Lu, Lucienne Cooper, Angelo G. Scibetta, Joy Burchell, and Joyce Taylor-Papadimitriou. Plu-1 nuclear protein, which is upregulated in breast cancer, shows restricted expression in normal human adult tissues: a new cancer/testis antigen? Int J Cancer, 101(6):581–588, Oct 2002. URL: http://dx.doi.org/10.1002/ijc.10644, doi:10.1002/ijc.10644. 37
[150] P. J. Lu, K. Sundquist, D. Baeckstrom, R. Poulsom, A. Hanby, S. Meier-Ewert, T. Jones, M. Mitchell, P. Pitha-Rowe, P. Freemont, and J. Taylor-Papadimitriou. A novel gene (plu-1) containing highly conserved putative dna/chromatin binding motifs is specifically up-regulated in breast cancer. J Biol Chem, 274(22):15633–15645, May 1999. 37
[151] Kenichi Yamane, Keisuke Tateishi, Robert J. Klose, Jia Fang, Laura A. Fabrizio, Hediye Erdjument-Bromage, Joyce Taylor-Papadimitriou, Paul Tempst, and Yi Zhang. Plu-1 is an h3k4 demethylase involved in transcriptional repression and breast cancer cell proliferation. Mol Cell, 25(6):801–812, Mar 2007. URL: http://dx.doi.org/10.1016/j.molcel.2007.03.001, doi:10.1016/j.molcel.2007.03.001. 37, 41
[152] Siegmund S. Wolf, Vladimir K. Patchev, and Maik Obendorf. A novel variant of the putative demethylase gene, s-jmjd1c, is a coactivator of the ar. Arch Biochem Biophys, 460(1):56–66, Apr 2007. URL: http://dx.doi.org/10.1016/j.abb.2007.01.017, doi:10.1016/j.abb.2007.01.017. 37
[153] Z. Hu, I. Gomes, S. K. Horrigan, J. Kravarusic, B. Mar, Z. Arbieva, B. Chyna, N. Fulton, S. Edassery, A. Raza, and C. A. Westbrook. A novel nuclear protein, 5qnca (loc51780) is a candidate for the myeloid leukemia tumor suppressor gene on chromosome 5 band q31. Oncogene, 20(47):6946–6954, Oct 2001. URL: http://dx.doi.org/10.1038/sj.onc.1204850, doi:10.1038/sj.onc.1204850. 37
[154] P.A. Cloos, J. Christensen, K. Agger, A. Maiolica, J. Rappsilber, T. Antal, K.H. Hansen, and K. Helin. The putative oncogene gasc1 demethylates tri- and dimethylated lysine 9 on histone h3. Nature, 442:307–311, 2006. 37
[155] Gijs van Haaften, Gillian L. Dalgliesh, Helen Davies, Lina Chen, Graham Bignell, Chris Greenman, Sarah Edkins, Claire Hardy, Sarah O’Meara, Jon Teague, Adam Butler, Jonathan Hinton, Calli Latimer, Jenny Andrews, Syd Barthorpe, Dave Beare, Gemma Buck, Peter J. Campbell, Jennifer Cole, Simon Forbes, Mingming Jia, David Jones, Chai Yin Kok, Catherine Leroy, Meng-Lay Lin, David J. McBride, Mark Maddison, Simon Maquire, Kirsten McLay, Andrew Menzies, Tatiana Mironenko, Lee Mulderrig, Laura Mudie, Erin Pleasance, Rebecca Shepherd, Raffaella Smith, Lucy Stebbings, Philip Stephens, Gurpreet Tang, Patrick S. Tarpey, Rachel Turner, Kelly Turrell, Jennifer Varian, Sofie West, Sara Widaa, Paul Wray, V Peter Collins, Koichi Ichimura, Simon Law, John Wong, Siu Tsan Yuen, Suet Yi Leung, Giovanni Tonon, Ronald A. DePinho, Yu-Tzu Tai, Kenneth C. Anderson, Richard J. Kahnoski, Aaron Massie, Sok Kean Khoo, Bin Tean Teh, Michael R. Stratton, and P Andrew Futreal. Somatic mutations of the histone h3k27 demethylase gene utx in human cancer. Nat Genet, 41(5):521–523, May 2009. URL: http://dx.doi.org/10.1038/ng.349, doi:10.1038/ng.349. 37
[156] Jordon K. Wang, Miao-Chih Tsai, Gino Poulin, Adam S. Adler, Shuzhen Chen, Helen Liu, Yang Shi, and Howard Y. Chang. The histone demethylase utx enables rb-dependent cell fate control. Genes Dev, 24(4):327–332, Feb 2010. URL: http://dx.doi.org/10.1101/gad.1882610, doi:10.1101/gad.1882610. 37
[157] N. D. Nigro, V. K. Vaitkevicius, and B Considine, Jr. Dynamic management of squamous cell cancer of the anal canal. Invest New Drugs, 7(1):83–89, Apr 1989. 37
[158] Lin He and Gregory J. Hannon. Micrornas: small rnas with a big role in gene regulation. Nat Rev Genet, 5(7):522–531, Jul 2004. URL: http://dx.doi.org/10.1038/nrg1379, doi:10.1038/nrg1379. 38
[159] Inhan Lee, Subramanian S. Ajay, Jong In Yook, Hyun Sil Kim, Su Hyung Hong, Nam Hee Kim, Saravana M. Dhanasekaran, Arul M. Chinnaiyan, and Brian D. Athey. New class of microrna targets containing simultaneous 5’-utr and 3’-utr interaction sites. Genome Res, 19(7):1175–1183, Jul 2009. URL: http://dx.doi.org/10.1101/gr.089367.108, doi:10.1101/gr.089367.108. 38
[160] Nicola Valeri, Ivan Vannini, Francesca Fanini, Federica Calore, Brett Adair, and Muller Fabbri. Epigenetics, mirnas, and human cancer: a new chapter in human gene regulation. Mamm Genome, 20(9-10):573–580, 2009. URL: http://dx.doi.org/10.1007/s00335-009-9206-5, doi:10.1007/s00335-009-9206-5. 38
[161] Francesco Fazi, Serena Racanicchi, Giuseppe Zardo, Linda M. Starnes, Marco Mancini, Lorena Travaglini, Daniela Diverio, Emanuele Ammatuna, Giuseppe Cimino, Francesco Lo-Coco, Francesco Grignani, and Clara Nervi. Epigenetic silencing of the myelopoiesis regulator microrna-223 by the aml1/eto oncoprotein. Cancer Cell, 12(5):457–466, Nov 2007. URL: http://dx.doi.org/10.1016/j.ccr.2007.09.020, doi:10.1016/j.ccr.2007.09.020. 38
[162] George Adrian Calin, Calin Dan Dumitru, Masayoshi Shimizu, Roberta Bichi, Simona Zupo, Evan Noch, Hansjuerg Aldler, Sashi Rattan, Michael Keating, Kanti Rai, Laura Rassenti, Thomas Kipps, Massimo Negrini, Florencia Bullrich, and Carlo M. Croce. Frequent deletions and down-regulation of micro-rna genes mir15 and mir16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A, 99(24):15524–15529, Nov 2002. URL: http://dx.doi.org/10.1073/pnas.242606799, doi:10.1073/pnas.242606799. 38
[163] Maria Hatziapostolou and Dimitrios Iliopoulos. Epigenetic aberrations during oncogenesis. Cell Mol Life Sci, 68(10):1681–1702, May 2011. URL: http://dx.doi.org/10.1007/s00018-010-0624-z, doi:10.1007/s00018-010-0624-z.
38, 39
[164] Jose Roman-Gomez, Xabier Agirre, Antonio Jimenez-Velasco, Victor Arqueros, Amaia Vilas-Zornoza, Paula Rodriguez-Otero, Inaki Martin-Subero, Leire Garate, Lucia Cordeu, Edurne San Jose-Eneriz, Vanesa Martin, Juan Antonio Castillejo, Eva Bandres, Maria Jose Calasanz, Reiner Siebert, Anabel Heiniger, Antonio Torres, and Felipe Prosper. Epigenetic regulation of micrornas in acute lymphoblastic leukemia. J Clin Oncol, 27(8):1316–1322, Mar 2009. URL: http://dx.doi.org/10.1200/JCO.2008.19.3441, doi:10.1200/JCO.2008.19.3441. 38
[165] Sandrine M. Geranton. Targeting epigenetic mechanisms for pain relief. Curr Opin Pharmacol, Nov 2011. URL: http://dx.doi.org/10.1016/j.coph.2011.10.012, doi:10.1016/j.coph.2011.10.012. 39
[166] Joseph Huang, Christoph Plass, and Clarissa Gerhauser. Cancer chemoprevention by targeting the epigenome. Curr Drug Targets, 12(13):1925–1956, Dec 2011. 39
[167] Leigh Ellis, Peter W. Atadja, and Ricky W. Johnstone. Epigenetics in cancer: targeting chromatin modifications. Mol Cancer Ther, 8(6):1409–1420, Jun 2009. URL: http://dx.doi.org/10.1158/1535-7163.MCT-08-0860, doi:10.1158/1535-7163.MCT-08-0860. 39
[168] Jean-Pierre J. Issa. Dna methylation as a therapeutic target in cancer. Clin Cancer Res, 13(6):1634–1637, Mar 2007. URL: http://dx.doi.org/10.1158/1078-0432.CCR-06-2076, doi:10.1158/1078-0432.CCR-06-2076. 39
[169] Jessica E. Bolden, Melissa J. Peart, and Ricky W. Johnstone. Anticancer activities of histone deacetylase inhibitors. Nat Rev Drug Discov, 5(9):769–784, Sep 2006. URL: http://dx.doi.org/10.1038/nrd2133, doi:10.1038/nrd2133. 39
[170] E. E. Cameron, K. E. Bachman, S. Myohanen, J. G. Herman, and S. B. Baylin. Synergy of demethylation and histone deacetylase inhibition in the re-expression of genes silenced in cancer. Nat Genet, 21(1):103–107, Jan 1999. URL: http://dx.doi.org/10.1038/5047, doi:10.1038/5047. 39
[171] Dorothea Greiner, Tiziana Bonaldi, Ragnhild Eskeland, Ernst Roemer, and Axel Imhof. Identification of a specific inhibitor of the histone methyltransferase su(var)3-9. Nat Chem Biol, 1(3):143–145, Aug 2005. URL: http://dx.doi.org/10.1038/nchembio721, doi:10.1038/nchembio721. 40
[172] Jing Tan, Xiaojing Yang, Li Zhuang, Xia Jiang, Wei Chen, Puay Leng Lee, R K Murthy Karuturi, Patrick Boon Ooi Tan, Edison T. Liu, and Qiang Yu. Pharmacologic disruption of polycomb-repressive complex 2-mediated gene repression selectively induces apoptosis in cancer cells. Genes Dev, 21(9):1050–1063, May 2007. URL: http://dx.doi.org/10.1101/gad.1524107, doi:10.1101/gad.1524107. 40
[173] Sreenath V. Sharma, Diana Y. Lee, Bihua Li, Margaret P. Quinlan, Fumiyuki Takahashi, Shyamala Maheswaran, Ultan McDermott, Nancy Azizian, Lee Zou, Michael A. Fischbach, Kwok-Kin Wong, Kathleyn Brandstetter, Ben Wittner, Sridhar Ramaswamy, Marie Classon, and Jeff Settleman. A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell, 141(1):69–80, Apr 2010. URL: http://dx.doi.org/10.1016/j.cell.2010.02.027, doi:10.1016/j.cell.2010.02.027. 40, 301
[174] Wenchu Lin, Jian Cao, Jiayun Liu, Michael L. Beshiri, Yuko Fujiwara, Joshua Francis, Andrew D. Cherniack, Christoph Geisen, Lauren P. Blair, Mike R. Zou, Xiaohua Shen, Dan Kawamori, Zongzhi Liu, Chiara Grisanzio, Hideo Watanabe, Yoji Andrew Minamishima, Qing Zhang, Rohit N. Kulkarni, Sabina Signoretti, Scott J. Rodig, Roderick T. Bronson, Stuart H. Orkin, David P. Tuck, Elizaveta V. Benevolenskaya, Matthew Meyerson, William G Kaelin, Jr, and Qin Yan. Loss of the retinoblastoma binding protein 2 (rbp2) histone demethylase suppresses tumorigenesis in mice lacking rb1 or men1. Proc Natl Acad Sci U S A, 108(33):13379–13386, Aug 2011. URL: http://dx.doi.org/10.1073/pnas.1110104108, doi:10.1073/pnas.1110104108. 40
[175] Leslie J. Saucedo and Bruce A. Edgar. Filling out the hippo pathway. Nat Rev Mol Cell Biol, 8(8):613–621, Aug 2007. URL: http://dx.doi.org/10.1038/nrm2221, doi:10.1038/nrm2221. 41, 43
[176] Bin Zhao, Karen Tumaneng, and Kun-Liang Guan. The hippo pathway in organ size control, tissue regeneration and stem cell self-renewal. Nat Cell Biol, 13(8):877–883, Aug 2011. URL: http://dx.doi.org/10.1038/ncb2303, doi:10.1038/ncb2303. 41, 43, 45
[177] Felix A. Grusche, Joffrey L. Degoutin, Helena E. Richardson, and Kieran F. Harvey. The salvador/warts/hippo pathway controls regenerative tissue growth in drosophila melanogaster. Dev Biol, 350(2):255–266, Feb 2011. URL: http://dx.doi.org/10.1016/j.ydbio.2010.11.020, doi:10.1016/j.ydbio.2010.11.020. 41
[178] Sander van den Heuvel and Nicholas J. Dyson. Conserved functions of the prb and e2f families. Nat Rev Mol Cell Biol, 9(9):713–724, Sep 2008. URL: http://dx.doi.org/10.1038/nrm2469, doi:10.1038/nrm2469. 41, 42, 44, 295
[179] Gabriel M. Gordon and Wei Du. Conserved rb functions in development and tumor suppression. Protein Cell, 2(11):864–878, Nov 2011. URL: http://dx.doi.org/10.1007/s13238-011-1117-z, doi:10.1007/s13238-011-1117-z. 41
[180] Bin Zhao, Li Li, Qunying Lei, and Kun-Liang Guan. The hippo-yap pathway in organ size control and tumorigenesis: an updated version. Genes Dev, 24(9):862–874, May 2010. URL: http://dx.doi.org/10.1101/gad.1909210, doi:10.1101/gad.1909210. 42
[181] Madhuri Kango-Singh and Amit Singh. Regulation of organ size: insights from the drosophila hippo signaling pathway. Dev Dyn, 238(7):1627–1637, Jul 2009. URL: http://dx.doi.org/10.1002/dvdy.21996, doi:10.1002/dvdy.21996. 42
[182] Georg Halder and Randy L. Johnson. Hippo signaling: growth control and beyond. Development, 138(1):9–22, Jan 2011. URL: http://dx.doi.org/10.1242/dev.045500, doi:10.1242/dev.045500. 42
[183] Lars Zender, Mona S. Spector, Wen Xue, Peer Flemming, Carlos Cordon-Cardo, John Silke, Sheung-Tat Fan, John M. Luk, Michael Wigler, Gregory J. Hannon, David Mu, Robert Lucito, Scott Powers, and Scott W. Lowe.
Identification and validation of oncogenes in liver cancer using an integrative oncogenomic approach. Cell, 125(7):1253–1267, Jun 2006. URL: http://dx.doi.org/10.1016/j.cell.2006.05.030, doi:10.1016/j.cell.2006.05.030. 42
[184] Erik S. Knudsen and Karen E. Knudsen. Retinoblastoma tumor suppressor: where cancer meets the cell cycle. Exp Biol Med (Maywood), 231(7):1271–1281, Jul 2006. 42
[185] Olivier Stevaux, Dessislava Dimova, Maxim V. Frolov, Barbie Taylor-Harding, Erick Morris, and Nicholas Dyson. Distinct mechanisms of e2f regulation by drosophila rbf1 and rbf2. EMBO J, 21(18):4927–4937, Sep 2002. 42
[186] Olivier Stevaux and Nicholas J. Dyson. A revised picture of the e2f transcriptional network and rb function. Curr Opin Cell Biol, 14(6):684–691, Dec 2002. 42
[187] Maxim V. Frolov and Nicholas J. Dyson. Molecular mechanisms of e2f-dependent activation and prb-mediated repression. J Cell Sci, 117(Pt 11):2173–2181, May 2004. URL: http://dx.doi.org/10.1242/jcs.01227, doi:10.1242/jcs.01227. 42
[188] Mengxin Yin and Lei Zhang. Hippo signaling: a hub of growth control, tumor suppression and pluripotency maintenance. J Genet Genomics, 38(10):471–481, Oct 2011. URL: http://dx.doi.org/10.1016/j.jgg.2011.09.009, doi:10.1016/j.jgg.2011.09.009. 43
[189] Duojia Pan. Hippo signaling in organ size control. Genes Dev, 21(8):886–897, Apr 2007. URL: http://dx.doi.org/10.1101/gad.1536007, doi:10.1101/gad.1536007. 43
[190] Caroline Badouel, Laura Gardano, Nancy Amin, Ankush Garg, Robyn Rosenfeld, Thierry Le Bihan, and Helen McNeill. The ferm-domain protein expanded regulates hippo pathway activity via direct interactions with the transcriptional activator yorkie. Dev Cell, 16(3):411–420, Mar 2009. URL: http://dx.doi.org/10.1016/j.devcel.2009.01.010, doi:10.1016/j.devcel.2009.01.010. 43
[191] Yijun Bao, Yutaka Hata, Mitsunobu Ikeda, and Kanchanamala Withanage. Mammalian hippo pathway: from development to cancer and beyond. J Biochem, 149(4):361–379, Apr 2011. URL: http://dx.doi.org/10.1093/jb/mvr021, doi:10.1093/jb/mvr021. 43
[192] V de Mello-Coelho and K. L. Hess. A conceptual and practical overview of cdna microarray technology: implications for basic and clinical sciences. Braz J Med Biol Res, 38(10):1543–1552, Oct 2005. URL: http://dx.doi.org//S0100-879X2005001000011, doi:/S0100-879X2005001000011. 47
[193] Fumiaki Katagiri and Jane Glazebrook. Overview of mrna expression profiling using dna microarrays. Curr Protoc Mol Biol, Chapter 22:Unit 22.4, Jan 2009. URL: http://dx.doi.org/10.1002/0471142727.mb2204s85, doi:10.1002/0471142727.mb2204s85. 48
[194] Rafael A. Irizarry, Bridget Hobbs, Francois Collin, Yasmin D. Beazer-Barclay, Kristen J. Antonellis, Uwe Scherf, and Terence P. Speed. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2):249–264, Apr 2003. URL: http://dx.doi.org/10.1093/biostatistics/4.2.249, doi:10.1093/biostatistics/4.2.249. 48, 50
[195] Zhijin Wu and Rafael A. Irizarry. Preprocessing of oligonucleotide array data. Nat Biotechnol, 22(6):656–8; author reply 658, Jun 2004. URL: http://dx.doi.org/10.1038/nbt0604-656b, doi:10.1038/nbt0604-656b. 48
[196] Jin Hwan Do and Dong-Kug Choi. Normalization of microarray data: single-labeled and dual-labeled arrays. Mol Cells, 22(3):254–261, Dec 2006. 49, 50
[197] E. E. Schadt, C. Li, B. Ellis, and W. H. Wong. Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J Cell Biochem Suppl, Suppl 37:120–125, 2001. 49
[198] Cheng Li. Automating dchip: toward reproducible sharing of microarray data analysis. BMC Bioinformatics, 9:231, 2008. URL: http://dx.doi.org/10.1186/1471-2105-9-231, doi:10.1186/1471-2105-9-231. 49
[199] David Edwards. Non-linear normalization and background correction in one-channel cdna microarray studies. Bioinformatics, 19(7):825–833, May 2003. 49, 51
[200] B. M. Bolstad, R. A. Irizarry, M. Astrand, and T. P. Speed. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2):185–193, Jan 2003. 49, 54
[201] Wolfgang Huber, Anja von Heydebreck, Holger Sultmann, Annemarie Poustka, and Martin Vingron. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18 Suppl 1:S96–104, 2002. 49, 59
[202] E. S. Motakis, G. P. Nason, P. Fryzlewicz, and G. A. Rutter. Variance stabilization and normalization for one-color microarray data using a data-driven multiscale approach. Bioinformatics, 22(20):2547–2553, Oct 2006. URL: http://dx.doi.org/10.1093/bioinformatics/btl412, doi:10.1093/bioinformatics/btl412. 49
[203] Simon M. Lin, Pan Du, Wolfgang Huber, and Warren A. Kibbe. Model-based variance-stabilizing transformation for illumina microarray data. Nucleic Acids Res, 36(2):e11, Feb 2008. URL: http://dx.doi.org/10.1093/nar/gkm1075, doi:10.1093/nar/gkm1075. 49
[204] C. Li and W. H. Wong. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci U S A, 98(1):31–36, Jan 2001. URL: http://dx.doi.org/10.1073/pnas.011404098, doi:10.1073/pnas.011404098. 50
[205] A. B. Goryachev, P. F. Macgregor, and A. M. Edwards. Unfolding of microarray data. J Comput Biol, 8(4):443–461, 2001. URL: http://dx.doi.org/10.1089/106652701752236232, doi:10.1089/106652701752236232. 51
[206] Taesung Park, Sung-Gon Yi, Sung-Hyun Kang, SeungYeoun Lee, Yong-Sung Lee, and Richard Simon. Evaluation of normalization methods for microarray data. BMC Bioinformatics, 4:33, Sep 2003. URL: http://dx.doi.org/10.1186/1471-2105-4-33, doi:10.1186/1471-2105-4-33. 51
[207] Yee Hwa Yang, Sandrine Dudoit, Percy Luu, David M. Lin, Vivian Peng, John Ngai, and Terence P. Speed. Normalization for cdna microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res, 30(4):e15, Feb 2002. 51, 52, 53
[208] G. C. Tseng, M. K. Oh, L. Rohlin, J. C. Liao, and W. H. Wong. Issues in cdna microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res, 29(12):2549–2557, Jun 2001. 51
[209] John Quackenbush. Microarray data normalization and transformation. Nat Genet, 32 Suppl:496–501, Dec 2002. URL: http://dx.doi.org/10.1038/ng1032, doi:10.1038/ng1032. 52
[210] Thomas B. Kepler, Lynn Crosby, and Kevin T. Morgan. Normalization and analysis of dna microarray data by self-consistency and local regression. Genome Biol, 3(7):RESEARCH0037, Jun 2002. 53
[211] Christopher Workman, Lars Juhl Jensen, Hanne Jarmer, Randy Berka, Laurent Gautier, Henrik Bjørn Nielser, Hans-Henrik Saxild, Claus Nielsen, Søren Brunak, and Steen Knudsen. A new non-linear normalization method for reducing variability in dna microarray experiments. Genome Biol, 3(9):research0048, Aug 2002. 53
[212] Yue Wang, Jianping Lu, Richard Lee, Zhiping Gu, and Robert Clarke. Iterative normalization of cdna microarray data. IEEE Trans Inf Technol Biomed, 6(1):29–37, Mar 2002. 53
[213] Yi-Ju Chen, Ralph Kodell, Frank Sistare, Karol L. Thompson, Suzanne Morris, and James J. Chen. Normalization methods for analysis of microarray gene-expression data. J Biopharm Stat, 13(1):57–74, Feb 2003. URL: http://dx.doi.org/10.1081/BIP-120017726, doi:10.1081/BIP-120017726. 53
[214] Yingdong Zhao, Ming-Chung Li, and Richard Simon. An adaptive method for cdna microarray normalization. BMC Bioinformatics, 6:28, 2005. URL: http://dx.doi.org/10.1186/1471-2105-6-28, doi:10.1186/1471-2105-6-28. 53
[215] Gordon K. Smyth and Terry Speed. Normalization of cdna microarray data. Methods, 31(4):265–273, Dec 2003. 53
[216] Robert C. Gentleman, Vincent J. Carey, Douglas M. Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Gunther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y H. Yang, and Jianhua Zhang. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, 5(10):R80, 2004. URL: http://dx.doi.org/10.1186/gb-2004-5-10-r80, doi:10.1186/gb-2004-5-10-r80. 54, 55, 58
[217] O. Alter, P. O. Brown, and D. Botstein. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A, 97(18):10101–10106, Aug 2000. 54
[218] Monica Benito, Joel Parker, Quan Du, Junyuan Wu, Dong Xiang, Charles M. Perou, and J. S. Marron. Adjustment of systematic microarray data biases. Bioinformatics, 20(1):105–114, Jan 2004. 55
[219] W Evan Johnson, Cheng Li, and Ariel Rabinovic. Adjusting batch effects in microarray expression data using empirical bayes methods. Biostatistics, 8(1):118–127, Jan 2007. URL: http://dx.doi.org/10.1093/biostatistics/kxj037, doi:10.1093/biostatistics/kxj037. 55
[220] Jeffrey T. Leek and John D. Storey. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet, 3(9):1724–1735, Sep 2007. URL: http://dx.doi.org/10.1371/journal.pgen.0030161, doi:10.1371/journal.pgen.0030161. 55
[221] Wynn L. Walker, Isaac H. Liao, Donald L. Gilbert, Brenda Wong, Katherine S. Pollard, Charles E. McCulloch, Lisa Lit, and Frank R. Sharp. Empirical bayes accomodation of batch-effects in microarray data using identical replicate reference samples: application to rna expression profiling of blood from duchenne muscular dystrophy patients. BMC Genomics, 9:494, 2008. URL: http://dx.doi.org/10.1186/1471-2164-9-494, doi:10.1186/1471-2164-9-494. 55
[222] Andrew H. Sims, Graeme J.
Smethurst, Yvonne Hey, Michal J. Okoniewski, Stuart D. Pepper, Anthony Howell, Crispin J. Miller, and Robert B. Clarke. The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis. BMC Med Genomics, 1:42, 2008. URL: http://dx.doi.org/10.1186/1755-8794-1-42, doi:10.1186/1755-8794-1-42. 55 [223] J. Luo, M. Schumacher, A. Scherer, D. Sanoudou, D. Megherbi, T. Davison, T. Shi, W. Tong, L. Shi, H. Hong, C. Zhao, F. Elloumi, W. Shi, R. Thomas, S. Lin, G. Tillinghast, G. Liu, Y. Zhou, D. Herman, Y. Li, Y. Deng, H. Fang, P. Bushel, M. Woods, and J. Zhang. A comparison of batch effect removal methods for enhancement of prediction performance using maqc-ii microarray gene expression data. Pharmacogenomics J, 10(4):278–291, Aug 2010. URL: http://dx.doi.org/10.1038/ tpj.2010.57, doi:10.1038/tpj.2010.57. 55 [224] Chao Chen, Kay Grennan, Judith Badner, Dandan Zhang, Elliot Gershon, Li Jin, and Chunyu Liu. Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS One, 6(2):e17238, 2011. URL: http://dx. doi.org/10.1371/journal.pone.0017238, doi:10.1371/ journal.pone.0017238. 55 [225] Jianqing Fan and Yi Ren. Statistical analysis of dna microarray data in cancer research. Clin Cancer Res, 12(15):4469–4473, Aug 2006. URL: http://dx.doi.org/10.1158/1078-0432. CCR-06-1033, doi:10.1158/1078-0432.CCR-06-1033. 55, 56 [226] Gordon K. Smyth. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl 384 Genet Mol Biol, 3:Article3, 2004. URL: http://dx.doi.org/10. 2202/1544-6115.1027, doi:10.2202/1544-6115.1027. 55 [227] Rainer Breitling, Patrick Armengaud, Anna Amtmann, and Pawel Herzyk. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett, 573(1-3):83–92, Aug 2004. 
URL: http://dx.doi.org/10.1016/ j.febslet.2004.07.055, doi:10.1016/j.febslet.2004. 07.055. 55, 56 [228] Fangxin Hong, Rainer Breitling, Connor W. McEntee, Ben S. Wittner, Jennifer L. Nemhauser, and Joanne Chory. Rankprod: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics, 22(22):2825–2827, Nov 2006. URL: http:// dx.doi.org/10.1093/bioinformatics/btl476, doi:10. 1093/bioinformatics/btl476. 56 [229] Hyuna Yang and Gary Churchill. Estimating p-values in small microarray experiments. Bioinformatics, 23(1):38–43, Jan 2007. URL: http:// dx.doi.org/10.1093/bioinformatics/btl548, doi:10. 1093/bioinformatics/btl548. 56 [230] William S. Noble. How does multiple testing correction work? Nat Biotechnol, 27(12):1135–1137, Dec 19932009. URL: http://dx.doi.org/10.1038/nbt1209-1135, doi:10.1038/nbt1209-1135. 56 [231] B. Ren, F. Robert, J. J. Wyrick, O. Aparicio, E. G. Jennings, I. Simon, J. Zeitlinger, J. Schreiber, N. Hannett, E. Kanin, T. L. Volkert, C. J. Wilson, S. P. Bell, and R. A. Young. Genome-wide location and function of dna binding proteins. Science, 290(5500):2306–2309, Dec 2000. URL: http://dx.doi.org/10.1126/science.290. 5500.2306, doi:10.1126/science.290.5500.2306. 56 [232] J. M. Ordway, J. A. Bedell, R. W. Citek, A. Nunberg, A. Garrido, R. Kendall, J. R. Stevens, D. Cao, R. W. Doerge, Y. Korshunova, H. Holemon, J. D. McPherson, N. Lakey, J. Leon, R. A. Martienssen, and J. A. Jeddeloh. Comprehensive dna methylation profiling in a human cancer genome identifies novel epigenetic targets. Carcinogenesis, 27(12):2409–2423, Dec 2006. URL: http://dx.doi.org/10. 1093/carcin/bgl161, doi:10.1093/carcin/bgl161. 56 VII. B IBLIOGRAPHY [233] Shouyong Peng, Artyom A. Alekseyenko, Erica Larschan, Mitzi I. Kuroda, and Peter J. Park. Normalization and experimental 385 design for chip-chip data. BMC Bioinformatics, 8:219, 2007. URL: http://dx.doi.org/10.1186/1471-2105-8-219, doi:10.1186/1471-2105-8-219. 57 [234] Michiel E. 
Adriaens, Magali Jaillard, Lars Mt Eijssen, Claus-Dieter Mayer, and Chris Ta Evelo. An evaluation of two-channel chip-on-chip and dna methylation microarray normalization strategies. BMC Genomics, 13:42, 2012. URL: http://dx.doi.org/10.1186/ 1471-2164-13-42, doi:10.1186/1471-2164-13-42. 57, 58 [235] Francis D. Gibbons, Markus Proft, Kevin Struhl, and Frederick P. Roth. Chipper: discovering transcription-factor targets from chromatin immunoprecipitation microarrays using variance stabilization. Genome Biol, 6(11):R96, 2005. URL: http://dx.doi.org/10.1186/ gb-2005-6-11-r96, doi:10.1186/gb-2005-6-11-r96. 59 [236] Hongkai Ji, Hui Jiang, Wenxiu Ma, David S. Johnson, Richard M. Myers, and Wing H. Wong. An integrated software system for analyzing chip-chip and chip-seq data. Nat Biotechnol, 26(11):1293–1300, Nov 2008. URL: http://dx.doi.org/10.1038/nbt.1505, doi: 10.1038/nbt.1505. 59 [237] Touati Benoukraf, Pierre Cauchy, Romain Fenouil, Adrien Jeanniard, Frederic Koch, Sebastien Jaeger, Denis Thieffry, Jean Imbert, Jean-Christophe Andrau, Salvatore Spicuglia, and Pierre Ferrier. Cocas: a chip-on-chip analysis suite. Bioinformatics, 25(7):954–955, Apr 2009. URL: http://dx.doi.org/10.1093/bioinformatics/ btp075, doi:10.1093/bioinformatics/btp075. 59 [238] Peter J. Park. Chip-seq: advantages and challenges of a maturing technology. Nat Rev Genet, 10(10):669–680, Oct 2009. URL: http:// dx.doi.org/10.1038/nrg2641, doi:10.1038/nrg2641. 60, 61, 62, 63, 64, 71, 74 [239] Peter J. Park. Epigenetics meets next-generation sequencing. Epigenetics, 3(6):318–321, Nov 2008. 60 [240] Marion Leleu, Gregory Lefebvre, and Jacques Rougemont. Processing and analyzing chip-seq data: from short reads to regulatory interactions. Brief Funct Genomics, 9(5-6):466–476, Dec 2010. URL: http:// dx.doi.org/10.1093/bfgp/elq022, doi:10.1093/bfgp/ elq022. 62, 67, 68, 72 [241] Margaret Taub, Doron Lipson, and Terence. Speed. Methods for allocating ambiguous short-reads. 
Communications in information and 386 systems., 10(2):69–82, 2010. URL: http://projecteuclid.org/ euclid.cis/1268143264. 64 [242] Cole Trapnell and Steven L. Salzberg. How to map billions of short reads onto genomes. Nat Biotechnol, 27(5):455–457, May 2009. URL: http://dx.doi.org/10.1038/nbt0509-455, doi:10.1038/nbt0509-455. 64, 66 [243] David L A. Wood, Qinying Xu, John V. Pearson, Nicole Cloonan, and Sean M. Grimmond. X-mate: a flexible system for mapping short read data. Bioinformatics, 27(4):580–581, Feb 2011. URL: http:// dx.doi.org/10.1093/bioinformatics/btq698, doi:10. 1093/bioinformatics/btq698. 64 [244] Nicola Palmieri and Christian Schlotterer. Mapping accuracy of short reads from massively parallel sequencing and the implications for quantitative expression profiling. PLoS One, 4(7):e6323, 2009. URL: http://dx.doi.org/10.1371/journal.pone. 0006323, doi:10.1371/journal.pone.0006323. 64 [245] Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol, 10(3):R25, 2009. URL: http:// dx.doi.org/10.1186/gb-2009-10-3-r25, doi:10.1186/ gb-2009-10-3-r25. 65, 78 [246] Teemu D. Laajala, Sunil Raghav, Soile Tuomela, Riitta Lahesmaa, Tero Aittokallio, and Laura L. Elo. A practical comparison of methods for detecting transcription factor binding sites in chip-seq experiments. BMC Genomics, 10:618, 2009. URL: http://dx.doi.org/10.1186/ 1471-2164-10-618, doi:10.1186/1471-2164-10-618. 68 [247] Shirley Pepke, Barbara Wold, and Ali Mortazavi. Computation for chip-seq and rna-seq studies. Nat Methods, 6(11 Suppl):S22–S32, Nov 2009. URL: http://dx.doi.org/10.1038/nmeth.1371, doi:10.1038/nmeth.1371. 68, 69, 78, 80 [248] Elizabeth G. Wilbanks and Marc T. Facciotti. Evaluation of algorithm performance in chip-seq peak detection. PLoS One, 5(7):e11471, 2010. URL: http://dx.doi.org/10.1371/journal.pone. 0011471, doi:10.1371/journal.pone.0011471. 68 VII. 
B IBLIOGRAPHY [249] Adam M. Szalkowski and Christoph D. Schmid. Rapid innovation in chip-seq peak-calling algorithms is outdistancing benchmarking efforts. 387 Brief Bioinform, 12(6):626–633, Nov 2011. URL: http://dx.doi. org/10.1093/bib/bbq068, doi:10.1093/bib/bbq068. 68 [250] Laurence D. Parnell, Pierre Lindenbaum, Khader Shameer, Giovanni Marco Dall’Olio, Daniel C. Swan, Lars Juhl Jensen, Simon J. Cockell, Brent S. Pedersen, Mary E. Mangan, Christopher A. Miller, and Istvan Albert. Biostar: an online question & answer resource for the bioinformatics community. PLoS Comput Biol, 7(10):e1002216, Oct 2011. URL: http://dx.doi.org/10.1371/journal.pcbi. 1002216, doi:10.1371/journal.pcbi.1002216. 68 [251] Anais F. Bardet, Qiye He, Julia Zeitlinger, and Alexander Stark. A computational pipeline for comparative chip-seq analyses. Nat Protoc, 7(1):45–61, Jan 2012. URL: http://dx.doi.org/10.1038/ nprot.2011.420, doi:10.1038/nprot.2011.420. 68 ˜ ˜ [252] Yong Zhang, Tao Liu, Clifford A. Meyer, JA c rA´me Eeckhoute, David S. Johnson, Bradley E. Bernstein, Chad Nusbaum, Richard M. Myers, Myles Brown, Wei Li, and X Shirley Liu. Model-based analysis of chip-seq (macs). Genome Biol, 9(9):R137, 2008. URL: http:// dx.doi.org/10.1186/gb-2008-9-9-r137, doi:10.1186/ gb-2008-9-9-r137. 70 [253] Chongzhi Zang, Dustin E. Schones, Chen Zeng, Kairong Cui, Keji Zhao, and Weiqun Peng. A clustering approach for identification of enriched domains from histone modification chip-seq data. Bioinformatics, 25(15):1952–1958, Aug 2009. URL: http://dx.doi.org/10.1093/bioinformatics/btp340, doi:10.1093/bioinformatics/btp340. 70 ˜ ˜ [254] Sonja Althammer, Juan GonzA¡lez-Vallinas, Cecilia BallarA c , Miguel Beato, and Eduardo Eyras. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data. Bioinformatics, 27(24):3333–3340, Dec 2011. URL: http: //dx.doi.org/10.1093/bioinformatics/btr570, doi:10.1093/bioinformatics/btr570. 
72 [255] Cenny Taslim, Jiejun Wu, Pearlly Yan, Greg Singer, Jeffrey Parvin, Tim Huang, Shili Lin, and Kun Huang. Comparative study on chip-seq data: normalization and binding pattern characterization. Bioinformatics, 25(18):2334–2340, Sep 2009. URL: http:// dx.doi.org/10.1093/bioinformatics/btp384, doi:10. 1093/bioinformatics/btp384. 73 388 ˜ ˜ [256] Lihua J. Zhu, Claude Gazin, Nathan D. Lawson, HervA c PagA¨s, Simon M. Lin, David S. Lapointe, and Michael R. Green. Chippeakanno: a bioconductor package to annotate chip-seq and chip-chip data. BMC Bioinformatics, 11:237, 2010. URL: http://dx.doi.org/10.1186/1471-2105-11-237, doi:10.1186/1471-2105-11-237. 75, 88 [257] Hyunjin Shin, Tao Liu, Arjun K. Manrai, and X Shirley Liu. Ceas: cis-regulatory element annotation system. Bioinformatics, 25(19):2605–2606, Oct 2009. URL: http: //dx.doi.org/10.1093/bioinformatics/btp479, doi:10.1093/bioinformatics/btp479. 75 [258] Aaron R. Quinlan and Ira M. Hall. Bedtools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6):841–842, Mar 2010. URL: http://dx.doi.org/10.1093/ bioinformatics/btq033, doi:10.1093/bioinformatics/ btq033. 75 [259] Arnaud Krebs, Mattia Frontini, and Laszlo Tora. Gpat: retrieval of genomic annotation from large genomic position datasets. BMC Bioinformatics, 9:533, 2008. URL: http://dx.doi.org/10. 1186/1471-2105-9-533, doi:10.1186/1471-2105-9-533. 75 [260] Sven Heinz, Christopher Benner, Nathanael Spann, Eric Bertolino, Yin C. Lin, Peter Laslo, Jason X. Cheng, Cornelis Murre, Harinder Singh, and Christopher K. Glass. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities. Mol Cell, 38(4):576–589, May 2010. URL: http://dx.doi.org/10.1016/j.molcel.2010. 05.004, doi:10.1016/j.molcel.2010.05.004. 75 [261] Eugenia G. Giannopoulou and Olivier Elemento. An integrated chip-seq analysis platform with customizable workflows. 
BMC Bioinformatics, 12:277, 2011. URL: http://dx.doi.org/10.1186/ 1471-2105-12-277, doi:10.1186/1471-2105-12-277. 75, 88 [262] Dongwon Lee, Rachel Karchin, and Michael A. Beer. Discriminative prediction of mammalian enhancers from dna sequence. Genome Res, 21(12):2167–2180, Dec 2011. URL: http://dx.doi.org/10. 1101/gr.121905.111, doi:10.1101/gr.121905.111. 75 389 VII. B IBLIOGRAPHY [263] Adam Siepel, Gill Bejerano, Jakob S. Pedersen, Angie S. Hinrichs, Minmei Hou, Kate Rosenbloom, Hiram Clawson, John Spieth, Ladeana W. Hillier, Stephen Richards, George M. Weinstock, Richard K. Wilson, Richard A. Gibbs, W James Kent, Webb Miller, and David Haussler. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 15(8):1034–1050, Aug 2005. URL: http://dx.doi.org/10.1101/gr.3715005, doi:10.1101/ gr.3715005. 75 [264] Tao Ye, Arnaud R. Krebs, Mohamed-Amin Choukrallah, Celine Keime, Frederic Plewniak, Irwin Davidson, and Laszlo Tora. seqminer: an integrated chip-seq data interpretation platform. Nucleic Acids Res, 39(6):e35, Mar 2011. URL: http://dx.doi.org/10.1093/nar/ gkq1287, doi:10.1093/nar/gkq1287. 76 [265] Zhong Wang, Mark Gerstein, and Michael Snyder. Rna-seq: a revolutionary tool for transcriptomics. Nat Rev Genet, 10(1):57–63, Jan 2009. URL: http://dx.doi.org/10.1038/nrg2484, doi: 10.1038/nrg2484. 76, 297 [266] Alicia Oshlack, Mark D. Robinson, and Matthew D. Young. From rna-seq reads to differential expression results. Genome Biol, 11(12):220, 2010. URL: http://dx.doi.org/10.1186/gb-2010-11-12-220, doi:10.1186/gb-2010-11-12-220. 76, 77, 78, 79, 84 [267] Cole Trapnell, Lior Pachter, and Steven L. Salzberg. Tophat: discovering splice junctions with rna-seq. Bioinformatics, 25(9):1105–1111, May 2009. URL: http://dx.doi.org/ 10.1093/bioinformatics/btp120, doi:10.1093/ bioinformatics/btp120. 78, 82 [268] Heng Li, Jue Ruan, and Richard Durbin. Mapping short dna sequencing reads and calling variants using mapping quality scores. 
Genome Res, 18(11):1851–1858, Nov 2008. URL: http://dx.doi.org/10. 1101/gr.078212.108, doi:10.1101/gr.078212.108. 81, 82 [269] Cole Trapnell, Brian A. Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J. van Baren, Steven L. Salzberg, Barbara J. Wold, and Lior Pachter. Transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol, 28(5):511–515, May 2010. URL: http://dx.doi. org/10.1038/nbt.1621, doi:10.1038/nbt.1621. 81 [270] Ali Mortazavi, Brian A. Williams, Kenneth McCue, Lorian Schaeffer, and Barbara Wold. Mapping and quantifying mammalian transcriptomes 390 by rna-seq. Nat Methods, 5(7):621–628, Jul 2008. URL: http:// dx.doi.org/10.1038/nmeth.1226, doi:10.1038/nmeth. 1226. 81, 83 [271] John C. Marioni, Christopher E. Mason, Shrikant M. Mane, Matthew Stephens, and Yoav Gilad. Rna-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res, 18(9):1509–1517, Sep 2008. URL: http://dx.doi.org/10. 1101/gr.079558.108, doi:10.1101/gr.079558.108. 81, 84 [272] James H. Bullard, Elizabeth Purdom, Kasper D. Hansen, and Sandrine Dudoit. Evaluation of statistical methods for normalization and differential expression in mrna-seq experiments. BMC Bioinformatics, 11:94, 2010. URL: http://dx.doi.org/10. 1186/1471-2105-11-94, doi:10.1186/1471-2105-11-94. 81, 83 [273] Edward Sendler, Graham D. Johnson, and Stephen A. Krawetz. Local and global factors affecting rna sequencing analysis. Anal Biochem, 419(2):317–322, Dec 2011. URL: http://dx.doi.org/10.1016/ j.ab.2011.08.013, doi:10.1016/j.ab.2011.08.013. 81 [274] Kasper D. Hansen, Rafael A. Irizarry, and Zhijin Wu. Removing technical variability in rna-seq data using conditional quantile normalization. Biostatistics, 13(2):204–216, Apr 2012. URL: http://dx. doi.org/10.1093/biostatistics/kxr054, doi:10.1093/ biostatistics/kxr054. 
83, 84 [275] Davide Risso, Katja Schwartz, Gavin Sherlock, and Sandrine Dudoit. Gc-content normalization for rna-seq data. BMC Bioinformatics, 12(1):480, Dec 2011. URL: http://dx.doi.org/10.1186/ 1471-2105-12-480, doi:10.1186/1471-2105-12-480. 83 [276] Seungtai Yoon, Zhenyu Xuan, Vladimir Makarov, Kenny Ye, and Jonathan Sebat. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res, 19(9):1586–1592, Sep 2009. URL: http://dx.doi.org/10.1101/gr.092981.109, doi:10.1101/gr.092981.109. 83 [277] Valentina Boeva, Andrei Zinovyev, Kevin Bleakley, Jean-Philippe Vert, Isabelle Janoueix-Lerosey, Olivier Delattre, and Emmanuel Barillot. Control-free calling of copy number alterations in deep-sequencing data using gc-content normalization. Bioinformatics, 27(2):268–269, Jan 2011. URL: http://dx.doi.org/10.1093/ 391 VII. B IBLIOGRAPHY bioinformatics/btq635, doi:10.1093/bioinformatics/ btq635. 83 [278] Kasper D. Hansen, Steven E. Brenner, and Sandrine Dudoit. Biases in illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res, 38(12):e131, Jul 2010. URL: http://dx.doi. org/10.1093/nar/gkq224, doi:10.1093/nar/gkq224. 83 [279] Adam Roberts, Cole Trapnell, Julie Donaghey, John L. Rinn, and Lior Pachter. Improving rna-seq expression estimates by correcting for fragment bias. Genome Biol, 12(3):R22, 2011. URL: http:// dx.doi.org/10.1186/gb-2011-12-3-r22, doi:10.1186/ gb-2011-12-3-r22. 83 [280] Daniel C. Jones, Walter L. Ruzzo, Xinxia Peng, and Michael G. Katze. A new approach to bias correction in rna-seq. Bioinformatics, Jan 2012. URL: http://dx.doi.org/10.1093/bioinformatics/ bts055, doi:10.1093/bioinformatics/bts055. 83 [281] Wei Zheng, Lisa M. Chung, and Hongyu Zhao. Bias detection and correction in rna-sequencing data. BMC Bioinformatics, 12:290, 2011. URL: http://dx.doi.org/10.1186/1471-2105-12-290, doi:10.1186/1471-2105-12-290. 83 [282] Mark D. Robinson and Alicia Oshlack. 
A scaling normalization method for differential expression analysis of rna-seq data. Genome Biol, 11(3):R25, 2010. URL: http://dx.doi.org/10.1186/ gb-2010-11-3-r25, doi:10.1186/gb-2010-11-3-r25. 84 [283] Piotr J. Balwierz, Piero Carninci, Carsten O. Daub, Jun Kawai, Yoshihide Hayashizaki, Werner Van Belle, Christian Beisel, and Erik van Nimwegen. Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepcage data. Genome Biol, 10(7):R79, 2009. URL: http:// dx.doi.org/10.1186/gb-2009-10-7-r79, doi:10.1186/ gb-2009-10-7-r79. 84 [284] Likun Wang, Zhixing Feng, Xi Wang, Xiaowo Wang, and Xuegong Zhang. Degseq: an r package for identifying differentially expressed genes from rna-seq data. Bioinformatics, 26(1):136–138, Jan 2010. URL: http://dx.doi.org/10.1093/bioinformatics/btp612, doi:10.1093/bioinformatics/btp612. 84 [285] Simon Anders and Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biol, 11(10):R106, 2010. 392 URL: http://dx.doi.org/10.1186/gb-2010-11-10-r106, doi:10.1186/gb-2010-11-10-r106. 84 [286] Mark D. Robinson, Davis J. McCarthy, and Gordon K. Smyth. edger: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139–140, Jan 2010. URL: http://dx.doi.org/10.1093/bioinformatics/btp616, doi:10.1093/bioinformatics/btp616. 84 [287] Thomas J. Hardcastle and Krystyna A. Kelly. bayseq: empirical bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics, 11:422, 2010. URL: http:// dx.doi.org/10.1186/1471-2105-11-422, doi:10.1186/ 1471-2105-11-422. 84 [288] Ben Langmead, Kasper D. Hansen, and Jeffrey T. Leek. Cloud-scale rna-sequencing differential expression analysis with myrna. Genome Biol, 11(8):R83, 2010. URL: http://dx.doi.org/10.1186/ gb-2010-11-8-r83, doi:10.1186/gb-2010-11-8-r83. 84 [289] Mark D. Robinson and Gordon K. Smyth. 
Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23(21):2881–2887, Nov 2007. URL: http: //dx.doi.org/10.1093/bioinformatics/btm453, doi:10.1093/bioinformatics/btm453. 84 [290] Giulio Pavesi, Paolo Mereghetti, Giancarlo Mauri, and Graziano Pesole. Weeder web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res, 32(Web Server issue):W199–W203, Jul 2004. URL: http://dx.doi.org/10. 1093/nar/gkh465, doi:10.1093/nar/gkh465. 85, 86 [291] Giulio Pavesi and Graziano Pesole. Using weeder for the discovery of conserved transcription factor binding sites. Curr Protoc Bioinformatics, Chapter 2:Unit 2.11, Oct 2006. URL: http://dx.doi.org/10. 1002/0471250953.bi0211s15, doi:10.1002/0471250953. bi0211s15. 85 [292] Elodie Portales-Casamar, Supat Thongjuea, Andrew T. Kwon, David Arenillas, Xiaobei Zhao, Eivind Valen, Dimas Yusuf, Boris Lenhard, Wyeth W. Wasserman, and Albin Sandelin. Jaspar 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res, 38(Database issue):D105–D110, Jan 2010. URL: http://dx.doi.org/10.1093/nar/gkp950, doi:10.1093/ nar/gkp950. 85 393 VII. B IBLIOGRAPHY [293] Modan K. Das and Ho-Kwok Dai. A survey of dna motif finding algorithms. BMC Bioinformatics, 8 Suppl 7:S21, 2007. URL: http://dx.doi.org/10.1186/1471-2105-8-S7-S21, doi:10.1186/1471-2105-8-S7-S21. 86 [294] D. A. Tagle, B. F. Koop, M. Goodman, J. L. Slightom, D. L. Hess, and R. T. Jones. Embryonic epsilon and gamma globin genes of a prosimian primate (galago crassicaudatus). nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J Mol Biol, 203(2):439–455, Sep 1988. 86 [295] Mathieu Blanchette and Martin Tompa. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res, 12(5):739–748, May 2002. URL: http://dx.doi.org/10.1101/ gr.6902, doi:10.1101/gr.6902. 
86 [296] Martin Tompa, Nan Li, Timothy L. Bailey, George M. Church, Bart De Moor, Eleazar Eskin, Alexander V. Favorov, Martin C. Frith, Yutao Fu, W James Kent, Vsevolod J. Makeev, Andrei A. Mironov, William Stafford Noble, Giulio Pavesi, Graziano Pesole, Mireille ˜ RA c gnier, Nicolas Simonis, Saurabh Sinha, Gert Thijs, Jacques van Helden, Mathias Vandenbogaert, Zhiping Weng, Christopher Workman, Chun Ye, and Zhou Zhu. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol, 23(1):137–144, Jan 2005. URL: http://dx.doi.org/10.1038/nbt1053, doi: 10.1038/nbt1053. 86 [297] P. A. Pevzner and S. H. Sze. Combinatorial approaches to finding subtle signals in dna sequences. Proc Int Conf Intell Syst Mol Biol, 8:269–278, 2000. 86 [298] Timothy L. Bailey, Nadya Williams, Chris Misleh, and Wilfred W. Li. Meme: discovering and analyzing dna and protein sequence motifs. Nucleic Acids Res, 34(Web Server issue):W369–W373, Jul 2006. URL: http://dx.doi.org/10.1093/nar/gkl198, doi:10.1093/ nar/gkl198. 86 [299] L. Marsan and M. F. Sagot. Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J Comput Biol, 7(3-4):345–362, 2000. URL: http://dx.doi.org/10.1089/106652700750050826, doi: 10.1089/106652700750050826. 86 394 [300] E. Wingender, P. Dietze, H. Karas, and R. Knuppel. Transfac: a database on transcription factors and their dna binding sites. Nucleic Acids Res, 24(1):238–241, Jan 1996. 87 [301] Albin Sandelin, Wynand Alkema, Par Engstrom, Wyeth W. Wasserman, and Boris Lenhard. Jaspar: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res, 32(Database issue):D91–D94, Jan 2004. URL: http://dx.doi.org/10.1093/ nar/gkh012, doi:10.1093/nar/gkh012. 87 [302] Dustin E. Schones, Andrew D. Smith, and Michael Q. Zhang. Statistical significance of cis-regulatory modules. BMC Bioinformatics, 8:19, 2007. 
URL: http://dx.doi.org/10.1186/1471-2105-8-19, doi:10.1186/1471-2105-8-19. 87 [303] Da Wei Huang, Brad T. Sherman, and Richard A. Lempicki. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res, 37(1):1–13, Jan 2009. URL: http://dx.doi.org/10.1093/nar/gkn923, doi:10.1093/nar/gkn923. 88, 89 [304] Purvesh Khatri and Sorin Draghici. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21(18):3587–3595, Sep 2005. URL: http:// dx.doi.org/10.1093/bioinformatics/bti565, doi:10. 1093/bioinformatics/bti565. 88 [305] Cory Y. McLean, Dave Bristor, Michael Hiller, Shoa L. Clarke, Bruce T. Schaar, Craig B. Lowe, Aaron M. Wenger, and Gill Bejerano. Great improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 28(5):495–501, May 2010. URL: http://dx.doi.org/ 10.1038/nbt.1630, doi:10.1038/nbt.1630. 89 [306] Matthew D. Young, Matthew J. Wakefield, Gordon K. Smyth, and Alicia Oshlack. Gene ontology analysis for rna-seq: accounting for selection bias. Genome Biol, 11(2):R14, 2010. URL: http:// dx.doi.org/10.1186/gb-2010-11-2-r14, doi:10.1186/ gb-2010-11-2-r14. 89 [307] Christian Perez-Llamas and Nuria Lopez-Bigas. Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One, 6(5):e19541, 2011. URL: http://dx.doi.org/ 10.1371/journal.pone.0019541, doi:10.1371/journal. pone.0019541. 89 395 VII. B IBLIOGRAPHY [308] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene ontology: tool for the unification of biology. the gene ontology consortium. Nat Genet, 25(1):25–29, May 2000. URL: http://dx.doi.org/10.1038/ 75556, doi:10.1038/75556. 