Implementation of a novel analytical framework for large-scale genetic data. Extending the genetic architecture of type 2 diabetes beyond common variants

Author

Bonàs Guarch, Sílvia

Supervisor

Torrents Arenales, David

Mercader Bigas, Josep Maria

Tutor

Orozco, Modesto

Defense date

2017-01-13

Pages

235 p.



Department/Institute

Universitat de Barcelona. Facultat de Biologia

Abstract

The major landmark in modern genomic and biological research has been the first survey of the entire human genome. In June 2000, the joint appearance of Bill Clinton with Craig Venter and Francis Collins extolled how genome science would impact our lives by revolutionizing the diagnosis, prevention and treatment of a vast number of human diseases (Collins 2010). Since then, genome science has made breathtaking progress through the unique conjunction of new technologies, such as Next Generation Sequencing (NGS) and genotyping arrays (Collins 2010; Hofker et al. 2014), and extensive data-sharing initiatives that catalyse new discoveries (Kaye et al. 2009; Collins 2010; Hood and Rowen 2013). To underscore the magnitude of this achievement, the first sequence from the Human Genome Project (HGP) took 13 years and several collaborative efforts from a network of international public research institutions, with a budget of $3 billion (U.S. Department of Energy & Human Genome Project program). Less than a decade later, NGS technologies have been implemented for clinical diagnosis, we have entered the $1,000 genome era, and the latest Illumina sequencer, the HiSeq X Ten, is capable of producing up to 16 human genomes (1.8 terabases of data) in three days (Hayden 2014). The success of NGS has led to an astonishing rate of growth of sequence data (Koboldt et al. 2013), which is doubling every seven months (Stephens et al. 2015). A downstream consequence has been the rapid accumulation of sequenced genomes of many vertebrates, invertebrates, fungi, plants and microorganisms, which makes it possible to study evolution and genome function through comparative genomics (Collins 2010). In addition, the build-up of sequence data from thousands of human subjects has helped to catalogue the genetic differences between individuals, also known as genetic variation (Hofker et al. 2014).
There are different types of genetic variation, but the most abundant are Single Nucleotide Polymorphisms (SNPs) (Stranger et al. 2011), which are substitutions of single nucleotides. While the HGP reported around 1.4 million SNPs (Lander et al. 2001), more than 84 million SNPs have been described in the phase 3 release of the 1000 Genomes Project (1000G-Phase3) (Sudmant et al. 2015; The 1000 Genomes Project Consortium et al. 2015). A final example of the large efforts invested in more accurate descriptions of genetic variation is the recent work published by the Exome Aggregation Consortium (ExAC). This study involved the aggregation and analysis of exome sequencing data from 60,706 individuals (Lek et al. 2016). The availability of this kind of data revealed widespread mutational recurrence in human genomes, allowed the detection of genes subject to strong selection depending on the class of mutation, and is expected to facilitate the clinical interpretation of disease-causing variants (Lek et al. 2016). Thus, the accumulation of individual genetic data has empowered researchers to unravel the specific genetic variants associated with disease liability. We have also moved from biologically guided candidate single-gene studies involving a few hundred individuals towards hypothesis-free genome-wide analyses, which perform extensive genomic interrogation of thousands of individuals (Relling and Evans 2015; Wang et al. 2015). Taken together, these advances have expanded our understanding of disease pathophysiology. Integrating the genetic understanding of health status with clinical examination is the idea underlying personalized medicine. This genomic paradigm shift in clinical medicine provides a new source of therapeutic and diagnostic breakthroughs (Hood and Rowen 2013).
As an example, targeted therapeutics have proved valuable for the treatment of lung cancer: sequence information revealed that tumours carrying specific mutations in the epidermal growth factor receptor (EGFR) were vulnerable to kinase inhibitors, resulting in higher response rates compared to traditional platinum-based chemotherapy (Levy et al. 2012; Swanton and Govindan 2016). Moreover, genetic tests are able to predict which breast cancer patients will benefit from chemotherapy (Innocenti et al. 2011; Gyorffy et al. 2015). Finally, notable successes have been achieved in pharmacogenomics, where warfarin dose can be adjusted on the basis of genetic polymorphisms located in the CYP2C9 and VKORC1 genes (Collins 2010; Hood and Rowen 2013; Relling and Evans 2015). In line with this, large efforts are under way to prioritize targeted therapeutics and to optimize drug selection and dosing, such as the Genomics England 100,000 Genomes Project and the US National Institutes of Health (NIH) Pharmacogenomics Research Network (Relling and Evans 2015; Wilson and Nicholls 2015). However, clear successes in clinical decision-making based on genomic knowledge remain anecdotal because of a poor understanding of human genetic diseases (Hofker et al. 2014; Relling and Evans 2015). For instance, Genome-Wide Association Studies (GWAS) are undoubtedly one of the most important methodological advances to emerge from the availability of complete human genome sequences and affordable DNA chips (Visscher et al. 2012; Hofker et al. 2014; Paul et al. 2014). GWAS have been extremely successful in identifying genetic variants associated with multiple diseases, but the translation of these results to the clinic is sparse (Manolio et al. 2009; Collins 2010; Hofker et al. 2014).
Some of the limitations lie in (1) the still small proportion of disease-causing genetic factors identified for most complex diseases and (2) a lack of functional characterization and interpretation of disease-associated variants, which hampers the identification of the underlying molecular mechanisms (Manolio et al. 2009; Hofker et al. 2014). The genomic revolution has brought new decisive players that will shape the future of biomedical research and clinical genetics. The ‘genomical’ challenge makes genomics one of the most demanding Big Data sciences in all four major computing domains (data acquisition, storage, distribution and computation). To keep pace with this rapid progress in genomic research, the build-up of whole-genome sequences and the emergence of large population biobanks (Stephens et al. 2015) call for a parallel development of computational frameworks. Moreover, genuine social concern about data privacy can discourage participation in genetic studies, which requires a major discussion about the ethical consequences of returning information to participants seeking a genetic diagnosis (Hood and Rowen 2013; Koboldt et al. 2013). From this brief overview, it is clear that the agenda of human genomics has many issues to address. In this thesis I translated some of them into the following general goal: setting up a cost-effective genetic research environment through the implementation of novel analytical and computational methods in order to better understand the genetics of Type 2 Diabetes (T2D). This work is a small glimpse of the frenzied activity in human genomics research, and it aims to contribute modestly, along with countless other research efforts, to the broad deployment of P4 medicine (Predictive, Preventive, Personalized, Participatory). In the next sections of this dissertation, I spell out this primary focus by presenting several concepts that I learned during these years and that enabled this research to achieve the goals of this thesis.

Keywords

Genomics; Type 2 diabetes; Non-insulin-dependent diabetes

Subjects

575 - General genetics. General cytogenetics. Immunogenetics. Evolution. Phylogeny

Knowledge area

Experimental Sciences and Mathematics

Documents

SBG_PhD_THESIS.pdf

64.08Mb

 

Rights

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/4.0/
