Detection and classification of somatic structural variants, and its application in the study of neuronal development

Author

Planas Fèlix, Mercè

Director

Torrents Arenales, David

Tutor

Gelpí Buchaca, Josep Lluís

Date of defense

2020-10-05

Pages

277 p.



Department/Institute

Universitat de Barcelona. Facultat de Biologia

Abstract

The identification and analysis of genomic variation across individuals has been central in biology, first through comparative genomics to answer evolutionary questions, and then in biomedicine, where it is becoming central to the study of most diseases. Next-generation sequencing technologies allow the systematic analysis of thousands of different types of genetic variation, enhancing the identification of disease markers and the understanding of the molecular basis of disease. In recent years, hundreds of groups around the world have produced a burst of new methodology for disease-focused genome analysis. Specific computational methods and strategies are being designed and improved for the identification and interpretation of genomic variation. The identification and classification of different types of genomic variants in the context of biomedicine is a key and foundational step for the development of personalized medicine. This has been particularly central in the field of cancer genomics, whose research over the past ten to fifteen years has been based on the sequencing of genomic DNA and the identification and interpretation of (mostly) somatic and germline variation. Throughout these years, a large number of methods for variant detection have been developed with different ranges of action. Despite all these developments, the identification of genomic variants still has room for improvement, not only in sensitivity and specificity, but also at the computational level. Given the emergence of many initiatives for personalized medicine around the world, and the expected number of genomes that will have to be analyzed within health care systems, we require robust algorithms, designed together with a matching implementation that minimizes the computational cost of the analysis.
With this aim, during this thesis I have driven, designed and implemented an algorithm for the efficient processing of genomic data, in close collaboration with computer scientists at our center who defined the implementation, focusing on lowering the energy and time required for the analysis. This methodology, which relies on a reference-free approach to read classification, has been protected with a patent and is being used as the foundation for the development of SMuFin2, a more accurate and computationally efficient version of the initial SMuFin from 2014. Here we show that our method is able to process whole-genome sequences very quickly and with minimal energy consumption compared with existing methods, and that it has great potential for the identification of all ranges of variants, including insertions of non-human DNA. Further development of SMuFin2 is needed to fully assess its variant calling capabilities.
Despite its importance in the biology of the cell, the role of somatic variation that occurs in healthy tissues remains poorly defined. In the case of development, some hypotheses (e.g., replication stress) have been proposed to explain the somatic DNA damage observed during brain development, but the real impact and the underlying mechanisms of this somatic variation are not yet understood. To shed light on the types and potential functional impact of somatic variation in brain development, we established a new collaboration to identify and describe somatic DNA rearrangements induced by Pgbd5 during brain development and in the adult state, using 36 mouse neural tissue samples. The detection of somatic variants in healthy tissues presents more challenges than in the cancer scenario, where a variant is present in a significant number of cells and is easier to detect.
We have identified, classified and interpreted the landscape of somatic variation in neural development, and found notable differences between adult and embryonic variant load, as well as specific types of variants that may result from the activity of these transposase-like genes.

Keywords

Cancer; Human genetics; Bioinformatics; Genomics

Subjects

575 - General genetics. General cytogenetics

Knowledge Area

Experimental Sciences and Mathematics

Note

Thesis carried out at the Barcelona Supercomputing Center (BSC) / Doctoral Programme in Biomedicine

Documents

MPF_PhD_THESIS.pdf

52.73Mb

 

Rights

WARNING. All rights reserved. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for reference or private study, as well as in research and teaching activities or materials, under the terms established in Article 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when its contents are used, the full name of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service. Presentation of its content in a window or frame external to TDX (framing) is likewise not authorized. This reservation of rights applies both to the contents of the thesis and to its abstracts and indexes.
