Regiones genómicas implicadas en la metilación diferencial del ADN

  1. Barturen Briñas, Guillermo
Supervised by:
  1. Michael Hackenberg Director
  2. José Lutgardo Oliver Jiménez Director

Defence university: Universidad de Granada

Defence date: 09 June 2014

Committee:
  1. Rafael Lozano Ruiz Chair
  2. Francisco Perfectti Álvarez Secretary
  3. Carmelo Ruiz Rejón Committee member
  4. Pedro A. Bernaola Galván Committee member
  5. Fuencisla Matesanz del Barrio Committee member
Department:
  1. GENÉTICA

Type: Thesis

Abstract

ABSTRACT (English)

Introduction

DNA methylation displays the most characteristic traits of an epigenetic mark. It consists of a covalent bond between a methyl group and the fifth carbon of a cytosine, which yields methylcytosine, called by some authors the "fifth base of the DNA" [1]. In mammals, methylation is almost entirely restricted to CpG dinucleotides, which appear methylated throughout the genome [2]. CpG islands (CGIs) are the exception to this global CpG methylation. CGIs are regions of open chromatin, high CpG density and no methylation, which allow interaction with the DNA [3, 4]. Almost 70% of human genes have CGIs, including all housekeeping genes and 40% of tissue-specific genes [5, 6].

DNA methylation can vary between tissues, individuals or cell types. The underlying mechanism is the regulation of protein interaction with the DNA. Broadly, DNA methylation participates in gene transcription and maintains genome stability. These functions can be regulated in different ways:

* Transcription regulation:
  o Regulates the interaction of the RNA polymerase complex with promoters [7].
  o Participates in splicing mechanisms [8].
  o Regulates the binding of transcription factors to enhancers [9] and insulators [10].
* Genome stability:
  o Silences repetitive elements such as retrotransposons [11].
  o Participates in the dosage compensation of sex chromosomes [12] and in the imprinting [13] of autosomal genes.

Given the many mechanisms in which DNA methylation participates, aberrant methylation patterns are expected to be associated with multiple diseases [14], including multiple types of cancer.

Objectives

The main goal of this Thesis is the identification and characterization of differentially methylated regions that could serve as epigenetic markers. To reach this objective, high-quality genome-wide methylation maps (methylomes) must be obtained from multiple tissues.
This is the only way to perform a comparative study and identify those differentially methylated regions. In addition, CpG density is known to play an important role in transcription factor (TF) binding and in determining the DNA methylation state [15, 16]. However, neither the available tools nor the studies carried out on multiple tissues take this important feature into account. Another goal of this Thesis has therefore been to improve the CpGcluster algorithm [17], resulting in a new one (WordCluster [18]) that is able to identify statistically significant CGIs with a high CpG density.

Theoretical developments and Results

The first step was to develop bioinformatics tools to obtain high-quality methylomes from raw bisulfite-treated next-generation sequencing datasets: NGSmethPipe [19] and MethylExtract [20]. The first tool preprocesses the reads and aligns them against the reference genome, while the second infers, from the mapped reads, the methylation levels of single cytosines and the Single Nucleotide Variants (SNVs). Both include strict quality controls in order to obtain comparable methylomes from different tissues, species or pathological samples.

In addition, it was necessary to develop a relational database (NGSmethDB [21, 22]) to store, manage and exploit all the methylomes produced. Currently, NGSmethDB contains methylomes for 6 species and 114 tissues and/or conditions; obtaining those methylomes required processing 40 terabytes of reads.

The newly developed algorithm, WordCluster, has important advantages over the classic algorithms used to identify CGIs. The most important are its ability to delimit methylation domains that are shorter but statistically significant and more homogeneous, and the fact that its predictions show a more specific association with regulatory elements as well as with conserved regions of the genome [23].
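As an illustration of the step in which methylation levels are inferred from mapped bisulfite reads, the following is a minimal sketch and not the MethylExtract implementation. It relies only on the general bisulfite principle that unmethylated cytosines are read as T while methylated cytosines remain C, so the level at a position is C / (C + T). The function name `methylation_levels`, the input layout and the `min_coverage` threshold are assumptions made for this example, and only forward-strand calls are considered.

```python
from collections import defaultdict


def methylation_levels(calls, min_coverage=5):
    """Estimate per-cytosine methylation levels from bisulfite read calls.

    calls: iterable of (chrom, pos, base), where base is the read base seen
           at a reference cytosine ('C' = methylated, 'T' = unmethylated).
    min_coverage: hypothetical quality threshold; positions covered by fewer
                  informative reads are dropped.
    Returns a dict mapping (chrom, pos) to a level between 0.0 and 1.0.
    """
    counts = defaultdict(lambda: [0, 0])  # [methylated, unmethylated]
    for chrom, pos, base in calls:
        if base == "C":
            counts[(chrom, pos)][0] += 1
        elif base == "T":
            counts[(chrom, pos)][1] += 1
        # Any other base is ignored here (sequencing error or a variant).

    levels = {}
    for key, (meth, unmeth) in counts.items():
        coverage = meth + unmeth
        if coverage >= min_coverage:
            levels[key] = meth / coverage
    return levels


# Toy input: position 100 has 4 C and 1 T reads; position 200 has only 2 reads.
example_calls = (
    [("chr1", 100, "C")] * 4
    + [("chr1", 100, "T")]
    + [("chr1", 200, "C")] * 2
)
example_levels = methylation_levels(example_calls)
```

With the toy input, position 100 passes the coverage filter and position 200 is discarded. A production tool would additionally handle the reverse strand, base qualities and SNVs that mimic bisulfite conversion.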
The statistical study of methylation differences between tissues in the CGIs predicted by WordCluster shows that the best method to identify DMIs (Differentially Methylated CpG Islands) combines Fisher's exact test and the negative binomial. Most DMIs fall into two classes: DMIs-M (methylated in almost all tissues and unmethylated in a few) and DMIs-U (methylated in a few tissues and unmethylated in the rest). The functional analysis based on GO (Gene Ontology) terms shows that DMIs-U tend to regulate development and cell differentiation processes, while DMIs-M are associated with tissue-specific functions.

Finally, the enrichment studies of DMIs within regulatory elements revealed important differences from other DMRs (Differentially Methylated Regions) recently identified irrespective of CpG density. The most important differences are that DMIs are highly enriched in promoters and exons but depleted in introns, and that a significantly smaller proportion of DMIs-M than of DMRs is associated with TFBSs. It is also notable that DMIs-U overlap with TFBSs at a higher ratio than DMIs-M. Taken together, these features of the analysed DMIs suggest that WordCluster could be the algorithm of choice for preselecting regions of interest in future differential methylation studies.

The Thesis is available at: http://bioinfo2.ugr.es/Publicaciones/tesis_GB.pdf
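The abstract names Fisher's exact test as one component of DMI detection and describes the DMI-M and DMI-U classes. The sketch below illustrates both ideas under stated assumptions; it is not the procedure used in the Thesis. The test is applied to a 2x2 table of methylated and unmethylated read counts in two tissues, the negative binomial component is omitted, and the function names and the `minority_fraction` cut-off are hypothetical. In practice a library routine such as `scipy.stats.fisher_exact` could replace the hand-written test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    For example: a, b = methylated / unmethylated reads in tissue 1,
                 c, d = methylated / unmethylated reads in tissue 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table no more likely than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def classify_dmi(states, minority_fraction=0.25):
    """Label an island from per-tissue states (True = methylated).

    minority_fraction is a hypothetical cut-off for "a few tissues".
    """
    total = len(states)
    methylated = sum(1 for state in states.values() if state)
    unmethylated = total - methylated
    if methylated == 0 or unmethylated == 0:
        return "not differential"
    if unmethylated / total <= minority_fraction:
        return "DMI-M"  # methylated in almost all tissues
    if methylated / total <= minority_fraction:
        return "DMI-U"  # unmethylated in almost all tissues
    return "other"


# Toy data: 18 of 20 reads methylated in tissue 1, 3 of 20 in tissue 2.
p_toy = fisher_exact_two_sided(18, 2, 3, 17)
label_toy = classify_dmi({"liver": True, "brain": True, "blood": True,
                         "lung": True, "sperm": False})
```

In the toy classification one tissue out of five is unmethylated, so the island is labelled DMI-M under the assumed cut-off. Testing many islands would also require a multiple-testing correction, which this sketch leaves out.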