Analysis of Functional Annotations in Regulatory Elements

  1. García Moreno, Adrián
Supervised by:
  1. Pedro María Carmona Sáez (Supervisor)

Defending university: Universidad de Granada

Defence date: 6 October 2023

Examination committee:
  1. Florencio Pazos (President)
  2. M. Coral del Val Muñoz (Secretary)
  3. Martín Garrido Rodríguez Córdoba (Committee member)

Type: Doctoral thesis

Abstract

Progress in high-throughput techniques, characterised by greater measurement accuracy and lower cost, has substantially improved our understanding of biological systems at the molecular level. This development has driven omics-based biomedical research, especially in addressing the challenges posed by complex diseases. However, the high heterogeneity of complex diseases underscores the need for personalised medicine and for integrating the different layers that regulate biological systems. The general purpose of these studies is to identify biomarkers by inspecting the crosstalk between the molecules that govern the flow of genetic information. Omics analyses commonly yield large lists of candidate biomarkers, and interpreting them requires bioinformatics methodologies, particularly functional enrichment analysis. This analysis applies a statistical test to evaluate whether biological annotations are overrepresented in a list of biomarkers compared with a reference background. While it is a well-established methodology for genes and proteins, there is a notable lack of tools for exploring the functional implications of regulatory elements. The general objective of this thesis is to address this gap by providing the biomedical scientific community with a functional enrichment tool for analysing regulatory elements. After reviewing state-of-the-art enrichment methodologies for miRNAs, we found that miRNAs, CpG islands and transcription factors share a common approach: their functional implications are inferred from the annotations of their target genes. This is because the main functional annotation databases are dedicated to genes, and the annotations available for regulatory elements mainly describe their intrinsic role rather than their downstream functional effect on target genes.
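The overrepresentation test described above is traditionally based on the central hypergeometric distribution. A minimal sketch, assuming a background of N genes of which K carry a given annotation, an input list of n genes and an observed overlap of k; the function name is hypothetical (not a GeneCodis API) and no multiple-testing correction is applied.

```python
import math


def hypergeom_enrichment_p(k, N, K, n):
    """Upper-tail p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference background
    K: background genes annotated with the term
    n: genes in the input list
    k: list genes annotated with the term (observed overlap)
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid background or list size")
    total = math.comb(N, n)
    # Sum the probabilities of every overlap at least as large as k.
    # math.comb returns 0 when n - x exceeds N - K, so impossible terms vanish.
    tail = sum(
        math.comb(K, x) * math.comb(N - K, n - x)
        for x in range(k, min(K, n) + 1)
    )
    return tail / total


# Toy example: 20 background genes, 5 annotated with a term,
# and a list of 4 genes of which 3 carry the term.
p = hypergeom_enrichment_p(k=3, N=20, K=5, n=4)
print(f"{p:.4f}")  # prints 0.0320 (= 155/4845)
```

Each term is tested independently here; a real analysis would test every annotation in the database and then adjust the resulting p-values for multiple comparisons.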
When the genes associated with CpGs and miRNAs are analysed, the traditional enrichment method, which applies a test based on the central hypergeometric distribution to those genes, produces results biased towards specific, related functional terms, mainly associated with the cell cycle, regulatory processes and cancer. Current tools propose different solutions for the analysis of miRNAs and CpG islands. For miRNAs, avoiding the limitations of the traditional approach requires testing annotations assigned directly to miRNA sets, which can be obtained either by expert curation or by transforming gene-based annotations to the miRNA level. For CpGs, in contrast, a well-established unbiased alternative employs the Wallenius noncentral hypergeometric test; surprisingly, however, the miRNA literature has not considered it. Our objective here is to assess and implement a novel adaptation of the Wallenius method for the analysis of miRNAs. This novel method, together with the evaluation of other known methods for the unbiased functional enrichment analysis of regulatory elements, motivated the development of a new version of GeneCodis. To fulfil this objective, the application had to be completely re-engineered. As a result, GeneCodis 4 offers up-to-date methods for the functional enrichment analysis of lists of genes, proteins, CpGs, miRNAs and transcription factors. The update also improves the co-annotation discovery algorithm, expands the database of annotations and organisms, and adds new interactive visualisations. It is equally accessible to bioinformaticians and bench scientists, as it is implemented as a web tool with an application programming interface. Moreover, the enrichment analysis of transcription factor lists has received almost no attention in the literature.
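The idea behind the Wallenius correction can be sketched as a biased urn: each background gene is drawn with probability proportional to a weight, here assumed to be the number of regulatory elements (CpGs or miRNAs) associated with it, so heavily targeted genes appear in lists more often even when there is no signal. The sketch below is an illustration under stated assumptions, not the GeneCodis 4 implementation. It collapses the gene weights into a single odds ratio ω (mean weight of annotated genes divided by mean weight of the remaining genes), similar in spirit to, though not necessarily identical with, the approximation popularised by the GOseq Bioconductor package, and it computes the Wallenius distribution exactly by dynamic programming over the sequential draws. All function names and the toy weights are hypothetical.

```python
def wallenius_pmf(m1, m2, n, omega):
    """Return [P(X = x) for x = 0..n] for Wallenius' noncentral
    hypergeometric distribution.

    m1 annotated genes have weight omega, m2 other genes have weight 1, and
    n genes are drawn one at a time without replacement, each remaining gene
    being picked with probability proportional to its weight.
    """
    if omega <= 0 or n > m1 + m2:
        raise ValueError("need omega > 0 and n <= m1 + m2")
    dist = [1.0] + [0.0] * n  # dist[a] = P(a annotated genes drawn so far)
    for t in range(n):  # t genes have already been drawn
        nxt = [0.0] * (n + 1)
        for a in range(t + 1):
            if dist[a] == 0.0:
                continue  # unreachable state
            w_annot = omega * (m1 - a)  # total weight of annotated genes left
            w_other = m2 - (t - a)      # total weight of other genes left
            p_annot = w_annot / (w_annot + w_other)
            nxt[a + 1] += dist[a] * p_annot
            nxt[a] += dist[a] * (1.0 - p_annot)
        dist = nxt
    return dist


def wallenius_enrichment_p(k, m1, m2, n, omega):
    """Upper-tail p-value P(X >= k) under the Wallenius null."""
    return min(1.0, sum(wallenius_pmf(m1, m2, n, omega)[k:]))


def odds_ratio_from_weights(weights, annotated):
    """Hypothetical helper: omega = mean weight of annotated genes divided by
    the mean weight of non-annotated genes (a simplifying assumption)."""
    inside = [w for gene, w in weights.items() if gene in annotated]
    outside = [w for gene, w in weights.items() if gene not in annotated]
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


# Toy background: 5 term genes each linked to 6 miRNAs, 15 others linked to 2.
weights = {f"g{i}": (6 if i < 5 else 2) for i in range(20)}
term_genes = {f"g{i}" for i in range(5)}
omega = odds_ratio_from_weights(weights, term_genes)  # 6 / 2 = 3.0

# omega = 1 reduces Wallenius to the central hypergeometric test.
p_central = wallenius_enrichment_p(3, m1=5, m2=15, n=4, omega=1.0)
p_wallenius = wallenius_enrichment_p(3, m1=5, m2=15, n=4, omega=omega)
print(f"central: {p_central:.4f}  Wallenius: {p_wallenius:.4f}")
```

Because annotated genes are three times as likely to be drawn under this null, the same overlap of 3 is less surprising and the Wallenius p-value is larger than the central one. The dynamic programme costs on the order of n·min(n, m1) operations, which suits a sketch; very large lists would call for a dedicated numerical method.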
Regarding transcription factors, the authors of TFTenricher, the only tool that performs singular enrichment analysis of transcription factors, appear to have overlooked the bias in the enrichment analysis of regulatory elements. This gave us the opportunity to demonstrate that the varying number of transcription factors per regulated gene contributes to the recurrent enrichment of terms related to signalling pathways, transcriptional regulation, the cell cycle and cancer. Finally, we validated the power of the Wallenius approach for transcription factors through null simulations and the reanalysis of two real case studies.
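A null simulation of the kind mentioned above can be sketched as a toy model. The numbers are assumptions, not the thesis' simulation design: 20 of 200 background genes form a term and are each regulated by 10 hypothetical transcription factors, while every other gene is regulated by 1. Random gene lists are drawn with probability proportional to that count, so there is no biological signal. Because the draws follow the Wallenius sampling model by construction, the Wallenius test is expected to stay near or below the nominal 5% false-positive rate, whereas the central hypergeometric test is expected to flag the term far more often. Function names are hypothetical.

```python
import math
import random


def central_p(k, N, K, n):
    """P(X >= k) under the central hypergeometric distribution."""
    return sum(math.comb(K, x) * math.comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / math.comb(N, n)


def wallenius_pmf(m1, m2, n, omega):
    """P(X = x), x = 0..n, by exact dynamic programming over weighted draws."""
    dist = [1.0] + [0.0] * n
    for t in range(n):
        nxt = [0.0] * (n + 1)
        for a in range(t + 1):
            if dist[a] == 0.0:
                continue
            w_annot, w_other = omega * (m1 - a), m2 - (t - a)
            p_annot = w_annot / (w_annot + w_other)
            nxt[a + 1] += dist[a] * p_annot
            nxt[a] += dist[a] * (1.0 - p_annot)
        dist = nxt
    return dist


def weighted_sample(genes, weights, n, rng):
    """Draw n distinct genes; each pick is proportional to the gene's weight."""
    pool, w, chosen = list(genes), list(weights), []
    for _ in range(n):
        i = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(i))
        w.pop(i)
    return chosen


def simulate(n_sims=300, alpha=0.05, seed=1):
    N, K, n, hub_weight = 200, 20, 20, 10  # assumed toy parameters
    genes = list(range(N))
    # Genes 0..K-1 form the term and are regulated by hub_weight TFs each.
    weights = [hub_weight if g < K else 1 for g in genes]
    omega = hub_weight / 1  # mean weight inside / mean weight outside the term
    pmf = wallenius_pmf(K, N - K, n, omega)
    rng = random.Random(seed)
    hits_central = hits_wallenius = 0
    for _ in range(n_sims):
        k = sum(1 for g in weighted_sample(genes, weights, n, rng) if g < K)
        hits_central += central_p(k, N, K, n) < alpha
        hits_wallenius += min(1.0, sum(pmf[k:])) < alpha
    return hits_central / n_sims, hits_wallenius / n_sims


fpr_central, fpr_wallenius = simulate()
print(f"false-positive rate, central: {fpr_central:.2f}, "
      f"Wallenius: {fpr_wallenius:.2f}")
```

The simulation only illustrates the mechanism: when some genes are regulated by many more elements than others, an unweighted test mistakes that imbalance for enrichment.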