Solving the Multiple Sequence Alignment Problem by Means of Evolutionary Computation (Resolución del problema de alineamiento múltiple de secuencias por medio de computación evolutiva)

  1. MATEUS SILVA, FERNANDO JOSÉ
Supervised by:
  1. Juan Manuel Sánchez Pérez, Supervisor

Defense university: Universidad de Extremadura

Defense date: 7 July 2011

Examination committee:
  1. Pedro Isasi Viñuela, Chair
  2. Silvio Priem Mendes, Secretary
  3. Miguel Ángel Vega Rodríguez, Member
  4. Julio Ortega Lopera, Member
  5. Juan Antonio Gómez Pulido, Member

Type: Thesis

Teseo: 310200

Abstract

Despite being a very common task in bioinformatics, multiple sequence alignment is not a trivial matter. Arranging a set of molecular sequences to reveal their similarities and differences is often hindered by the size and complexity of the search space, which defeat approaches that try to explore it exhaustively. Although exact methods for computing multiple sequence alignments exist, they are often impractical because of the computing resources required to align a considerable number of sequences. Alternative strategies that explore the search space more effectively, such as stochastic iterative methods, therefore offer a practical way to align a set of molecular sequences. Genetic Algorithms, which are by nature well suited to general combinatorial optimization problems in large and complex search spaces, emerge as serious candidates for tackling multiple sequence alignment. This work focuses on solving multiple sequence alignment by means of Evolutionary Computation. For that purpose, an Evolutionary Algorithm has been developed, consisting of a Genetic Algorithm that uses novel genetic operators with embedded local search optimization. These new operators represent an improvement in this area, and their combined use can help the algorithm reach quality solutions comparable to those computed by existing methods. The use of parallel techniques to speed up the search and improve the quality of the solutions found has also been an aim of this investigation, leading to the Parallel AlineaGA and Parallel Niche Pareto AlineaGA algorithms. These versions use the Message Passing Interface for communication and are ready to be deployed on a Microsoft Windows HPC Server 2008 computer cluster.
All algorithm versions have been tested on BAliBASE datasets, and their results have been compared with those of two of the most widely used and representative multiple sequence alignment tools: ClustalW and T-Coffee. This comparison shows that the parallel versions of AlineaGA can lead the search to better solutions on the majority of the test datasets and, as such, represent a valid contribution to this field. However, one must not forget that biological reasoning remains mandatory when evaluating the results provided by multiple sequence alignment programs, because even machines capable of simulating evolution do not do biology.