Supplementary data of the paper 'Adaptive trends of sequence compositional complexity over pandemic time in the SARS-CoV-2 coronavirus'

  1. Oliver, José L. 1
  2. Galván, Pedro Bernaola 2
  3. Perfectti, Francisco 1
  4. Martín, Cristina Gómez 1
  5. Castiglione, Silvia 3
  6. Raia, Pasquale 3
  7. Verdú, Miguel 4
  8. Moya, Andrés 5
  1. 1 Department of Genetics, Faculty of Sciences, University of Granada, 18071, Granada, Spain
  2. 2 Department of Applied Physics II and Institute Carlos I for Theoretical and Computational Physics, University of Málaga, 29071, Málaga, Spain
  3. 3 Dipartimento di Scienze della Terra, dell'Ambiente e delle Risorse, Università di Napoli Federico II, 80126, Napoli, Italy
  4. 4 Centro de Investigaciones sobre Desertificación, Consejo Superior de Investigaciones Científicas (CSIC), University of València and Generalitat Valenciana, 46113, Valencia, Spain
  5. 5 Institute of Integrative Systems Biology (I2Sysbio), University of València and Consejo Superior de Investigaciones Científicas (CSIC), 46980, Valencia, Spain

Publisher: Zenodo

Year of publication: 2023

Type: Dataset

DOI: 10.5281/ZENODO.6650870

Abstract

Supplement of the paper "Adaptive trends of sequence compositional complexity over pandemic time in the SARS-CoV-2 coronavirus".

During the spread of the COVID-19 pandemic, the SARS-CoV-2 coronavirus underwent mutation and recombination events that altered its genome compositional structure, thus providing an unprecedented opportunity to check an evolutionary process in real time. The mutation rate is known to be lower than expected for neutral evolution, suggesting natural selection and convergent evolution. We begin by summarizing the compositional heterogeneity of each viral genome by computing its Sequence Compositional Complexity (SCC). To analyze the full range of SCC diversity, we select random samples of high-quality coronavirus genomes covering the full span of the pandemic. We then search for evolutionary trends that could inform us about the adaptive process of the virus to its human host by computing the phylogenetic ridge regression of SCC against time (i.e., the collection date of each viral isolate). In early samples, we find no statistical support for any trend in SCC values, although the viral genome appears to evolve faster than the Brownian Motion (BM) expectation. However, in samples taken after the emergence of high-fitness variants, and despite the brief time span elapsed, a driven decreasing trend for SCC and an increasing one for its absolute evolutionary rate are detected, pointing to a role for selection in the evolution of SCC in the coronavirus. We conclude that the higher fitness of variant genomes may have led to adaptive trends of SCC over pandemic time in the coronavirus.

Supplementary files

SupplementaryTables S1-S19.zip: Excel supplementary tables with the strain name, the collection date, and the SCC value for each analyzed genome.

nextstrain_ncov_open_global_timetree.nwk: ML phylodynamic tree for the Nextstrain sample.

SupplementaryTable S20.pdf: A complete list acknowledging the authors and the originating and submitting laboratories of the genetic sequences used for the analysis of the Nextstrain sample.

Nextstrain_sample_fasta_3059.zip: Nextstrain sample (sequences in FASTA format).

PhylogeneticTimetrees_NewickFormat.zip: Phylogenetic timetrees (Newick format).
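The abstract summarizes each genome by its SCC but this record does not state the formula. The sketch below is an illustration only. It assumes SCC is taken as the entropy of the whole sequence minus the length-weighted entropies of its compositional domains (a generalized Jensen-Shannon divergence). The function names `shannon_entropy` and `scc`, and the `boundaries` argument, are hypothetical. The sketch also takes domain boundaries as given and does not implement the segmentation step that would find them.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the nucleotide composition of seq."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def scc(seq, boundaries):
    """Assumed SCC: H(whole sequence) minus length-weighted H of each domain.

    boundaries: sorted cut positions splitting seq into domains
    (hypothetical input; a segmentation algorithm would supply these).
    """
    cuts = [0, *boundaries, len(seq)]
    domains = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    weighted = sum(len(d) / len(seq) * shannon_entropy(d) for d in domains)
    return shannon_entropy(seq) - weighted


if __name__ == "__main__":
    # Two compositionally distinct domains versus a homogeneous sequence.
    print(scc("AAAATTTT", [4]))
    print(scc("ATATATAT", [4]))
```

Under this assumption, a sequence whose domains differ strongly in composition scores high, while a compositionally homogeneous sequence scores near zero however it is cut.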