The origins of apicomplexan sequence innovation

  James Wasmuth (1,6), Jennifer Daub (1,5), José Manuel Peregrín-Alvarez (1,2), Constance A.M. Finney (3) and John Parkinson (1,4,6)

  (1) Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario M5G 2L3, Canada;
  (2) Department of Molecular Biology and Biochemistry, University of Malaga, 29071 Malaga, Spain;
  (3) McLaughlin-Rotman Centre for Global Health, McLaughlin Centre for Molecular Medicine, University of Toronto, Toronto, Ontario M5G 2C4, Canada;
  (4) Departments of Biochemistry and Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada

    Abstract

    The Apicomplexa are a group of phylogenetically related parasitic protists that include Plasmodium, Cryptosporidium, and Toxoplasma. Together they impose a major global burden on human health and economies. To meet this challenge, several international consortia have generated vast amounts of sequence data for many of these parasites. Here, we exploit these data to perform a systematic analysis of protein family and domain incidence across the phylum. A total of 87,736 protein sequences were collected from 15 apicomplexan species. These were compared with three protein databases, including the partial genome database, PartiGeneDB, which increases the breadth of taxonomic coverage. From these searches we constructed taxonomic profiles that reveal the extent of apicomplexan sequence diversity. Sequences without a significant match outside the phylum were denoted as apicomplexan specialized. These were collated into 9134 discrete protein families and placed in the context of the apicomplexan phylogeny, identifying the putative origin of each family. Most apicomplexan families were associated with an individual genus or species. Interestingly, many genus-specific innovations were associated with specialized host cell invasion and/or parasite survival processes. In contrast, families reflecting more ancestral relationships were enriched in generalized housekeeping functions, such as translation and transcription, that have diverged within the apicomplexan lineage. Protein domain searches revealed 192 domains not previously reported in apicomplexans, together with a number of novel domain combinations. We highlight domains that may be important to parasite survival.
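
The classification step described in the abstract (build a taxonomic profile per sequence, then label sequences with no significant match outside the phylum as apicomplexan specialized) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the E-value cutoff, the `(query, taxon, evalue)` hit format, the taxon labels, and the function names are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical significance cutoff; the abstract does not state the threshold used.
EVALUE_CUTOFF = 1e-5


def taxonomic_profile(hits, cutoff=EVALUE_CUTOFF):
    """Map each query sequence to the set of taxa in which it has a significant hit.

    `hits` is an iterable of (query_id, taxon, evalue) tuples (assumed format).
    """
    profile = defaultdict(set)
    for query, taxon, evalue in hits:
        if evalue <= cutoff:
            profile[query].add(taxon)
    return profile


def apicomplexan_specialized(queries, hits, phylum="Apicomplexa", cutoff=EVALUE_CUTOFF):
    """Return the queries that have no significant match outside `phylum`."""
    profile = taxonomic_profile(hits, cutoff)
    # A query is specialized when its profile, minus the phylum itself, is empty.
    return [q for q in queries if not (profile.get(q, set()) - {phylum})]


# Toy data, invented for illustration only.
hits = [
    ("protA", "Apicomplexa", 1e-50),
    ("protA", "Metazoa", 1e-20),   # significant match outside the phylum
    ("protB", "Apicomplexa", 1e-30),
    ("protB", "Fungi", 0.5),       # not significant, so ignored
]
queries = ["protA", "protB", "protC"]  # protC has no hits at all

specialized = apicomplexan_specialized(queries, hits)
print(specialized)
```

In this sketch a sequence with no hits at all is also counted as specialized, which is one possible reading of "without a significant match outside the phylum".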

    Footnotes

    • 5 Present address: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    • 6 Corresponding authors.

      E-mail jwasmuth{at}sickkids.ca; fax (416) 813-5022.

      E-mail jparkin{at}sickkids.ca; fax (416) 813-5022.

    • [Supplemental material is available online at www.genome.org and at www.compsysbio.org/projects/apicomparison.]

    • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.083386.108.

      • Received July 16, 2008.
      • Accepted March 31, 2009.