TY - JOUR
KW - Base Composition
KW - Chi-Square Distribution
KW - Codon
KW - Computer Simulation
KW - Evolution, Molecular
KW - Genes, Protozoan
KW - Models, Genetic
KW - Models, Statistical
KW - Mutation
KW - Plasmodium
KW - Selection, Genetic
KW - Sequence Alignment
AU - Yap VB
AU - Lindsay H
AU - Easteal S
AU - Huttley G
AB -

Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.

BT - Molecular biology and evolution
C1 - http://www.ncbi.nlm.nih.gov/pubmed/19815689?dopt=Abstract
DA - 2010 Mar
DO - 10.1093/molbev/msp232
IS - 3
J2 - Mol. Biol. Evol.
LA - eng
PY - 2010
SP - 726
EP - 34
T2 - Molecular biology and evolution
TI - Estimates of the effect of natural selection on protein-coding content.
VL - 27
SN - 1537-1719
ER -