Exact and efficient algorithms for the probability of a marker under incomplete lineage sorting
Stockholm Bioinformatics Center Seminars
Friday 15 August 2008
to 12:00 at
David Bryant (McGill Centre for Bioinformatics)
Incomplete lineage sorting is known to complicate phylogenetic analysis of species radiations. Looking backwards in time, lineages sampled from the same species may not coalesce until before the species divergence, that is, in an ancestral population, leading to gene trees that conflict with the species tree. The standard models for the evolution of markers on a gene tree and for gene trees coalescing within species trees are computationally demanding, since one has to integrate over all possible gene trees at each unlinked locus.
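As a small illustration of how incomplete lineage sorting produces conflicting gene trees, the sketch below uses the standard three-species coalescent result rather than anything specific to this talk. It assumes a species tree ((A,B),C) with one sampled lineage per species and an internal branch of length t in coalescent units. The function name and the dictionary layout are hypothetical.

```python
import math


def gene_tree_probabilities(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units (an assumption of
    this sketch). With one lineage per species, the A and B lineages fail
    to coalesce in the internal branch with probability exp(-t); the three
    lineages then enter the root population, where each topology is
    equally likely.
    """
    p_no_coalescence = math.exp(-t)
    return {
        "((A,B),C)": 1.0 - (2.0 / 3.0) * p_no_coalescence,  # matches species tree
        "((A,C),B)": p_no_coalescence / 3.0,                # discordant
        "((B,C),A)": p_no_coalescence / 3.0,                # discordant
    }


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        print(t, gene_tree_probabilities(t))
```

Short internal branches, as in a rapid species radiation, push all three topologies towards probability 1/3, which is why gene tree conflict is common in that setting.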
We have developed algorithms that avoid this integration over gene trees by using a variant of Felsenstein's pruning algorithm for the likelihood of a phylogeny. Given a species tree (with divergence dates and population sizes), we can compute the probability of a single binary marker exactly and efficiently. Both finite-sites and infinite-sites models of mutation are handled. Thus, if the data consist of a collection of unlinked binary markers (such as SNP data), we can compute the likelihood of the species tree directly, bypassing the need to consider the gene tree histories. These likelihoods can then be used for Bayesian or maximum-likelihood (ML) inference on the species tree and its parameters.
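For orientation, here is a minimal sketch of the classical Felsenstein pruning algorithm for one binary marker on a single fixed tree. It is not the coalescent variant described in the abstract, which works on the species tree and avoids fixing a gene tree. The symmetric two-state mutation model, the nested-tuple tree encoding, and all function names are assumptions made for this example.

```python
import math


def transition_matrix(mu, t):
    """Two-state symmetric model: each state switches at rate mu (assumed)."""
    same = 0.5 + 0.5 * math.exp(-2.0 * mu * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def partial_likelihoods(node, tip_states, mu):
    """Return [L0, L1]: probability of the tip data below `node`,
    conditional on `node` being in state 0 or state 1.

    A leaf is a string name; an internal node is a tuple of
    (child, branch_length) pairs (a hypothetical encoding).
    """
    if isinstance(node, str):
        return [1.0 if tip_states[node] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in node:
        P = transition_matrix(mu, branch_length)
        below = partial_likelihoods(child, tip_states, mu)
        for s in (0, 1):
            # Sum over the child's state, then multiply across children.
            result[s] *= sum(P[s][c] * below[c] for c in (0, 1))
    return result


def site_likelihood(tree, tip_states, mu, root_freqs=(0.5, 0.5)):
    """Probability of one binary marker on a fixed tree."""
    L = partial_likelihoods(tree, tip_states, mu)
    return root_freqs[0] * L[0] + root_freqs[1] * L[1]


if __name__ == "__main__":
    # Tree ((A:0.1, B:0.1):0.2, C:0.3) in the nested-tuple encoding.
    tree = (((("A", 0.1), ("B", 0.1)), 0.2), ("C", 0.3))
    print(site_likelihood(tree, {"A": 0, "B": 0, "C": 1}, mu=1.0))
```

Each node is visited once, so the cost is linear in the number of tips for a fixed number of states. With unlinked markers, the per-marker probabilities multiply to give the likelihood of the whole data set.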