Computational Performance and Statistical Accuracy of *BEAST and Comparisons with Other Methods

Ogilvie, Huw A.; Heled, Joseph; Xie, Dong; Drummond, Alexei J.

doi:10.1093/sysbio/syv118

arXiv:1506.06446 (q-bio)

[Submitted on 22 Jun 2015 (v1), last revised 6 Oct 2015 (this version, v3)]

Abstract: Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In contrast, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies, which found that its predicted distributions fit empirical data and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behaviour of *BEAST and enable quantitative prediction of the impact that increasing the number of loci has on both computational performance and statistical accuracy. Follow-up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on estimated parameters from two phylogenomic data sets, we compare the performance of a range of species tree and concatenation methods to show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species tree estimation, the number of loci required to obtain a given level of accuracy, and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.

Subjects:	Populations and Evolution (q-bio.PE)
Cite as:	arXiv:1506.06446 [q-bio.PE]
	(or arXiv:1506.06446v3 [q-bio.PE] for this version)
	https://doi.org/10.48550/arXiv.1506.06446
Journal reference:	Syst Biol (2016) 65 (3): 381-396
Related DOI:	https://doi.org/10.1093/sysbio/syv118

Submission history

From: Huw Ogilvie
[v1] Mon, 22 Jun 2015 02:27:28 UTC (1,319 KB)
[v2] Tue, 22 Sep 2015 10:06:48 UTC (818 KB)
[v3] Tue, 6 Oct 2015 02:34:27 UTC (819 KB)
