Description

This track shows measurements of evolutionary conservation from the Zoonomia Project's 241-way mammalian species Cactus alignment, referenced to the Bos taurus Btau_5.0.1 (GCA_000003205.6) assembly.

The base-wise conservation scores are computed using phyloP from the PHAST package, for all species. This version was prepared by Michael Dong (Uppsala U) with an improved neutral model incorporating better versions of ancestral repeats.

Display Conventions and Configuration

In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options.

Methods

The Zoonomia alignment was composed of two sets of mammalian genomes: newly assembled DISCOVAR assemblies and GenBank assemblies. The DISCOVAR genomes were masked with RepeatMasker (commit 2d947604), using Repbase version 20170127 as the repeat library and CrossMatch as the alignment engine. The pipeline used is available at repeatMaskerPipeline (commit a6ad966). The guide-tree topology was taken from the TimeTree database (using the release current in October 2018), and the branch lengths were estimated using the least-squares-fit mode of PHYLIP, version 3.695. The distance matrix used was largely based on distances from the 4d site trees from the UCSC browser. To add those species not present in the UCSC tree, approximate distances estimated by Mash (commit 541971b) to the closest UCSC species were added to the distance between the two closest UCSC species. We used the HAL package (commit 68db41d) to produce the HAL file.
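The way species missing from the UCSC tree were placed into the distance matrix is described only briefly above. The following Python sketch illustrates one possible reading of that step: the distance from a new species to any other species is approximated as its Mash distance to its closest UCSC species plus the existing distance from that closest species to the other one. The function name, the dictionary layout, and all numeric values are hypothetical and are not taken from the Zoonomia pipeline.

```python
def extend_distances(dist, new_species, closest, mash_distance):
    """Return a copy of `dist` with `new_species` added (illustrative only).

    dist: dict mapping frozenset({a, b}) -> distance between species a and b.
    closest: the species already in `dist` that is nearest to `new_species`.
    mash_distance: approximate distance between `new_species` and `closest`.
    """
    species = {name for pair in dist for name in pair}
    extended = dict(dist)
    # Distance to the closest known species is the Mash estimate itself.
    extended[frozenset((new_species, closest))] = mash_distance
    # Assumed rule: route every other distance through the closest species.
    for other in species:
        if other == closest:
            continue
        extended[frozenset((new_species, other))] = (
            mash_distance + dist[frozenset((closest, other))]
        )
    return extended


# Toy values for illustration; these are not real branch lengths.
toy = {
    frozenset(("cow", "sheep")): 0.10,
    frozenset(("cow", "human")): 0.40,
    frozenset(("sheep", "human")): 0.42,
}
extended = extend_distances(toy, "bison", closest="cow", mash_distance=0.02)
```

A matrix extended this way could then be passed to a least-squares branch-length fit on the fixed guide-tree topology, as the paragraph above describes.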

For more details on the alignment used, please follow the Cactus Alignment & Conservation of Zoonomia Placental Mammals (Hg38) track page.

Phylogenetic Tree Model

PhyloP is a phylogenetic method that relies on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. The all-species tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 241-way alignment (msa_view). The 4d sites were derived from the RefSeq (Reviewed+Coding) gene set, filtered to select single-coverage long transcripts.
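As a rough illustration of the phyloFit settings named above (REV model, EM algorithm, medium precision), the Python sketch below assembles a phyloFit command line. The file names are hypothetical placeholders, the input format SS (sufficient statistics) is an assumption, and the exact invocation used for this track is not documented here.

```python
import shlex


def phylofit_command(tree_file, sites_file, out_root):
    """Build a phyloFit argument list matching the settings in the text.

    All three arguments are hypothetical file names chosen by the caller.
    """
    return [
        "phyloFit",
        "--tree", tree_file,      # fixed guide-tree topology (Newick)
        "--subst-mod", "REV",     # general reversible substitution model
        "--EM",                   # fit parameters with the EM algorithm
        "--precision", "MED",     # medium convergence precision
        "--msa-format", "SS",     # assumed format of the 4d-site alignment
        "--out-root", out_root,   # output is written to <out_root>.mod
        sites_file,
    ]


cmd = phylofit_command("guide_tree.nh", "4d_sites.ss", "all_species")
print(shlex.join(cmd))
```

The list form can be handed directly to `subprocess.run(cmd, check=True)` on a machine where PHAST is installed.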

This same tree model was used in the phyloP calculations; however, the background frequencies were modified to maintain reversibility. The resulting tree model: all species.
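To illustrate what keeping a model reversible under new background frequencies can involve, the sketch below uses the standard factorization of a reversible rate matrix, Q[i][j] = S[i][j] * pi[j] with S symmetric. It keeps S, substitutes the new frequencies, and rescales to one expected substitution per unit of branch length. This is a generic construction under stated assumptions; whether it matches the exact procedure used for this track is not confirmed, and the function names and toy numbers are hypothetical.

```python
def swap_background(rate_matrix, old_freqs, new_freqs):
    """Replace equilibrium frequencies while preserving reversibility."""
    n = len(old_freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Recover the exchangeability S[i][j], then apply new pi[j].
                q[i][j] = rate_matrix[i][j] / old_freqs[j] * new_freqs[j]
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    # Rescale so the expected substitution rate at equilibrium is 1.
    rate = -sum(new_freqs[i] * q[i][i] for i in range(n))
    return [[value / rate for value in row] for row in q]


def toy_reversible_matrix(freqs, kappa=2.0):
    """Toy reversible matrix (order A, C, G, T) with transition bias kappa."""
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                s = kappa if (i, j) in transitions else 1.0
                q[i][j] = s * freqs[j]
        q[i][i] = -sum(q[i])
    return q


old_pi = [0.3, 0.2, 0.2, 0.3]
new_pi = [0.25, 0.25, 0.25, 0.25]
new_q = swap_background(toy_reversible_matrix(old_pi), old_pi, new_pi)
```

Reversibility of the result can be checked through detailed balance: `new_pi[i] * new_q[i][j]` should equal `new_pi[j] * new_q[j][i]` for every pair of states.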

PhyloP Conservation

The phyloP program supports several different methods for computing p-values of conservation or acceleration, for individual nucleotides or larger elements (http://compgen.cshl.edu/phast/). Here it was used to produce separate scores at each base (--wig-scores option), considering all branches of the phylogeny rather than a particular subtree or lineage (i.e., the --subtree option was not used). The scores were computed by performing a likelihood ratio test at each alignment column (--method LRT), and scores for both conservation and acceleration were produced (--mode CONACC).
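The Python sketch below shows how a signed per-base score can be derived from a likelihood ratio test of this kind. It assumes the test compares a neutral model against one with a free branch-length scale factor, that the test statistic is referred to a chi-square distribution with one degree of freedom, and that the score is the -log10 p-value, positive for conservation (scale below 1) and negative for acceleration (scale above 1). The function and its inputs are hypothetical and do not reproduce phyloP's implementation.

```python
import math
import sys


def conacc_score(lnl_null, lnl_alt, scale):
    """Signed -log10 p-value from a one-parameter likelihood ratio test.

    lnl_null: log-likelihood of the column under the neutral model.
    lnl_alt:  log-likelihood with the tree scaled by the fitted `scale`.
    scale:    fitted scale factor (< 1 slower than neutral, > 1 faster).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    # Guard against underflow to zero before taking the logarithm.
    p_value = max(p_value, sys.float_info.min)
    magnitude = -math.log10(p_value)
    return magnitude if scale < 1.0 else -magnitude


conserved = conacc_score(-10.0, -8.0, scale=0.5)
accelerated = conacc_score(-10.0, -8.0, scale=2.0)
```

With these toy inputs the two scores have the same magnitude and opposite signs, which is the behavior expected of a mode that reports both conservation and acceleration.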

References

Zoonomia:

Zoonomia Consortium. A comparative genomics multitool for scientific discovery and conservation. Nature. 2020 Nov;587(7833):240-245. PMID: 33177664; PMC: PMC7759459; DOI: 10.1038/s41586-020-2876-6

Cactus:

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649; DOI: 10.1038/s41586-020-2871-y

Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011 Sep;21(9):1512-28. PMID: 21665927; PMC: PMC3166836; DOI: 10.1101/gr.123356.111

Harris RS. Improved pairwise alignment of genomic DNA. Ph.D. Thesis. Pennsylvania State University, USA. 2007.

PhyloP:

Cooper GM, Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005 Jul;15(7):901-13. PMID: 15965027; PMC: PMC1172034; DOI: 10.1101/gr.3577405

Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010 Jan;20(1):110-21. PMID: 19858363; PMC: PMC2798823

Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In: Nielsen R, editor. Statistical Methods in Molecular Evolution. New York: Springer; 2005. pp. 325-351. DOI: 10.1007/0-387-27733-1_12

Siepel A, Pollard KS, Haussler D. New methods for detecting lineage-specific selection. In: Proceedings of the 10th International Conference on Research in Computational Molecular Biology (RECOMB 2006). pp. 190-205. DOI: 10.1007/11732990_17