# README_phyloP_bosTaurus_241MAMMALSv2_MTDF.txt
# Author: Michael Dong
# Date: 23-12-20

# Description:
This dataset contains the PhyloP scores calculated on a cattle-referenced (Bos taurus) 241 Mammals MAF-formatted alignment, with alignment duplicates filtered out by mafTools.
Autosomes, chrX and chrY were processed using different neutral models, each generated independently from human "ancestral" repeat coordinates.

File: Bostaurus_10Mb_241MAMMALS_MTDF.PhyloP.BEDscores.tar

# Alignment:
Alignment used: 241-mammalian-2020v2.nh
Reference: Bos_taurus (Btau_5.0.1, https://www.ncbi.nlm.nih.gov/assembly/GCA_000003205.6)
Chromosomes : from CM000177.6 (chr1) to CM000205.6 (chr29), CM000206.6 (chrX), CM001061.2 (chrY)
Species selected: All
Split: Split by 10Mb alignment segments
Number of species: 241
Duplicates: filtered out / mafTools mafDuplicateFilter (best hit vs consensus)

# Model files - PhyloFit
For each type (autosomes, X, Y), a random set of repeats was selected from the human RepeatMasker data.
The repeats were collected from the human RepeatMasker data available on UCSC, then only the regions that overlap the mouse (Mm39) and armadillo (DasNov2) Chains were selected.
The coordinates were then intersected with the coordinates from the old alignment to check whether they had at least 200 species aligned.
100000 repeat positions were then randomly selected on each of autosomes, chrX and chrY.
PhyloFit was then run on each of those sets independently.
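
The repeat selection described above can be illustrated with a short Python sketch. This is not the script that was used to build the models; it only shows one way to keep repeats with at least 200 aligned species and draw 100000 positions per class (autosomes, chrX, chrY). The input tuple layout (chrom, start, end, n_species_aligned), the function names and the seed are assumptions made for the example, and intervals are assumed to be half-open and non-overlapping.

```python
import bisect
import random
from itertools import accumulate

MIN_SPECIES = 200      # from the README: at least 200 species aligned
N_POSITIONS = 100_000  # from the README: positions sampled per class


def chrom_class(chrom):
    """Return 'chrX', 'chrY' or 'autosomes' for a human chromosome name."""
    return chrom if chrom in ("chrX", "chrY") else "autosomes"


def sample_positions(intervals, n, seed=0):
    """Draw up to n distinct single-base positions, uniformly over all bases.

    intervals: list of (chrom, start, end), half-open, assumed non-overlapping.
    """
    lengths = [end - start for _, start, end in intervals]
    cumulative = list(accumulate(lengths))  # running total of bases
    total = cumulative[-1] if cumulative else 0
    rng = random.Random(seed)
    # Sample offsets in the concatenated coordinate space without expanding it.
    picks = rng.sample(range(total), min(n, total))
    positions = []
    for offset in picks:
        i = bisect.bisect_right(cumulative, offset)  # interval holding offset
        chrom, start, _ = intervals[i]
        bases_before = cumulative[i] - lengths[i]
        positions.append((chrom, start + offset - bases_before))
    return sorted(positions)


def sample_per_class(repeats, n=N_POSITIONS, seed=0):
    """Filter repeats by species count, then sample each class separately.

    repeats: iterable of (chrom, start, end, n_species_aligned); this layout
    is hypothetical and chosen only for the example.
    """
    by_class = {"autosomes": [], "chrX": [], "chrY": []}
    for chrom, start, end, n_species in repeats:
        if n_species >= MIN_SPECIES:
            by_class[chrom_class(chrom)].append((chrom, start, end))
    return {name: sample_positions(ivs, n, seed)
            for name, ivs in by_class.items()}
```

Calling sample_per_class(repeats) returns a dictionary with one sorted list of (chrom, position) pairs per class; each list would then be the input set for an independent PhyloFit run.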
PhyloFit parameters: --subst-mod REV --EM
Tree used: 241-mammalian-2020v2.nh (http://cgl.gi.ucsc.edu/data/cactus/241-mammalian-2020v2.phast-242.nh)
3 models: cactus_200m_autosomes_V2_mdong.mod, cactus_200m_chrX_V2_mdong.mod, cactus_200m_chrY_V2_mdong.mod

# Conservation scoring - PhyloP:
The PhyloP scoring was run on the whole alignment, using the duplicate-filtered version split into 10Mb segments.
Model(s): cactus_200m_autosomes_V2_mdong.mod, cactus_200m_chrX_V2_mdong.mod, cactus_200m_chrY_V2_mdong.mod
PhyloP parameters: --method LRT --mode CONACC --wig-scores (UCSC)

# Output
Number of files : 290
Total size: 25 Gb for BED
Format : BED
File name format: .._scoresPhyloP_250.wig.bed
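
When working with the scores, it can be useful to know which neutral model applies to a given reference sequence. The Python sketch below maps the GenBank accessions listed in the Alignment section to chromosome names and to the matching model file. The function names are hypothetical, and the sketch assumes that the accessions between CM000177 and CM000205 are numbered consecutively for chr1 to chr29, as the range in this README suggests. The BED line parser is a further assumption: it expects tab-separated lines with chrom, start and end in the first three columns and the PhyloP score in the last column, which should be checked against the actual files before use.

```python
MODELS = {
    "autosomes": "cactus_200m_autosomes_V2_mdong.mod",
    "chrX": "cactus_200m_chrX_V2_mdong.mod",
    "chrY": "cactus_200m_chrY_V2_mdong.mod",
}


def accession_to_chrom(accession):
    """Map a Btau_5.0.1 accession (e.g. 'CM000177.6') to a chromosome name."""
    base = accession.split(".")[0]  # drop the version suffix
    if base == "CM000206":
        return "chrX"
    if base == "CM001061":
        return "chrY"
    if len(base) == 8 and base.startswith("CM") and base[2:].isdigit():
        number = int(base[2:])
        # Assumption: CM000177..CM000205 are chr1..chr29 in order.
        if 177 <= number <= 205:
            return f"chr{number - 176}"
    raise ValueError(f"accession not listed in this README: {accession}")


def model_for(accession):
    """Return the neutral model file name used for this sequence."""
    chrom = accession_to_chrom(accession)
    return MODELS.get(chrom, MODELS["autosomes"])


def parse_bed_line(line):
    """Parse one BED line into (chrom, start, end, score).

    Assumption: tab-separated, score stored in the last column.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2]), float(fields[-1])
```

For example, model_for("CM000206.6") returns the chrX model file name, while any autosomal accession returns the autosomes model.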