Archive Ensembl Home

Comparative Genomics

Data available

  • Gene trees are constructed using the canonical protein (usually the longest one) for every gene in Ensembl. Proteins are clustered using hcluster_sg based on WU-BLAST scores, and each cluster of proteins is aligned using M-Coffee. TreeBeST then produces a gene tree from each multiple alignment and reconciles it with the species tree to call duplication events. Homologues are deduced from these trees. Gene gain and loss events are also determined using the CAFE software. More information→ Tree statistics→
  • ncRNA trees are constructed using gene families represented in RFAM, for which a specific covariance model is provided. For each gene family, several trees are built using secondary structure alignments from INFERNAL and genomic alignments from PRANK. These trees are merged into a final tree using TreeBeST, and orthology and paralogy are inferred from this final tree. Gene gain and loss events are also determined using the CAFE software. More information→ Tree statistics→
  • Families are constructed by MCL clustering of all Ensembl proteins, i.e. not only the longest protein of each gene. Metazoan proteins from UniProtKB (Swiss-Prot and SPTrEMBL) are added to extend the protein set. More information→
  • Whole genome alignments are computed either pairwise between two species, using BlastZ-net or translated Blat analysis, or as multiple alignments across several species. More information→
  • Ancestral sequences are calculated from multi-species whole genome alignments. More information→
  • Conservation scores and constrained elements are calculated from the whole genome multiple alignments. More information→
  • Syntenies are calculated from the pairwise alignments. More information→
  • Stable IDs are provided for Families and Gene Trees and relate exclusively to the content of the Family or the Gene Tree. More information→
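
The reconciliation step described for gene trees and ncRNA trees can be illustrated with a minimal sketch. It assumes the textbook last-common-ancestor (LCA) mapping, in which a gene-tree node is called a duplication when it maps to the same species-tree node as one of its children, and a speciation otherwise. This is not TreeBeST's actual implementation, which is more elaborate. The `Node` class, the function names and the toy genes (`A_human`, `B_mouse`, ...) are all hypothetical and exist only for this example.

```python
from itertools import combinations


class Node:
    """A node of a rooted tree. Gene-tree leaves carry a species name."""

    def __init__(self, name, children=(), species=None):
        self.name = name
        self.children = list(children)
        self.species = species   # set on gene-tree leaves only
        self.parent = None
        self.event = None        # filled in by reconcile()
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a, b):
    """Last common ancestor of two nodes of the same tree."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes do not share a root")


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def reconcile(gene_node, species_leaf):
    """Map gene_node onto the species tree and label internal nodes."""
    if not gene_node.children:
        return species_leaf[gene_node.species]
    child_maps = [reconcile(c, species_leaf) for c in gene_node.children]
    mapped = child_maps[0]
    for m in child_maps[1:]:
        mapped = lca(mapped, m)
    # Duplication: a child maps to the same species node as its parent.
    is_dup = any(m is mapped for m in child_maps)
    gene_node.event = "duplication" if is_dup else "speciation"
    return mapped


def classify_pairs(gene_root):
    """Orthologues diverge at a speciation node, paralogues at a duplication."""
    return {
        (a.name, b.name):
            "orthologue" if lca(a, b).event == "speciation" else "paralogue"
        for a, b in combinations(leaves(gene_root), 2)
    }


def example():
    # Toy species tree: ((human, mouse), chicken)
    human, mouse, chicken = Node("human"), Node("mouse"), Node("chicken")
    Node("root", [Node("human_mouse", [human, mouse]), chicken])
    species_leaf = {n.name: n for n in (human, mouse, chicken)}

    # Toy gene tree: (((A_human, A_mouse), (B_human, B_mouse)), C_chicken)
    a = Node("A", [Node("A_human", species="human"),
                   Node("A_mouse", species="mouse")])
    b = Node("B", [Node("B_human", species="human"),
                   Node("B_mouse", species="mouse")])
    gene_root = Node("gene_root",
                     [Node("AB", [a, b]), Node("C_chicken", species="chicken")])
    reconcile(gene_root, species_leaf)
    return gene_root, classify_pairs(gene_root)
```

In this toy case the node joining the A and B subtrees maps to the human/mouse ancestor, as do both of its children, so it is labelled a duplication. As a result `A_human` and `B_mouse` come out as paralogues, while `A_human` and `A_mouse` come out as orthologues.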

Access

Data can be accessed using the Compara Perl API, BioMart, or the comparative genomics pages in the browser. Gene trees can be viewed from any 'Gene' page in the browser, and exported via the control panel and the Jalview plug-in in the pop-ups that appear when clicking on any part of the tree.

The external Java-based tool PhyloWidget can also be used to visualise phylogenetic trees of compara species. An example which includes all the current species for the main Ensembl website, plus a few additional mammalian species of interest, has been created by the Compara team: