Ensembl Mouse is based on the April 2007 Mus musculus (strain C57BL/6J) high coverage assembly (NCBI m37, GCA_000001635.18). This assembly is used by UCSC to create their mm9 database.

The Mouse Genome Sequencing Consortium is a joint project between The Whitehead Institute/MIT Center for Genome Research, The Washington University Genome Sequencing Center, The Wellcome Trust Sanger Institute and EMBL-EBI to provide the Mouse genome sequence to the world. We work closely with other Mouse groups to provide an integrated resource (see below for credits).

There are some major changes in the assembly from version m36; for more details see the NCBI build statistics.

To convert your old data from Mouse assembly m36 to m37, click on 'Manage your data' on any mouse page and select 'Assembly converter' from the left-hand menu.

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000001635.18.

Download Mouse genome sequence (FASTA)

The Ensembl mouse automatic gene annotations were substantially improved in release 61 (1 February 2011) by using updated Ensembl genebuild pipeline code and incorporating new data resources that had become available since the last NCBIM37 genebuild (April 2007). The new resources include an updated mouse-specific repeat library, additional RefSeq and UniProt protein sequence data for annotating the coding regions of protein-coding genes, and new cDNAs and ESTs for annotating untranslated regions (UTRs) of protein-coding genes. Extensive data quality checks were performed to remove gene/transcript models with erroneous structures (e.g. interlocking transcripts with long introns on the same strand) or supported by dubious evidence (e.g. cDNA fragments with short, wrongly annotated open reading frames). In release 62 (13 April 2011), the Ensembl automatic gene annotations were patched to correct gene/transcript models previously truncated due to the presence of selenocysteine residues (encoded by the UGA codon) in their translations. Since release 62, the Ensembl automatic annotations have been frozen.

As in all previous releases since October 2007, in Ensembl release 64 we provide a combined Ensembl-Vega merged gene set. Specifically, the frozen Ensembl annotations from April 2011 were merged with the latest Vega manual annotations (as of 16 May 2011) at the transcript level. The Vega annotations were generated mainly by the HAVANA team at the Wellcome Trust Sanger Institute. Transcripts from the two annotation sources were merged if they shared the same internal exon-intron boundaries (i.e. had an identical splicing pattern), with slight differences in the terminal exons allowed. Importantly, all Vega source transcripts (regardless of merge status) were included in the final merged gene set. In addition, the gene set was updated in release 64 to keep it consistent with the 22147 CCDS gene models at the time of production (19 May 2011).
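The merge criterion described above can be illustrated with a minimal sketch. This is not the Ensembl merge pipeline code: the exon representation (a list of `(start, end)` tuples), the function names and the example coordinates are all hypothetical, and the sketch puts no limit on how far the terminal exons may differ, whereas the text says only "slight" differences were allowed. Single-exon transcripts, which have no internal boundaries, are simply treated as not comparable here; that too is an assumption.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is (start, end) on one strand of one chromosome.
Exon = Tuple[int, int]


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the internal exon-intron boundaries of a transcript.

    Each pair is (end of one exon, start of the next exon). The outer
    start of the first exon and outer end of the last exon are ignored.
    """
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)
    )


def same_splicing_pattern(a: List[Exon], b: List[Exon]) -> bool:
    """True if two multi-exon transcripts share all internal boundaries."""
    chain_a = intron_chain(a)
    chain_b = intron_chain(b)
    # Assumption: single-exon transcripts (empty chain) are not merged here.
    return bool(chain_a) and chain_a == chain_b


# Made-up coordinates for illustration only.
ensembl_tx = [(100, 200), (300, 400), (500, 650)]
vega_tx = [(90, 200), (300, 400), (500, 700)]    # differs only at the outer ends
other_tx = [(100, 200), (320, 400), (500, 650)]  # different internal boundary

print(same_splicing_pattern(ensembl_tx, vega_tx))   # True
print(same_splicing_pattern(ensembl_tx, other_tx))  # False
```

Comparing only the intron chain is what lets two transcripts with different terminal exon ends count as the same splicing pattern, while any change to an internal boundary keeps them separate.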

As a result, the release 64 gene set consists of 37681 genes and 95883 transcripts. Of the 95883 transcripts, 18.00% (17257) were the result of merging Ensembl and Vega annotations, 21.65% (20756) originated from Ensembl, 54.18% (51953) originated from Vega, and the remaining 6.17% (5917) were incorporated from other sources (e.g. immunoglobulin gene segments/transcripts imported from IMGT data). Most of the non-merged Vega transcripts were alternative splice variants and/or non-coding transcripts complementing the Ensembl annotations, which focus on providing a conservative set of protein-coding genes/transcripts.
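The breakdown above can be checked with a few lines of arithmetic. The sketch below recomputes each share from the transcript counts quoted in the text and derives the "other sources" count as the remainder, assuming the four categories are exhaustive; the dictionary keys are illustrative labels only.

```python
total = 95883  # transcripts in the release 64 gene set

# Counts quoted in the text; keys are illustrative labels.
counts = {"merged": 17257, "ensembl": 20756, "vega": 51953}

# Assumption: everything not in the three named categories is "other".
counts["other"] = total - sum(counts.values())

# Percentage share of each category, rounded to two decimal places.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(counts["other"])  # 5917
print(shares)
```

The remainder comes to 5917 transcripts, about 6.17% of the total.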

Additional manual annotation of this genome can be found in Vega.


Additional functional genomics data produced by the HEROIC project (High-throughput Epigenetic Regulatory Organisation In Chromatin) is available to download from the Ensembl Projects HEROIC portal.