
Assembly

This site provides a data set based on the February 2009 Homo sapiens high-coverage assembly GRCh37 (GCA_000001405.6) from the Genome Reference Consortium (GRC). UCSC uses this assembly for its hg19 database. The data set consists of gene models built from GeneWise alignments of the human proteome and from alignments of human cDNAs using the cDNA2genome model of Exonerate.

This release of the assembly has the following properties:

  • 27478 contigs.
  • contig length total 3.2 Gb.
  • chromosome length total 3.1 Gb.
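Length totals like these can be recomputed from a downloaded FASTA file. The following is a minimal sketch, not the pipeline used to produce the figures above: the function name `fasta_lengths` and the toy records are hypothetical, and the sketch counts every character on a sequence line, including N gap characters.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first whitespace-separated token as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: add its length to the current record.
            lengths[name] += len(line)
    return lengths


# Toy records for illustration only, not real human sequence.
toy = [">chrA demo", "ACGTACGT", "ACGT", ">chrB demo", "NNNNAC"]
per_record = fasta_lengths(toy)
total = sum(per_record.values())
```

For a real file, the same function can be passed an open file handle, since it only iterates over lines.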

It also includes nine haplotypic regions, mainly in the MHC region of chromosome 6.

As the GRC maintains and improves the assembly, patches are being introduced. Patch release seven (GRCh37.p7) was included in Ensembl release 67. Currently, assembly patches are of two types:

  • Novel patch: new sequences that add alternative sequence at a locus and will remain as haplotypes in the next major assembly release by the GRC
  • Fix patch: sequences that correct the reference sequence and will replace the given region of the reference assembly at the next major assembly release by the GRC

To convert your old data from Human assembly NCBI36 to GRCh37, click on 'Manage your data' on any human page and select 'Assembly converter' from the left-hand menu.

A preliminary assembly of the Neanderthal (Homo sapiens neanderthalensis) genome is available via the Neanderthal Genome Browser, an Ensembl-powered project based at the Max Planck Institute.

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000001405.8.

Download Human genome sequence (FASTA)

Previous assemblies

Annotation

The Ensembl human gene annotations have been updated using Ensembl's automatic annotation pipeline. The updated annotation incorporates new protein and cDNA sequences which have become publicly available since the last GRCh37 genebuild (March 2009).

In release 67 (May 2012), we continue to display a joint gene set based on the merge between the automatic annotation from Ensembl and the manually curated annotation from Havana. This refined gene set corresponds to GENCODE release 12. The Consensus Coding Sequence (CCDS) identifiers have also been mapped to the annotations. More information is available from the CCDS project.

Updated manual annotation from Havana is merged into the Ensembl annotation every release. Transcripts from the two annotation sources are merged if they share the same internal exon-intron boundaries (i.e. have an identical splicing pattern), with slight differences in the terminal exons allowed. Importantly, all Havana transcripts are included in the final Ensembl/Havana merged (GENCODE) gene set. In this release, 23532 Ensembl gene models and 46246 Havana genes were merged to create the final set of 57891 genes.
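The merge criterion described above can be illustrated by comparing intron chains. This is a minimal sketch under stated assumptions, not the actual Ensembl/Havana merge code: the names `intron_chain` and `same_splicing_pattern` are hypothetical, exons are assumed to be (start, end) pairs on the same chromosome and strand, single-exon transcripts are simply not compared, and no limit is placed on how far the terminal exon ends may differ.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) in genomic coordinates


def intron_chain(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return internal boundaries as (end of one exon, start of the next) pairs."""
    ordered = sorted(exons)
    return [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]


def same_splicing_pattern(a: List[Exon], b: List[Exon]) -> bool:
    """True if both transcripts have introns and their internal boundaries match."""
    chain_a = intron_chain(a)
    chain_b = intron_chain(b)
    # Assumption: single-exon transcripts (empty chain) are not compared here.
    return bool(chain_a) and chain_a == chain_b


# Hypothetical transcripts for illustration only.
ensembl_tx = [(100, 200), (300, 400), (500, 650)]
havana_tx = [(90, 200), (300, 400), (500, 700)]  # only the outer ends differ
other_tx = [(100, 200), (310, 400), (500, 650)]  # an internal boundary differs

mergeable = same_splicing_pattern(ensembl_tx, havana_tx)
not_mergeable = same_splicing_pattern(ensembl_tx, other_tx)
```

Because only the boundaries between consecutive exons are compared, the outer ends of the first and last exons do not affect the result, which mirrors the tolerance for terminal-exon differences.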

Additional manual annotation of this genome can be found in Vega.