What is a genome assembly?

The genome assembly is simply the genome sequence produced after chromosomes have been fragmented, those fragments have been sequenced, and the resulting sequences have been put back together. For more information, see the glossary.
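
The idea of putting sequenced fragments back together can be illustrated with a toy sketch. The code below is not how any genome consortium builds an assembly: real assemblers use far more sophisticated methods and must cope with sequencing errors, repeats and both DNA strands. It assumes error-free reads from a single strand, and the function names (overlap, greedy_assemble) and the min_len parameter are hypothetical, chosen only for this illustration. It repeatedly merges the two fragments with the longest suffix-to-prefix overlap.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, considering only overlaps of at least `min_len` bases."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Toy greedy assembly: merge the best-overlapping pair until no pair
    overlaps by at least `min_len`. Returns the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to join: the pieces are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping fragments of the made-up sequence ATGGCGTACCTGA.
fragments = ["TACCTGA", "ATGGCGT", "GCGTACC"]
print(greedy_assemble(fragments))  # ['ATGGCGTACCTGA']
```

When fragments do not overlap, the sketch returns several separate pieces rather than one sequence, which is a simplified picture of why an assembly can contain gaps until more DNA has been sequenced.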

Each species in Ensembl has a reference genome assembly produced by an international genome consortium. (Ensembl does not produce genome assemblies.) Depending on the species, the reference assembly may be compiled from the DNA of one individual, a collection of individuals, a breed or a strain. The DNA source of each genome sequence is given on the species home page.

Most assemblies provided to Ensembl are haploid. Some assemblies, however, include additional alternate (non-reference) sequences, such as the haplotypic regions in human. These can be viewed in the Ensembl browser where available.

A genome assembly is updated when newly sequenced DNA allows gaps to be filled. It may also be updated when a new assembly algorithm is released. Assemblies are updated about once every two years, or less often, depending on the species. Ensembl performs a new genebuild when we decide to update to a new genome assembly or when large amounts of new experimental data become available (for example, RNA-seq, cDNA and protein sequences).

Older versions of genome assemblies can be found on the archive sites.

