
The Ensembl Annotation Process

Protein-coding gene annotation

Protein-coding genes are automatically annotated using Ensembl's genebuild pipeline. All transcripts are based on mRNA and protein sequences in public scientific databases.

The human gene set is used as the GENCODE gene set. The human and mouse gene sets include all CCDS transcripts.

See the annotation article for more about the Ensembl genebuild pipeline, gene names and annotation.

Low-coverage genomes are annotated using a modified pipeline which attempts to locate genes across multiple scaffolds.

More genes

The Ensembl gene set also includes automatically annotated pseudogenes and non-coding RNAs. For human and mouse, we include annotation from IMGT for immunoglobulin (Ig) genes.

EST-based genes are predicted and displayed on the website but are not included in the final gene set.

Paired-end Illumina RNA-seq data have been used to generate transcript models for human and zebrafish.

Alternative splicing

Ensembl includes automatically annotated alternative splicing events for model organisms.
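To make the idea of an alternative splicing event concrete, the sketch below detects one common type, exon skipping (a cassette exon), by comparing two transcripts of the same gene. It is a generic illustration, not Ensembl's annotation method. It assumes each transcript is given as a list of (start, end) exon coordinates, 1-based and inclusive, sorted in ascending genomic order; the function name `skipped_exons` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based inclusive genomic coordinates
Exon = Tuple[int, int]


def skipped_exons(inclusion: List[Exon], skipping: List[Exon]) -> List[Exon]:
    """Return exons of `inclusion` that are skipped in `skipping`.

    Assumes both exon lists are sorted by ascending coordinate and that
    exons within a transcript do not overlap. An exon is reported as a
    cassette exon when `skipping` contains an intron running exactly from
    the end of the upstream exon to the start of the downstream exon.
    """
    # Introns of the skipping isoform: (first intronic base, last intronic base)
    introns = {(a[1] + 1, b[0] - 1) for a, b in zip(skipping, skipping[1:])}
    result = []
    # Walk consecutive exon triples (upstream, middle, downstream)
    for up, mid, down in zip(inclusion, inclusion[1:], inclusion[2:]):
        # The skipping isoform joins the upstream donor directly to the
        # downstream acceptor, so the middle exon is spliced out.
        if (up[1] + 1, down[0] - 1) in introns:
            result.append(mid)
    return result


if __name__ == "__main__":
    long_form = [(100, 200), (300, 350), (500, 600)]
    short_form = [(100, 200), (500, 600)]
    print(skipped_exons(long_form, short_form))  # [(300, 350)]
```

Matching exact intron boundaries keeps the check strict: an isoform that also shifts a splice site is not reported as a simple exon skip. Other event types (alternative 5' or 3' splice sites, retained introns, mutually exclusive exons) would need their own comparisons.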