Archive Ensembl Home

Custom data sets

If you want to filter or customise your download, please try BioMart, a web-based querying tool.

FTP Download

API Code

If you do not have access to CVS, you can obtain our latest API code as a gzipped tarball:

Download complete API for this release

Note: the API version needs to be the same as the databases you are accessing, so please use CVS to obtain a previous version if querying older databases.

Database dumps

Entire databases can be downloaded from our FTP site in a variety of formats. Please be aware that some of these files can run to many gigabytes of data.

Looking for MySQL dumps to install databases locally? See our web installation instructions for full details.

Each directory on ftp.ensembl.org contains a README file, explaining the directory structure.

Multi-species data

Database
Comparative genomics | MySQL | EMF | BED | XML | Ancestral Alleles
BioMart | MySQL | - | - | - | -

Single species data

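Species | DNA (FASTA) | cDNA (FASTA) | ncRNA (FASTA) | Protein sequence (FASTA) | Annotated sequence (EMBL) | Annotated sequence (GenBank) | Gene sets | Whole databases | Variation (EMF) | Variation (GVF) | Variation (VEP) | Regulation (GFF) | Data files | BAM
Ailuropoda melanoleuca (Panda) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Anolis carolinensis (Anole lizard) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Bos taurus (Cow) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | VEP | - | - | -
Caenorhabditis elegans (Caenorhabditis elegans) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Callithrix jacchus (Marmoset) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Canis familiaris (Dog) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Cavia porcellus (Guinea Pig) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Choloepus hoffmanni (Sloth) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Ciona intestinalis (C.intestinalis) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Ciona savignyi (C.savignyi) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Danio rerio (Zebrafish) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | VEP | - | - | -
Dasypus novemcinctus (Armadillo) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Dipodomys ordii (Kangaroo rat) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Drosophila melanogaster (Fruitfly) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Echinops telfairi (Lesser hedgehog tenrec) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Equus caballus (Horse) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Erinaceus europaeus (Hedgehog) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Felis catus (Cat) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Gadus morhua (Cod) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Gallus gallus (Chicken) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Gasterosteus aculeatus (Stickleback) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Gorilla gorilla (Gorilla) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Homo sapiens (Human) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | EMF | GVF | VEP | Regulation (GFF) | Regulation data files | -
Latimeria chalumnae (Coelacanth) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Loxodonta africana (Elephant) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Macaca mulatta (Macaque) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Macropus eugenii (Wallaby) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Meleagris gallopavo (Turkey) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Microcebus murinus (Mouse Lemur) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Monodelphis domestica (Opossum) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Mus musculus (Mouse) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | EMF | GVF | VEP | Regulation (GFF) | Regulation data files | -
Myotis lucifugus (Microbat) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Nomascus leucogenys (Gibbon) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Ochotona princeps (Pika) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Oreochromis niloticus (Tilapia) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | BAM
Ornithorhynchus anatinus (Platypus) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Oryctolagus cuniculus (Rabbit) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Oryzias latipes (Medaka) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Otolemur garnettii (Bushbaby) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Pan troglodytes (Chimpanzee) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | BAM
Petromyzon marinus (Lamprey) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Pongo abelii (Orangutan) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Procavia capensis (Hyrax) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Pteropus vampyrus (Megabat) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Rattus norvegicus (Rat) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | EMF | GVF | VEP | - | - | -
Saccharomyces cerevisiae (Saccharomyces cerevisiae) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Sarcophilus harrisii (Tasmanian devil) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Sorex araneus (Shrew) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Spermophilus tridecemlineatus (Squirrel) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Sus scrofa (Pig) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | BAM
Taeniopygia guttata (Zebra Finch) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Takifugu rubripes (Fugu) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Tarsius syrichta (Tarsier) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Tetraodon nigroviridis (Tetraodon) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | GVF | - | - | - | -
Tupaia belangeri (Tree Shrew) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Tursiops truncatus (Dolphin) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Vicugna pacos (Alpaca) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -
Xenopus tropicalis (Xenopus) | FASTA | FASTA | FASTA | FASTA | EMBL | GenBank | GTF | MySQL | - | - | - | - | - | -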

To facilitate storage and download, all databases are compressed with GNU Zip (gzip, *.gz).
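
Because the dumps are gzip-compressed, they can be read as a stream without first unpacking them to disk. The following is a minimal Python sketch using the standard `gzip` module. The function name `count_fasta_records` is hypothetical and chosen only for illustration; it assumes the file is a gzip-compressed FASTA dump in which each record starts with a `>` header line.

```python
import gzip


def count_fasta_records(path):
    """Count FASTA records in a gzip-compressed file by counting '>' header lines."""
    count = 0
    # Mode "rt" opens the compressed stream as text and decompresses on the fly.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count
```

Reading line by line keeps memory use low, which matters for files that run to many gigabytes.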

About the data

The following types of data dumps are available on the FTP site.

FASTA
FASTA sequence databases of Ensembl gene, transcript and protein model predictions. Since the FASTA format does not permit sequence annotation, these database files are mainly intended for use with local sequence similarity search algorithms. Each directory has a README file with a detailed description of the header line format and the file naming conventions.
DNA
Masked and unmasked genome sequences associated with the assembly (contigs, chromosomes etc.).
The header line of a FASTA dump file containing DNA sequence consists of the following attributes: coord_system:version:name:start:end:strand. This coordinate-system string is used in the Ensembl API to retrieve slices with the SliceAdaptor.
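
As an illustration of the colon-separated location string described above, the sketch below splits it into typed fields in Python. The names `SliceLocation` and `parse_location` are hypothetical and not part of the Ensembl API. The sketch assumes all six fields are present and that start, end and strand are integers; check the README in each directory for the full header line format.

```python
from typing import NamedTuple


class SliceLocation(NamedTuple):
    coord_system: str
    version: str
    name: str
    start: int
    end: int
    strand: int


def parse_location(location):
    """Split 'coord_system:version:name:start:end:strand' into typed fields."""
    parts = location.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}")
    coord_system, version, name, start, end, strand = parts
    # start, end and strand are assumed to be integers (an assumption of this sketch).
    return SliceLocation(coord_system, version, name, int(start), int(end), int(strand))
```

For example, `parse_location("chromosome:GRCh37:1:1:1000:1")` yields a record whose `name` is `"1"` and whose `end` is `1000`; the values in this string are made up for illustration.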
cDNA
cDNA sequences for Ensembl or ab initio predicted genes.
Peptides
Protein sequences for Ensembl or ab initio predicted genes.
RNA
Non-coding RNA gene predictions.
Annotated sequence
Flat files allow more extensive sequence annotation by means of feature tables, and thus contain the genome sequence as annotated by the automated Ensembl genome annotation pipeline. Each nucleotide sequence record in a flat file represents a 1Mb slice of the genome sequence. Flat files are broken into chunks of 1000 sequence records for easier downloading.
EMBL
Ensembl database dumps in EMBL nucleotide sequence database format
GenBank
Ensembl database dumps in GenBank nucleotide sequence database format
MySQL
All Ensembl MySQL databases are available in text format, as are the SQL table definition files. These can be imported into any SQL database for a local installation of a mirror site. Generally, the FTP directory tree contains one directory per database. For more information about these databases and their Application Programming Interfaces (or APIs) see the API section.
GTF
Gene sets for each species. These files include annotations of both coding and non-coding genes. This file format is described here.
EMF flatfile dumps (variation and comparative data)

Alignments of resequencing data are available for several species as Ensembl Multi Format (EMF) flatfile dumps. The accompanying README file describes the file format.

The same format is also used to dump whole-genome multiple alignments, as well as gene-based multiple alignments and phylogenetic trees used to infer Ensembl orthologues and paralogues. These files are available in the ensembl_compara database, which can be found in the mysql directory.

GVF (variation data)
GVF (Genome Variation Format) is a simple tab-delimited format derived from GFF3 for variation positions across the genome. There are GVF files for different types of variation data (e.g. somatic variants, structural variants etc.). For more information see the "README" files in the GVF directory.
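
Since GVF derives from GFF3, a data line can be read as the nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes). The Python sketch below shows one way to do that. The function name `parse_gvf_line` is hypothetical, and the sketch assumes attributes are `key=value` pairs separated by semicolons, as in GFF3; it does not interpret GVF pragmas beyond skipping lines that start with `#`.

```python
def parse_gvf_line(line):
    """Parse one GVF data line into a dict; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    seqid, source, feature_type, start, end, score, strand, phase, raw_attrs = columns
    # Attributes are assumed to follow the GFF3 'key=value;key=value' convention.
    attributes = {}
    for pair in raw_attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return {
        "seqid": seqid,
        "source": source,
        "type": feature_type,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }
```

Score, strand and phase are kept as strings because GFF3 allows `.` as a placeholder in those columns.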
BED format files (comparative data)

Constrained elements calculated using GERP are available in BED format. For more information see the accompanying README file.

BED format is a simple line-based format. The first 3 mandatory columns are:

  • chromosome name (may start with 'chr' for compliance with UCSC)
  • start position. This is a 0-based position
  • end position.

More information on the BED file format...
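
The 0-based start is easy to get wrong when combining BED data with 1-based coordinates such as those in GFF-style files. The Python sketch below converts the three mandatory columns to a 1-based, inclusive interval. The function name `bed_to_one_based` is hypothetical, and the sketch assumes tab-separated columns and the UCSC convention that the BED end position is exclusive.

```python
def bed_to_one_based(line):
    """Convert the first three BED columns to a 1-based, inclusive (name, start, end) tuple."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    # BED start is 0-based, so add 1. Under the assumed UCSC convention the BED
    # end is exclusive, so it already equals the 1-based inclusive end.
    return chrom, int(start) + 1, int(end)
```

Under these assumptions, a BED line covering the first 100 bases of a chromosome has start 0 and end 100, and converts to the 1-based interval 1 to 100.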

Tarball

The entire Ensembl API is bundled into a single TAR file and gzipped. This is updated daily.
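
A gzipped tarball can be inspected before unpacking. The Python sketch below lists the members of such an archive with the standard `tarfile` module. The function name `list_tarball` is hypothetical, and the sketch makes no assumption about the actual file name or layout of the Ensembl tarball.

```python
import tarfile


def list_tarball(path):
    """Return the member names of a gzip-compressed TAR archive."""
    # Mode "r:gz" reads the TAR archive through gzip decompression.
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()
```

Listing the members first makes it easy to check the directory layout before extracting anything.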