The Nile tilapia (Oreochromis niloticus) genome was produced by the Broad Institute. Illumina technology was used to produce this high-quality draft. The whole-genome shotgun data were assembled with ALLPATHS-LG. The assembly is composed of 77578 contigs with an N50 of 29.5 kb and 13517 scaffolds with an N50 of 2.8 Mb.
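
For readers unfamiliar with the N50 statistic quoted for the contigs and scaffolds, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the tilapia assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop as soon as the running total reaches half the assembly.
        if running * 2 >= total:
            return length


# Toy example (made-up lengths, not real contigs): the total is 300,
# and 80 + 70 = 150 reaches half of it, so the N50 is 70.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Applied to the real contig or scaffold lengths from the FASTA file, the same calculation would be expected to give values close to the figures quoted above.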

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000188235.1.

Download Tilapia genome sequence (FASTA)


The gene set was built using a mixed approach. First, the Ensembl pipeline was used to generate 195657 models from orthologous vertebrate proteins in UniProtKB with a protein existence level of 1 or 2. Then, because few species-specific sequences were available but RNA-Seq data existed for tilapia, we used 700M paired-end reads sequenced by the Broad Institute. The RNA-Seq data cover 11 tissue types: blood, brain, embryo, eye, heart, kidney, liver, muscle, ovary, skin and testis. We pooled the tissues to avoid creating too many fragmented models. Using the RNA-Seq pipeline, we created 40899 models from the pooled set. By combining the orthologous set, the RNA-Seq set and the output of our ncRNA pipeline, we built the final gene set: 21437 protein-coding gene models, 22 pseudogenes, 3 retrotransposed genes and 821 non-coding RNAs.

RNA-Seq data set. In addition to the main set, we predicted gene models for each tissue type using the RNA-Seq pipeline. We ran BLASTp with these models against UniProt proteins with a protein existence level of 1 or 2 in order to confirm the open reading frame (ORF). The best BLAST hit is displayed as transcript supporting evidence.
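
As an illustration of what picking a best BLAST hit per model can look like, here is a minimal sketch. It assumes BLAST+ tabular output (`-outfmt 6`) with the default 12 columns and ranks hits by bit score; the text above does not say which output format or ranking criterion the pipeline actually uses, so both are assumptions. The function name `best_hits` and all identifiers in the sample are hypothetical.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query ID to its best hit as (subject, evalue, bitscore).

    Expects BLAST+ tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        current = best.get(query)
        # Assumption: the hit with the highest bit score is "best".
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical sample: two hits for model_1, one for model_2.
sample = "\n".join([
    "model_1\tPROT_A\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180.0",
    "model_1\tPROT_B\t95.0\t210\t10\t0\t1\t210\t1\t210\t1e-90\t350.0",
    "model_2\tPROT_C\t70.0\t150\t45\t0\t1\t150\t5\t154\t1e-20\t95.5",
])
for query, (subject, evalue, bitscore) in best_hits(sample).items():
    print(query, subject, evalue, bitscore)
```

Ranking by E-value instead of bit score would be an equally plausible choice; for hits against the same database the two orderings usually agree.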

The tissue-specific sets of transcript models built using our RNA-Seq pipeline are as follows:

Tissue    No. of gene models
blood     18777
brain     26171
embryo    27891
eye       26200
heart     25109
kidney    26097
liver     20663
muscle    22738
ovary     28311
skin      24850
testis    33377