Assembly

The Nile tilapia (Oreochromis niloticus) genome was produced by the Broad Institute. Illumina technology was used to produce this high-quality draft. The whole-genome shotgun data were assembled with ALLPATHS-LG. The assembly comprises 77578 contigs with an N50 of 29.5 kb and 13517 scaffolds with an N50 of 2.8 Mb.
The genome assembly represented here corresponds to GenBank Assembly ID GCA_000188235.1.
Download Tilapia genome sequence (FASTA)
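The N50 values quoted above summarise assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The following is a minimal sketch of that calculation, not the code used by the assembler. The function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bases)."""
    # Sort from longest to shortest so the largest sequences are counted first.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled length is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example with hypothetical contig lengths (not tilapia data).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Applied to the contig or scaffold lengths read from the FASTA file, the same function would be expected to give values comparable to those reported above, although minor differences can arise from how gaps and very short sequences are treated.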
Annotation
The gene set was built using a mixed approach. First, the Ensembl pipeline was used to generate 195657 models from orthologous vertebrate proteins from UniProtKB with a protein existence level of 1 or 2. Then, because species-specific sequences were lacking and RNA-Seq data were available for tilapia, we used 700 million paired-end reads sequenced by the Broad Institute. The RNA-Seq data cover 11 tissue types: blood, brain, embryo, eye, heart, kidney, liver, muscle, ovary, skin and testis. We pooled the tissues to avoid creating too many fragmented models. Using the RNA-Seq pipeline, we created 40899 models from the pooled set. By combining the orthologous set, the RNA-Seq set and the output of our ncRNA pipeline, we built the final gene set: 21437 protein-coding gene models, 22 pseudogenes, 3 retrotransposed genes and 821 non-coding RNAs.
RNA-Seq data set. In addition to the main set, we predicted gene models for each tissue type using the RNA-Seq pipeline. We ran BLASTp with these models against UniProt proteins with a protein existence level of 1 or 2 to confirm the open reading frame (ORF). The best BLAST hit is displayed as transcript supporting evidence.
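As an illustration of selecting a best hit per model, the sketch below parses BLAST tabular output (the standard 12-column format, with the bit score in the last column) and keeps the highest-scoring subject for each query. This is an assumption-laden example: the text does not state which criterion or output format was used, and the function name `best_hits` and all identifiers in the sample data are hypothetical.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query ID to its (subject ID, bit score) with the highest bit score.

    Assumes the standard 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Keep the hit only if it beats the best score seen so far for this query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical sample rows; the model and protein IDs are made up.
example = (
    "model1\tP12345\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "model1\tQ67890\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t250\n"
    "model2\tQ67890\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-70\t300\n"
)
for query, (subject, score) in best_hits(example).items():
    print(query, subject, score)
```

Ranking by bit score is one common choice; ranking by E-value or by alignment coverage would be equally plausible alternatives.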
The tissue-specific sets of transcript models built using our RNA-Seq pipeline are as follows:
Tissue | Number of gene models |
---|---|
blood | 18777 |
brain | 26171 |
embryo | 27891 |
eye | 26200 |
heart | 25109 |
kidney | 26097 |
liver | 20663 |
muscle | 22738 |
ovary | 28311 |
skin | 24850 |
testis | 33377 |