
RNA-seq Gene Models

Human RNA-seq Gene Models

RNA-seq data from Illumina's Human BodyMap 2.0 project have been used to generate gene models for human. The data, generated on HiSeq 2000 instruments in 2010, cover 16 human tissue types: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells. For each tissue, we aligned the raw reads to the genome and then linked exons into tissue-specific transcript models using the reads that span an exon-exon boundary.

You can view these data in the Region in Detail view. Click on `Configure this page' and choose `RNA-Seq' at the left of the main panel. Enable any or all of the 32 tracks and then close the configuration panel. Of the 32 available tracks, 16 are tissue `gene model' tracks and 16 are `intron' tracks.

The `gene model' track shows a transcript model. The `intron' track shows how many raw reads aligned across an exon-exon junction. The higher the intron block, the more highly expressed the transcript isoform is. When read coverage is high, the transcript's exon-intron structure produced for the gene track has a good chance of being correct. When read coverage is very low, it is not always possible to build a full-length transcript model.
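As a rough illustration of what an `intron' track summarises, the following Python sketch counts how many reads span each exon-exon junction. The read representation (a list of aligned blocks per read) and the function name `count_intron_support` are assumptions made for this example only; they are not part of the Ensembl pipeline.

```python
from collections import Counter

def count_intron_support(reads):
    """Count reads spanning each intron.

    Each read is a list of (start, end) aligned blocks sorted by position.
    A gap between two consecutive blocks of the same read is treated as an
    intron. This representation is a simplification chosen for illustration.
    """
    support = Counter()
    for blocks in reads:
        # Pair each block with the next one and record the gap between them.
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            if right_start > left_end:
                support[(left_end, right_start)] += 1
    return support

# Hypothetical reads: three are spliced, one is unspliced.
reads = [
    [(100, 150), (300, 350)],
    [(120, 150), (300, 330)],
    [(310, 350), (500, 540)],
    [(200, 250)],
]
intron_support = count_intron_support(reads)
```

In this toy input, the junction between positions 150 and 300 is supported by two reads and would therefore be drawn taller than the junction between 350 and 500, which has a single supporting read.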

Zebrafish RNA-seq Gene Models

This is an experimental set of gene models produced using paired-end Illumina RNA-seq data from the Wellcome Trust Sanger Institute Zebrafish Transcriptome Sequencing Project (ref: ERP000016).

The models are produced by a two-step alignment process using Exonerate. First, reads are aligned locally to the genome, and the alignments are collapsed into blocks that roughly correspond to exons. Read pairing information is then used to group these exons into approximate transcript structures (proto-transcripts). Second, reads are realigned to the proto-transcripts using a splice model and a short word length, producing a set of spliced alignments that represent both canonical and non-canonical introns.
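The collapsing step in the first alignment pass can be pictured as merging overlapping read alignments into blocks. The sketch below is only a minimal illustration under that assumption; the function name `collapse_alignments` and the coordinate tuples are hypothetical, and the actual pipeline may collapse alignments differently.

```python
def collapse_alignments(alignments):
    """Merge overlapping (start, end) alignments into exon-like blocks.

    Alignments are sorted by start position; any alignment that starts
    inside (or exactly at the end of) the current block extends it,
    otherwise a new block is opened.
    """
    blocks = []
    for start, end in sorted(alignments):
        if blocks and start <= blocks[-1][1]:
            # Overlaps the current block: extend its end if needed.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return [tuple(block) for block in blocks]

# Hypothetical genomic alignments for a handful of reads.
alignments = [(100, 150), (140, 200), (400, 450), (120, 130)]
exon_blocks = collapse_alignments(alignments)
```

Here the three overlapping alignments collapse into one block and the distant alignment forms a second block, giving two exon-like regions that read pairing could then link into a proto-transcript.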

Gene models are created by combining the proto-transcripts with the spliced reads to create all possible variants; only the variant with the most read support is displayed.
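Selecting the displayed variant can be sketched as follows. How read support is actually scored is not described here, so this example assumes, purely for illustration, that a variant's support is the sum of the read counts of its introns; `best_variant` and the data below are hypothetical.

```python
def best_variant(variants, intron_support):
    """Return the variant with the highest total intron read support.

    Each variant is a tuple of introns, and each intron is a (start, end)
    pair. Scoring by the sum of intron read counts is an assumption made
    for this sketch, not a documented rule.
    """
    def score(variant):
        return sum(intron_support.get(intron, 0) for intron in variant)
    return max(variants, key=score)

# Hypothetical read counts per intron.
support = {(150, 300): 10, (350, 500): 7, (350, 520): 2}

# Two candidate variants that differ in the acceptor site of the second intron.
variants = [
    ((150, 300), (350, 500)),
    ((150, 300), (350, 520)),
]
displayed = best_variant(variants, support)
```

With these counts the first variant scores 17 and the second 12, so the first would be the one displayed.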

Intron Supporting Features represent the collapsed set of spliced reads used to inform the gene models. The features show the number of reads from each tissue that confirm a particular intron. Not all introns show expression in all tissues, and not all of the intron features were used in the gene models shown.

The intron track can be configured to use a variable height display where the height of the feature varies in accordance with the number of reads supporting the intron (up to a maximum of 50 reads). This display also highlights non-canonical splices in red.
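The capped, variable-height behaviour can be illustrated with a small Python sketch. Only the 50-read maximum comes from the description above; the linear scaling, the `max_pixels` parameter and the function name `intron_height` are assumptions for this example.

```python
MAX_READS = 50  # read count at which the feature reaches full height

def intron_height(read_count, max_pixels=40):
    """Scale a feature's height linearly with read support, capped at MAX_READS.

    max_pixels is a hypothetical full height for the track; the browser's
    real value and scaling function may differ.
    """
    capped = min(max(read_count, 0), MAX_READS)
    return max_pixels * capped / MAX_READS

heights = [intron_height(n) for n in (0, 25, 50, 200)]
```

Under these assumptions an intron supported by 25 reads is drawn at half height, while introns with 50 or 200 supporting reads are both drawn at full height.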