
Variant Effect Predictor


About

Ensembl provides the Variant Effect Predictor (VEP) to predict the functional consequences of known and unknown variants. There are three primary ways to use the VEP: the web version, the Perl script and the Perl API.

The web version is suitable for users with small volumes of data or those who prefer not to use command-line utilities. The script version is the most flexible way to use the VEP, and allows users to process large volumes of data using their own compute resources. The API is suitable for Perl programmers who want to incorporate features of the VEP into their own code.

The VEP was formerly known as the SNP Effect Predictor, and was published under this name. Please reference the following publication:

McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F.
Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.
Bioinformatics 26(16):2069-70 (2010)
doi:10.1093/bioinformatics/btq330

Web version

The web version of the VEP can be accessed via the Tools link at the top of each Ensembl web page, or via the "Manage your data" link on any species-specific page.

Upload form

When you reach the VEP web interface, you will be presented with a form to enter your data. Data can be uploaded in one of three ways:

  • File upload - click the "Choose file" button and locate the file on your system
  • Paste file - simply copy and paste the contents of your file into the large text box
  • File URL - point the VEP to a file hosted at a publicly accessible address. This can be either an 'http://' or an 'ftp://' address.

Data format: You can upload your data in several formats, including tab-delimited, VCF, HGVS and Pileup.
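The sketch below illustrates the tab-delimited format by converting a simple VCF-style record into a line containing chromosome, start, end, REF/ALT alleles, strand and an optional identifier. Several details are assumptions based on the VEP's default input format: this column layout, the use of "-" for an empty allele, and the rule that an insertion is written with start = end + 1. The function name `vcf_to_vep_default` is hypothetical. It handles only simple biallelic records whose REF and ALT share a single leading padding base.

```python
def vcf_to_vep_default(chrom: str, pos: int, ref: str, alt: str, ident: str = "") -> str:
    """Convert one VCF-style biallelic record into a tab-delimited line:
    chromosome, start, end, REF/ALT, strand, [identifier].

    Assumed conventions (check against the Data formats documentation):
    '-' marks an empty allele, and insertions have start = end + 1.
    """
    if len(ref) != len(alt) and ref[0] == alt[0]:
        # Simple indel: drop the shared VCF padding base and shift the start by one.
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    start = pos
    # With no reference bases (an insertion), end falls one base before start.
    end = pos + len(ref) - 1
    alleles = f"{ref or '-'}/{alt or '-'}"
    fields = [chrom, str(start), str(end), alleles, "+"]  # strand assumed forward
    if ident:
        fields.append(ident)
    return "\t".join(fields)
```

For example, a one-base substitution keeps its position. A deletion of "CG" after a padding "A" at position 100 becomes start 101, end 102, alleles CG/-.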

Options: The VEP will run fine with the default options; click the blue "Next" button at the bottom of the panel to continue when you are happy with the options. It is also possible to configure these options:

  • Transcript database - choose either Ensembl or "otherfeatures" (including RefSeq) as the reference transcript set
  • File format - ensure you select the correct file format for your uploaded data
  • Species - ensure you have selected the correct species for your data!
  • Get regulatory region consequences - the VEP can check for overlaps with known regulatory features, and can also check whether a variant falls in a high-information position of a transcription factor binding site.
  • Type of consequences to display - the VEP can output consequences as described by Ensembl, the Sequence Ontology (SO) or the NCBI.
  • Check for existing co-located variants - in species with an Ensembl Variation database, the VEP can check for existing variants co-located with your input; the identifiers of these variants appear in the output.
  • Return results in coding regions only - by selecting this checkbox the VEP will filter out any results that do not fall in a protein-coding region of a transcript, for example those in introns or upstream of a transcript.
  • Show HGNC identifier for genes where available - adds the HGNC identifier for the overlapping gene to the Extra column of the output (the default output shows only the Ensembl Gene ID, e.g. ENSG00000000345).
  • Show Ensembl protein identifiers where available - adds the Ensembl protein identifier for the transcript (e.g. ENSP00000411206) to the Extra column of the output.
  • Show HGVS identifiers for variants where available - adds HGVS nomenclature based on Ensembl (or RefSeq if the Otherfeatures transcript database is selected) stable identifiers to the output. Coding and/or protein sequence names can be added where appropriate.
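Several of these options append values to the Extra column. The sketch below turns that column into a dictionary. It assumes that entries are semicolon-separated KEY=value pairs and that "-" denotes an empty column. Key names such as HGNC and ENSP are also assumptions, so check them against your own output. `parse_extra` is a hypothetical helper.

```python
from typing import Dict


def parse_extra(extra: str) -> Dict[str, str]:
    """Turn an Extra value like 'HGNC=GENE1;ENSP=ENSP00000411206' into a dict.

    Assumes semicolon-separated KEY=value pairs and '-' for an empty column;
    entries without '=' are kept as flags with an empty-string value.
    """
    result: Dict[str, str] = {}
    extra = extra.strip()
    if extra in ("", "-"):
        return result
    for item in extra.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled ';'
        key, sep, value = item.partition("=")
        result[key] = value if sep else ""
    return result
```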

The following options are currently available for human only.

  • For non-synonymous SNPs, the VEP can provide additional predictions of the effect on the protein product using the following external tools (each tool can output the prediction term, the score or both).
    • SIFT predictions - SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.
    • PolyPhen predictions - PolyPhen is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
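When both the term and the score are requested, you may want to split a prediction into its parts. The sketch below assumes a particular output format, which you should confirm against your own results:

- a value with both parts is written as term(score), e.g. deleterious(0.02);
- a bare term or a bare score appears when only one of them is requested.

The helper name `parse_prediction` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout when both parts are present: term(score), e.g. "deleterious(0.02)".
_TERM_AND_SCORE = re.compile(r"([^()]+)\(([^()]*)\)")


def _to_float(text: str) -> Optional[float]:
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_prediction(value: str) -> Tuple[Optional[str], Optional[float]]:
    """Split a SIFT/PolyPhen value into (term, score); missing parts are None."""
    value = value.strip()
    if value in ("", "-"):
        return None, None  # no prediction available
    match = _TERM_AND_SCORE.fullmatch(value)
    if match:
        return match.group(1), _to_float(match.group(2))
    score = _to_float(value)
    if score is not None:
        return None, score  # score only
    return value, None  # term only
```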

Select output format

After clicking "Next", you are then asked to select either HTML or Text output. Both formats contain the same information:

  • Text format is useful if you wish to use the output as the input for any other tools.
  • HTML presents the same information as the Text format, but formatted for the web, with links to genes, transcripts, locations and co-located variants in the Ensembl browser. SIFT and PolyPhen predictions are coloured according to severity: red represents high severity, green low severity and blue unknown.
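Because the Text output is tab-delimited, it is straightforward to read into other tools. The sketch below assumes that metadata lines begin with "##" and that the column header line begins with a single "#" (e.g. #Uploaded_variation). `read_vep_text` is a hypothetical helper. Each row is returned as a dictionary keyed by column name, so the code does not depend on the exact column order.

```python
from typing import Dict, Iterable, Iterator, List


def read_vep_text(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per result line of VEP Text output, keyed by column name.

    Assumes '##' marks metadata lines and a single '#' marks the column header.
    """
    header: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata
        fields = line.split("\t")
        if line.startswith("#"):
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if not header:
            raise ValueError("result line found before the column header")
        yield dict(zip(header, fields))
```

An open file handle can be passed directly, since iterating over it yields lines.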

Viewing your results in the browser

Any data uploaded via the VEP web tool can be viewed in the Ensembl Location view. To view your data, either click a link in the Location column of the HTML output, or switch on the track in Location view (click "Configure this page", then "User attached data" in the Region in detail view to see uploaded tracks).

Issues with larger datasets

The web interface to the VEP has a hard limit of 750 variants per uploaded file. However, the tool may also fail with fewer variants than this, depending on the content of your data and the features you switch on. For example, a relatively small file (e.g. 100 variants) may fail to return results if every variant falls in a different gene and those genes are spread across many chromosomes. Conversely, a file containing 500 variants may return results quickly if those variants all fall in just a few genes.

To mitigate this issue, users should consider splitting their input by chromosome and uploading each chromosome's variants as a separate file. The problem can also be solved by using the VEP script. It is a command-line tool, but not as hard to use as you might think! It also offers many more features than the web interface and is generally a much more powerful tool.
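The splitting approach can be sketched in a few lines. The code groups input lines by chromosome, then cuts each group into chunks of at most 750 variants. It rests on the following assumptions:

- the first whitespace-separated field of each data line is the chromosome, which holds for VCF data lines and for the default tab-delimited format;
- lines starting with "#" are headers.

Headers are dropped, so a VCF header would need to be re-added to each output file. `split_for_upload` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

WEB_LIMIT = 750  # per-upload variant limit of the VEP web interface


def split_for_upload(lines: Iterable[str], limit: int = WEB_LIMIT) -> Dict[str, List[List[str]]]:
    """Group data lines by chromosome, then split each group into chunks of <= limit lines."""
    by_chrom: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        chrom = line.split(None, 1)[0]  # first whitespace-separated field
        by_chrom[chrom].append(line)
    return {
        chrom: [variants[i:i + limit] for i in range(0, len(variants), limit)]
        for chrom, variants in by_chrom.items()
    }
```

Each chunk can then be written to its own file and uploaded separately. Because a file well under 750 variants can still fail, lowering `limit` is a simple fallback if a chunk does not return results.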