Archive Ensembl Home

Frequently asked questions

General questions

Q: Why don't I see any co-located variations when using species X?

A: Ensembl only has variation databases for a subset of all Ensembl species - see this document for details.


Q: Why has my insertion/deletion variant encoded in VCF disappeared from the VEP output?

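A: Ensembl represents unbalanced variants (insertions and deletions) differently from VCF. VCF includes the reference base immediately before the event in both the REF and ALT alleles, whereas Ensembl drops this padding base and writes an empty allele as "-", so the coordinates and alleles reported in the output no longer match your input line. You can solve this by giving your variants a unique identifier in the third (ID) column of the VCF file. See here for a full discussion.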
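
The sketch below illustrates the difference for a single ALT allele. It is a simplified Python illustration, not the VEP's own conversion code. It assumes the padding base is the shared first base of REF and ALT, and it writes an insertion with end = start - 1, following Ensembl's convention for an insertion between two bases. Because the start coordinate shifts by one base, matching output lines to input lines by position is unreliable, which is why a unique ID in the third column helps.

```python
def vcf_to_ensembl(pos, ref, alt):
    """Convert one VCF REF/ALT pair to Ensembl-style (start, end, allele_string).

    Simplified sketch: assumes a single ALT allele. For an unbalanced variant
    whose REF and ALT share a first base, that VCF padding base is dropped and
    the start moves one base to the right.
    """
    start = pos
    if len(ref) != len(alt) and ref[:1] == alt[:1]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    ref, alt = ref or "-", alt or "-"  # Ensembl writes an empty allele as '-'
    # An insertion has no reference bases, so it is written with end = start - 1.
    end = start - 1 if ref == "-" else start + len(ref) - 1
    return start, end, f"{ref}/{alt}"


print(vcf_to_ensembl(100, "A", "AT"))  # insertion: (101, 100, '-/T')
print(vcf_to_ensembl(100, "AT", "A"))  # deletion:  (101, 101, 'T/-')
```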


Q: Why do I see so many lines of output for each variant in my input?

A: While it can be convenient to look for an easy, one-word answer to the question "What is the consequence of this variant?", in reality biology is not this simple! Many genes have more than one transcript, so the VEP provides a prediction for each transcript that a variant overlaps. The VEP script can help here: the --canonical and --ccds options indicate which transcripts are canonical and which belong to the CCDS set, respectively, while --per_gene, --summary and --most_severe give a more summarised assessment per variant.
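
If you post-process the output yourself, collapsing the per-transcript lines to one consequence per variant looks roughly like the following. The severity ranking is a hypothetical, abbreviated list of Sequence Ontology terms chosen for illustration. It is not the VEP's full ordering. The row layout (variant ID, transcript ID, comma-separated consequences) is likewise an assumption about how you have parsed the output.

```python
# Hypothetical, abbreviated ranking of Sequence Ontology terms, most severe
# first. The VEP's real ordering contains many more terms.
SEVERITY = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "intron_variant",
    "upstream_gene_variant",
    "intergenic_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY)}


def most_severe(rows):
    """Reduce per-transcript rows to one consequence per variant.

    rows: iterable of (variant_id, transcript_id, consequences) tuples, where
    consequences is a comma-separated string. Terms missing from SEVERITY are
    ranked below every listed term.
    """
    best = {}  # variant_id -> (rank, term)
    for variant_id, _transcript_id, consequences in rows:
        for term in consequences.split(","):
            rank = RANK.get(term, len(SEVERITY))
            if variant_id not in best or rank < best[variant_id][0]:
                best[variant_id] = (rank, term)
    return {variant_id: term for variant_id, (_rank, term) in best.items()}


rows = [
    ("rs1", "T1", "intron_variant"),
    ("rs1", "T2", "splice_region_variant,missense_variant"),
    ("rs2", "T3", "upstream_gene_variant"),
]
print(most_severe(rows))  # {'rs1': 'missense_variant', 'rs2': 'upstream_gene_variant'}
```

Note that this reduction discards information. The per-transcript lines remain the more complete answer.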

Furthermore, several "compound" consequences are also possible: if, for example, a variant falls in the final few bases of an exon, it may be considered to affect a splice site in addition to possibly affecting the coding sequence.
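
The following simplified sketch shows how such a compound call arises. It assumes a forward-strand transcript and a hypothetical three-base window at the 3' end of the exon. The VEP's actual splice-region definition may differ, so check its documentation before relying on these numbers.

```python
# Hypothetical window: number of bases at the 3' end of an exon treated as
# part of the splice region. Check the VEP documentation for the real rule.
SPLICE_REGION_EXON_BASES = 3


def exonic_consequences(pos, exon_start, exon_end, coding_term):
    """Return the set of consequence terms for a variant inside a coding exon.

    Forward-strand-only sketch: a variant in the last few bases of the exon
    gets a splice-region term in addition to its coding consequence.
    """
    if not exon_start <= pos <= exon_end:
        raise ValueError("position is outside the exon")
    terms = {coding_term}
    if exon_end - pos < SPLICE_REGION_EXON_BASES:
        terms.add("splice_region_variant")
    return terms


print(sorted(exonic_consequences(999, 900, 1000, "missense_variant")))
# ['missense_variant', 'splice_region_variant']
```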

Since we cannot predict exactly what will happen biologically, we provide the most conservative estimate that covers all reasonable scenarios. It is up to you, the user, to interpret this information!


Web VEP questions

Q: How do I access the web version of the Variant Effect Predictor?

A: You can find the web VEP on the Tools page, or from any species-specific page by clicking the blue "Manage your data" link under the left-hand menu, then "Variant Effect Predictor".


Q: I have selected a VCF file in the file upload field, but nothing happens when I click the blue "Next" button. Why?

A: Ensure that you have selected VCF as the input file format, and that your VCF file is formatted correctly.
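
You can run a quick check for common formatting mistakes, such as space-separated columns, before uploading. The function below tests only a few basics of the VCF specification: a #CHROM header line, at least eight tab-separated columns, and an integer POS. The web VEP may be stricter or more lenient than this check.

```python
def check_vcf_lines(lines):
    """Return (line_number, problem) tuples for obvious VCF formatting errors."""
    problems = []
    seen_header = False
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##") or not line.strip():
            continue  # meta-information lines and blank lines are skipped
        if line.startswith("#CHROM"):
            seen_header = True
            continue
        if not seen_header:
            problems.append((n, "data line before #CHROM header"))
        cols = line.split("\t")
        if len(cols) < 8:
            problems.append((n, f"expected at least 8 tab-separated columns, found {len(cols)}"))
            continue
        if not cols[1].isdigit():
            problems.append((n, "POS is not an integer"))
    return problems


bad = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1 100 rs1 A G . . .\n",  # spaces instead of tabs
]
print(check_vcf_lines(bad))
```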


Q: I uploaded a file with 1000 variants, but some of them seem to be missing from the output?

A: Due to a limitation of the servers underlying Ensembl, the web VEP can process only 750 variants from one file at a time. Consider splitting your file into smaller chunks, or using the standalone Perl script.
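
One way to split a file into chunks that stay within that limit is sketched below. Any '#' header lines are copied into every chunk so that each piece remains a valid VCF. The function names and output file names are hypothetical, and the 750-variant default simply mirrors the limit quoted above.

```python
from pathlib import Path


def chunk_variants(lines, chunk_size=750):
    """Split input lines into chunks of at most chunk_size variant lines.

    Lines starting with '#' are treated as header lines and copied to the
    start of every chunk; blank lines are dropped.
    """
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]
    return [header + records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


def write_chunks(path, chunk_size=750, prefix="chunk"):
    """Write chunks of the file at path to prefix_1, prefix_2, ... (same extension)."""
    src = Path(path)
    lines = src.read_text().splitlines(keepends=True)
    written = []
    for n, chunk in enumerate(chunk_variants(lines, chunk_size), start=1):
        out = Path(f"{prefix}_{n}{src.suffix}")
        out.write_text("".join(chunk))
        written.append(out)
    return written
```

For example, write_chunks("my_variants.vcf") on a file with 1000 variants would produce chunk_1.vcf with 750 variants and chunk_2.vcf with 250, each starting with the original header lines.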


Q: Why is the output I get for my input file different when I use the web VEP and the VEP script?

A: Ensure that you are passing the script options equivalent to those you selected in the web version. If you are sure the difference persists, please report it on the Ensembl developers mailing list.


VEP script questions

Q: Why do I see the following error?

Could not connect to database homo_sapiens_core_63_37 as user anonymous using [DBI:mysql:database=homo_sapiens_core_63_37;;port=5306] as a locator:
Unknown MySQL server host '' (2) at $HOME/src/ensembl/modules/Bio/EnsEMBL/DBSQL/ line 290.

-------------------- EXCEPTION --------------------
MSG: Could not connect to database homo_sapiens_core_63_37 as user anonymous using [DBI:mysql:database=homo_sapiens_core_63_37;;port=5306] as a locator:
Unknown MySQL server host '' (2)

A: By default the VEP script is configured to connect to the public Ensembl MySQL server. Occasionally the server may drop the connection to your script, which causes this error; this can happen when the server is busy or because of network problems. Consider using a local copy of the database, or the new caching system (the --cache option).


Q: Can I download all of the SIFT and/or PolyPhen predictions?

A: The Ensembl Variation database and the human VEP cache file contain precalculated SIFT and PolyPhen predictions for every possible amino acid change in every translated protein product in Ensembl. Because these data are very large, we store them in a compressed format. The best way to extract them is to use our Perl API.

The format in which the data are stored in our database is described here.

The simplest way to access these matrices is to use an API script to fetch a ProteinFunctionPredictionMatrix for your protein of interest. Then call its 'get_prediction' method to get the score for a particular position and amino acid, looping over all possible amino acids for your position. The API documentation describes this class in detail.
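
The loop described above is sketched here in Python for brevity. get_prediction is a hypothetical stand-in for the matrix object's 'get_prediction' method. It is assumed to take (position, amino_acid) and to return a (prediction, score) pair, or None where nothing is stored. With the Perl API you would call the method on the matrix object itself.

```python
from typing import Callable, Dict, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

Prediction = Optional[Tuple[str, float]]


def predictions_at(position: int,
                   get_prediction: Callable[[int, str], Prediction]) -> Dict[str, Prediction]:
    """Collect the stored prediction for every amino acid at one peptide position.

    get_prediction is a hypothetical stand-in for the matrix's
    'get_prediction' method, assumed to take (position, amino_acid).
    """
    return {aa: get_prediction(position, aa) for aa in AMINO_ACIDS}


def toy_matrix(position, aa):
    # Made-up values purely to exercise the loop; not real SIFT output.
    return ("deleterious", 0.01) if aa == "P" else ("tolerated", 0.4)


result = predictions_at(42, toy_matrix)
print(len(result), result["P"])  # 20 ('deleterious', 0.01)
```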

You would need to work out which peptide position your codon maps to, but there are methods in the TranscriptVariationAllele class that should help you (probably translation_start and translation_end).
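
If you already know the variant's position in the coding sequence, calculating the peptide position is simple. This sketch assumes a 1-based CDS position counted from the first base of the start codon. The function name is illustrative only.

```python
from typing import Tuple


def cds_to_peptide(cds_position: int) -> Tuple[int, int]:
    """Map a 1-based CDS position to (peptide position, base within codon).

    Assumes positions count from the first base of the start codon, so
    bases 1-3 are codon 1, bases 4-6 codon 2, and so on.
    """
    if cds_position < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


print(cds_to_peptide(4))    # (2, 1)
print(cds_to_peptide(300))  # (100, 3)
```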

