
Ensembl Variation Tables Description

Introduction

This document gives a high-level description of the tables that make up the Ensembl variation schema. Tables are listed in alphabetical order, and the purpose of each table is explained. It is intended to help people familiarise themselves with the schema when encountering it for the first time, or when they need to use tables that they have not used before.

This document refers to version 67 of the Ensembl variation schema.

The schema diagram is available at the following link: [PDF]
A colour legend is available at the bottom of the page.



List of the tables:



allele

This table stores information about each of a variation's alleles, along with population frequencies.

allele_code

This table stores the relationship between the internal allele identifiers and the alleles themselves.

associate_study

This table contains identifiers of associated studies (e.g. NHGRI and EGA studies with the same pubmed identifier).

attrib

Defines various attributes used elsewhere in the database.

attrib_set

Groups related attributes together.

attrib_type

Defines the set of possible attribute types used in the attrib table.

compressed_genotype_region

This table holds genotypes compressed using the pack() method in Perl. These genotypes are mapped to particular genomic locations rather than variation objects. The data have been compressed to reduce table size and increase the speed of the web code when retrieving strain slices and LD data. Only data from resequenced individuals and from individuals used for LD calculations are included in this table.
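As an illustration of gap-based packing, here is a minimal Python sketch. The tables themselves are written with Perl's pack(); the record layout used here (a 16-bit gap from the previous position followed by a 16-bit genotype code) and the function names are assumptions made for the example, not the format stored in the database.

```python
import struct

# Hypothetical record layout: big-endian unsigned 16-bit gap from the
# previous position, then an unsigned 16-bit genotype code.
RECORD = struct.Struct(">HH")


def pack_genotypes(start, calls):
    """Pack (position, genotype_code) pairs, sorted by position, into bytes.

    Gaps and codes must each fit in 16 bits in this illustrative layout.
    """
    blob = bytearray()
    previous = start
    for position, code in calls:
        blob += RECORD.pack(position - previous, code)
        previous = position
    return bytes(blob)


def unpack_genotypes(start, blob):
    """Reverse pack_genotypes: rebuild absolute positions from the gaps."""
    calls = []
    position = start
    for gap, code in RECORD.iter_unpack(blob):
        position += gap
        calls.append((position, code))
    return calls
```

In this layout each genotype call occupies four bytes, and unpacking with the same start coordinate restores the original positions.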

compressed_genotype_var

This table holds genotypes compressed using the pack() method in Perl. These genotypes are mapped directly to variation objects. The data have been compressed to reduce table size. All genotypes in the database are included in this table (including duplicates of those genotypes contained in the compressed_genotype_region table). This table is optimised for retrieval from

coord_system

Stores information about the available co-ordinate systems for the species identified through the species_id field. Note that for each species, there must be one co-ordinate system that has the attribute "top_level" and one that has the attribute "sequence_level".
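The constraint described above can be checked with a short sketch. It assumes the rows have already been loaded as plain dictionaries, that the attributes are available as a set under a field called attrib (a hypothetical name), and that "one" means exactly one per species.

```python
from collections import defaultdict

REQUIRED = ("top_level", "sequence_level")


def species_violating_rule(rows):
    """Return the species_ids that do not have exactly one coordinate
    system flagged top_level and exactly one flagged sequence_level."""
    counts = defaultdict(lambda: dict.fromkeys(REQUIRED, 0))
    for row in rows:
        per_species = counts[row["species_id"]]
        for flag in row["attrib"]:  # "attrib" is an assumed field name
            if flag in per_species:
                per_species[flag] += 1
    return sorted(
        species_id
        for species_id, seen in counts.items()
        if any(seen[flag] != 1 for flag in REQUIRED)
    )
```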

failed_allele

Contains alleles that did not pass the Ensembl filters.

failed_description

This table contains descriptions of reasons for a variation being flagged as failed.

failed_structural_variation

For various reasons it may be necessary to store information about a structural variation that has failed quality checks (mappings) in the Structural Variation pipeline. This table acts as a flag for such failures.

failed_variation

For various reasons it may be necessary to store information about a variation that has failed quality checks in the Variation pipeline. This table acts as a flag for such failures.

flanking_sequence

This table contains the upstream and downstream sequence surrounding a variation. Since each variation is defined by its flanking sequence, this table has a one-to-one relationship with the variation table.

genotype_code

This table stores genotype codes as multiple rows of allele_code identifiers, linked by genotype_code_id and ordered by haplotype_id.
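To show how the rows combine into a genotype, here is a sketch using an in-memory SQLite database. The column names genotype_code_id and haplotype_id come from the description above; the allele_code columns (allele_code_id, allele), the sample data and the "|" separator are assumptions made for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in for the two tables (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE allele_code (allele_code_id INTEGER PRIMARY KEY, allele TEXT);
        CREATE TABLE genotype_code (
            genotype_code_id INTEGER,
            allele_code_id INTEGER,
            haplotype_id INTEGER
        );
        INSERT INTO allele_code VALUES (1, 'A'), (2, 'T');
        INSERT INTO genotype_code VALUES (1, 1, 1), (1, 1, 2), (2, 2, 2), (2, 1, 1);
        """
    )
    return conn


def genotype_strings(conn):
    """Rebuild each genotype by joining its alleles in haplotype_id order."""
    rows = conn.execute(
        "SELECT gc.genotype_code_id, ac.allele "
        "FROM genotype_code gc JOIN allele_code ac USING (allele_code_id) "
        "ORDER BY gc.genotype_code_id, gc.haplotype_id"
    )
    alleles_by_code = {}
    for genotype_code_id, allele in rows:
        alleles_by_code.setdefault(genotype_code_id, []).append(allele)
    return {code: "|".join(alleles) for code, alleles in alleles_by_code.items()}
```

Ordering by haplotype_id is what keeps the alleles of a genotype in a stable order, regardless of the order in which the rows were inserted.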

individual

Stores information about an identifiable individual, including gender and the identifiers of the individual's parents (if known).

individual_genotype_multiple_bp

This table holds uncompressed genotypes for given variations.

individual_population

This table resolves the many-to-many relationship between the individual and population tables; i.e. samples may belong to more than one population. Hence it is composed of rows of individual and population identifiers.
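A link table of this kind can be indexed in both directions, as in the following sketch. It assumes each row has been read as an (individual identifier, population identifier) pair; the function and variable names are hypothetical.

```python
from collections import defaultdict


def index_memberships(rows):
    """Build both lookup directions from (individual_id, population_id) pairs."""
    populations_of = defaultdict(set)
    individuals_in = defaultdict(set)
    for individual_id, population_id in rows:
        populations_of[individual_id].add(population_id)
        individuals_in[population_id].add(individual_id)
    return populations_of, individuals_in
```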

individual_type

This table gives a description for each of the possible individual types (e.g. fully_inbred, partly_inbred, outbred, mutant).

meta

This table stores various metadata relating to the database, generally used by the Ensembl web code.


meta_coord

This table gives the coordinate system used by various tables in the database.


phenotype

This table stores details of the phenotypes associated with variation annotations.

population

A table consisting simply of sample_ids representing populations; all data relating to the populations are stored in separate tables (see below).
A population may be an ethnic group (e.g. caucasian, hispanic), assay group (e.g. 24 europeans), strain, phenotypic group (e.g. blue eyed, diabetes) etc. Populations may be composed of other populations by defining relationships in the population_structure table.

population_genotype

This table stores genotypes and frequencies for variations in given populations.

population_structure

This table stores hierarchical relationships between populations by relating them as populations and sub-populations.
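Walking such a hierarchy can be sketched as follows. The example assumes each row has been read as a (population identifier, sub-population identifier) pair; the function name and the pair layout are assumptions, not part of the schema description.

```python
from collections import defaultdict


def all_sub_populations(rows, root):
    """Return every population nested below root, at any depth."""
    children = defaultdict(list)
    for super_id, sub_id in rows:
        children[super_id].append(sub_id)

    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:  # guards against accidental cycles
                seen.add(child)
                stack.append(child)
    seen.discard(root)  # report descendants only, never the root itself
    return seen
```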

protein_function_predictions

Contains encoded protein function predictions for every protein-coding transcript in this species.

read_coverage

This table stores the read coverage in the resequencing of individuals. Each row contains an individual ID, chromosomal coordinates and a read coverage level.

sample

Sample is used as a generic catch-all term to cover individuals, populations and strains; it contains a name and description, as well as a size if applicable to the population.

sample_synonym

Used to store alternative names for populations when data comes from multiple sources.

seq_region

This table stores the relationship between Ensembl's internal coordinate system identifiers and traditional chromosome names.

source

This table contains details of the source from which a variation is derived. Most commonly this is NCBI's dbSNP; other sources include SNPs called by Ensembl.

structural_variation

This table stores information about structural variation.

structural_variation_annotation

This table stores phenotype and sample information for structural variants and their supporting evidence.

structural_variation_association

This table stores the associations between structural variations and their supporting evidence.

structural_variation_feature

This table stores information about structural variation features (i.e. mappings of structural variations to genomic locations).

study

This table contains details of the studies. Study information can come from internal studies (DGVa, EGA) or from external studies (Uniprot, NHGRI, ...).

subsnp_handle

This table contains the SubSNP (ss) ID and the name of the dbSNP submitter handle.

tagged_variation_feature

This table lists variation feature IDs that are tagged by another variation feature ID. Tag pairs are defined as having an r2 > 0.99.
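For reference, the threshold can be applied as in the sketch below, which uses the standard definition of r2 for two biallelic loci: r2 = D^2 / (pA(1-pA) pB(1-pB)), with D = pAB - pA*pB. How the Ensembl pipeline itself computes LD is not described here, so the function names and inputs are assumptions made for the example.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared correlation between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


def is_tag_pair(r2, threshold=0.99):
    """Apply the r2 > 0.99 rule quoted above."""
    return r2 > threshold
```

Two loci whose alleles always occur together give an r2 of 1 and count as a tag pair, while statistically independent loci give an r2 of 0.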

tmp_individual_genotype_single_bp

This table is only needed to create the master schema when running the healthcheck system. It is needed for species other than human, so it is kept.

transcript_variation

This table relates a single allele of a variation_feature to a transcript (see Core documentation). It contains the consequence of the allele e.g. intron_variant, non_synonymous_codon, stop_lost etc, along with the change in amino acid in the resulting protein if applicable.

translation_md5

Maps a hex MD5 hash of a translation sequence to an ID used for the protein function predictions.
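A hex MD5 digest of a peptide sequence can be computed as in this sketch. The function name is hypothetical, and whether the sequence is normalised in any way before hashing is not stated here, so the example hashes the string exactly as given.

```python
import hashlib


def translation_md5(peptide):
    """Return the 32-character hexadecimal MD5 digest of a peptide sequence."""
    return hashlib.md5(peptide.encode("ascii")).hexdigest()
```

Identical translation sequences always produce the same digest, so the hash can serve as a lookup key for the predictions.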

variation

This is the schema's generic representation of a variation, defined as a genetic feature that varies between individuals of the same species. The most common type is the single nucleotide polymorphism (SNP), though the schema also accommodates copy number variations (CNVs) and structural variations (SVs). A variation is defined by its flanking sequence rather than its mapped location on a chromosome; a variation may in fact have multiple mappings across a genome. This table stores a variation's name (commonly an ID of the form rs123456, assigned by dbSNP), along with a validation status and ancestral (or reference) allele.

variation_annotation

This table stores information linking genotypes and phenotypes. It stores various fields pertaining to the study conducted, along with the associated gene, risk allele frequency and a p-value.

variation_feature

This table represents mappings of variations to genomic locations. It stores an allele string representing the different possible alleles that are found at that locus e.g. "A/T" for a SNP, as well as a "worst case" consequence of the mutation. It also acts as part of the relationship between variations and transcripts.
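An allele string such as "A/T" can be split into its alleles as in the following sketch. The helper names are hypothetical, and the SNP test is a simplification based only on the string itself.

```python
def split_allele_string(allele_string):
    """Split a slash-separated allele string such as 'A/T' into its alleles."""
    return allele_string.split("/")


def looks_like_snp(allele_string):
    """True if there are at least two alleles and each is a single base."""
    alleles = split_allele_string(allele_string)
    return len(alleles) >= 2 and all(a in ("A", "C", "G", "T") for a in alleles)
```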

variation_set

This table contains the names of sets and subsets of variations stored in the database. A set usually represents the name of the project or subproject in which a group of variations has been identified.

variation_set_structural_variation

A table for mapping structural variations to variation_sets.

variation_set_structure

This table stores hierarchical relationships between variation sets by relating them as variation sets and variation subsets.

variation_set_variation

A table for mapping variations to variation_sets.

variation_synonym

This table allows for a variation to have multiple IDs, generally given by multiple sources.




Colour legend

Other tables
Tables containing individual data
Tables containing structural variation data
Tables containing sets of variations
Tables containing source and study data
Tables containing metadata
Tables containing "failed" data
Tables containing attribute data
Tables concerning protein data