
Regulatory Segmentation

The ENCODE combined segmentation classifies the genome into regions of similar signal over 14 assays to obtain, for six ENCODE cell types, a single-track summary of the functional architecture of the human genome.

The assays were generated in the ENCODE project for GM12878, K562, H1-hESC, HepG2, HeLa-S3, and HUVEC, and were chosen to maximise distinct information about the state of the genome. These assays (including control input sequencing) were coordinated across all cell lines and constituted three classes of data:

Input Data Class      | Description
Open chromatin        | DNase1 hypersensitivity and FAIRE
Transcription factors | PolII and CTCF
Histone modifications | H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac, H3K27me3, H3K36me3, H4K20me1

Two unsupervised segmentation programs were used:

  • ChromHMM (Ernst et al. 2011, PMID 21441907): in brief, ChromHMM labels each assay as high or low in 200 base pair bins over the whole human genome and runs a 25-state Hidden Markov Model.
  • Segway (Hoffman et al.): a Dynamic Bayesian Network approach using base-pair resolution, real-valued signal data, trained over the ENCODE pilot regions (1% of the genome) and fitted over the whole genome.
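The high/low labelling in 200 base pair bins can be sketched as follows. This is only an illustration: the function name, the toy signal, and the mean-versus-threshold rule are assumptions made here, and ChromHMM's actual binarisation rule is not described on this page.

```python
from typing import List

BIN_SIZE = 200  # bin width in base pairs, as stated in the text


def binarize_signal(per_base_signal: List[float],
                    threshold: float,
                    bin_size: int = BIN_SIZE) -> List[int]:
    """Return 1 (high) or 0 (low) for each consecutive bin.

    Assumption: a bin is called "high" when its mean signal reaches
    the threshold. This rule is a stand-in for illustration only.
    """
    calls = []
    for start in range(0, len(per_base_signal), bin_size):
        window = per_base_signal[start:start + bin_size]
        mean_signal = sum(window) / len(window)
        calls.append(1 if mean_signal >= threshold else 0)
    return calls


# Toy signal: two full bins and one partial bin (hypothetical values).
toy_signal = [0.0] * 200 + [5.0] * 200 + [1.0] * 100
print(binarize_signal(toy_signal, threshold=2.0))  # [0, 1, 0]
```

Each assay would yield one such vector of calls, and the set of vectors across all assays is the kind of input a Hidden Markov Model could then segment into states.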

The segmentations produced by these two methods were then combined automatically, based on their agreement (Hoffman, Ernst et al., In Preparation), in order to maximise resolution and biological interpretability. The segments were then labelled according to their signal distribution and genomic location, giving the following classifications:

Segment Class                           | Abbreviation | Description
CTCF enriched                           | CTCF         | CTCF enriched element
Predicted Weak Enhancer/Cis-reg element | WE           | Predicted weak enhancer or open chromatin cis regulatory element
Predicted Transcribed Region            | T            | Predicted transcribed region
Predicted Enhancer                      | E            | Predicted enhancer
Predicted Promoter Flank                | PF           | Predicted promoter flanking region
Predicted Repressed/Low Activity        | R            | Predicted repressed or low activity region
Predicted Promoter with TSS             | TSS          | Predicted promoter region including transcription start site
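When working with segment labels in a script, the abbreviations can be kept in a small lookup table. The sketch below copies the abbreviations and descriptions from the classification table; the helper name and the example coordinates are hypothetical and do not correspond to any Ensembl API call.

```python
# Abbreviation -> description, copied from the segment classification table.
SEGMENT_CLASSES = {
    "CTCF": "CTCF enriched element",
    "WE": "Predicted weak enhancer or open chromatin cis regulatory element",
    "T": "Predicted transcribed region",
    "E": "Predicted enhancer",
    "PF": "Predicted promoter flanking region",
    "R": "Predicted repressed or low activity region",
    "TSS": "Predicted promoter region including transcription start site",
}


def describe_segment(chrom: str, start: int, end: int, abbreviation: str) -> str:
    """Format one segment with its full class description.

    Hypothetical helper: raises ValueError for an unknown abbreviation
    rather than guessing a class.
    """
    description = SEGMENT_CLASSES.get(abbreviation)
    if description is None:
        raise ValueError(f"Unknown segment class: {abbreviation!r}")
    return f"{chrom}:{start}-{end} {abbreviation} ({description})"


# Example coordinates are invented for illustration.
print(describe_segment("chr1", 1000, 1200, "TSS"))
```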

The following graphic shows the clustering of informative features used to generate the different segment classifications. The x-axis shows the segment class and the y-axis shows the groups of experiments for each feature type, e.g. DNase1, H3K4me2.

GM12878 Signal Distribution: the 'Signal track' legend shows the signal distribution of a particular feature type within each segment class (1 being the highest signal).

A summary of this clustering shows the following associations:
  • Transcriptional Activation: H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac
  • Transcriptional Elongation: H3K36me3
  • Transcriptional Repression: H3K27me3, H4K20me1
The transcriptional repression signals are not obviously reflected in the heatmap: H4K20me1 is apparently absent and H3K27me3 is less intense than the CTCF signal. This is because the 'R' state combines several so-called dead states with fewer states for active repression, so the repressed state signal is diluted.
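The dilution effect can be illustrated with a small arithmetic sketch. All numbers are invented, and the length-weighted mean is an assumption about how pooling sub-states lowers the apparent signal, not a description of how the heatmap was actually computed.

```python
from typing import List, Tuple


def pooled_mean(sub_states: List[Tuple[int, float]]) -> float:
    """Length-weighted mean signal over the sub-states merged into one class.

    Each entry is (total length in bp, mean signal) for one sub-state.
    """
    total_length = sum(length for length, _ in sub_states)
    weighted_sum = sum(length * signal for length, signal in sub_states)
    return weighted_sum / total_length


# Hypothetical 'R' class: mostly dead states with almost no H3K27me3,
# plus a small actively repressed fraction with strong H3K27me3.
r_class = [
    (9000, 0.1),  # dead states (invented values)
    (1000, 5.0),  # active repression (invented values)
]

print(round(pooled_mean(r_class), 2))  # far below the 5.0 of the repressed sub-state
```

With these toy values the pooled signal is roughly a tenth of the signal in the actively repressed sub-state, which is the kind of dilution described above.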

Regulatory Segmentation in the Browser

There are six segmentation tracks available, reflecting the cell lines used in the segmentation analysis. These tracks are on by default in most regulation views, and are also configurable in 'Location' view by accessing the 'Regulation' section of the configuration panel. The colours used for each of the segmentation classes follow the agreed ENCODE standard (see legend).