Regulatory Segmentation
The ENCODE combined segmentation classifies the genome into regions of similar signal over 14 assays to obtain, for six ENCODE cell types, a single-track summary of the functional architecture of the human genome.
The assays were generated in the ENCODE project for GM12878, K562, H1-hESC, HepG2, HeLa-S3, and HUVEC, and were chosen to maximise distinct information about the state of the genome. These assays (including control input sequencing) were coordinated across all cell lines and constituted three classes of data:
Input Data Class | Description |
---|---|
Open chromatin | DNase1 hypersensitivity and FAIRE |
Transcription factors | PolII and CTCF |
Histone modifications | H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac, H3K27me3, H3K36me3, H4K20me1 |
Two unsupervised segmentation programs were used:
- ChromHMM (Ernst et al. 2011 PMID 21441907)
- Segway (Hoffman et al.)
In brief, ChromHMM labels each assay as high or low in 200 base pair bins over the whole human genome and runs a 25-state Hidden Markov Model.
Segway uses a Dynamic Bayesian Network approach on base-pair resolution, real-valued signal data; it was trained over the ENCODE pilot regions (1% of the genome) and then fitted over the whole genome.
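The high/low labelling in 200 base pair bins can be illustrated with a minimal sketch. This is not ChromHMM's own code: the function name `binarise_signal`, the per-base input list and the fixed mean-signal `threshold` are all assumptions made for illustration, and the rule ChromHMM actually uses to call a bin high or low is not described on this page.

```python
from typing import List, Sequence

BIN_SIZE = 200  # bin width in base pairs, as stated in the text


def binarise_signal(
    per_base_signal: Sequence[float],
    threshold: float,
    bin_size: int = BIN_SIZE,
) -> List[int]:
    """Return one label per bin: 1 (high) or 0 (low).

    A bin is called high when its mean signal is at least `threshold`.
    The fixed threshold is a hypothetical stand-in for the real
    binarisation rule.
    """
    labels = []
    for start in range(0, len(per_base_signal), bin_size):
        # The final window may be shorter than bin_size.
        window = per_base_signal[start:start + bin_size]
        mean_signal = sum(window) / len(window)
        labels.append(1 if mean_signal >= threshold else 0)
    return labels


# Toy signal for a single assay: a quiet bin, a strong bin, a partial bin.
toy_signal = [0.0] * 200 + [5.0] * 200 + [1.0] * 100
toy_labels = binarise_signal(toy_signal, threshold=2.0)
```

Running this once per assay would give one binary vector per assay for each bin, which is the kind of observation a Hidden Markov Model over bins could take as input.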
The segmentations produced by these two methods were then combined, based on their agreements (Hoffman, Ernst et al., In Preparation) in an automated fashion, in order to maximise resolution and biological interpretability. The segments were then labelled according to their signal distribution and genomic location, giving the following classifications:
Segment Class | Abbreviation | Description |
---|---|---|
CTCF enriched | CTCF | CTCF enriched element |
Predicted Weak Enhancer/Cis-reg element | WE | Predicted weak enhancer or open chromatin cis regulatory element |
Predicted Transcribed Region | T | Predicted transcribed region |
Predicted Enhancer | E | Predicted enhancer |
Predicted Promoter Flank | PF | Predicted promoter flanking region |
Predicted Repressed/Low Activity | R | Predicted repressed or low activity region |
Predicted Promoter with TSS | TSS | Predicted promoter region including transcription start site |
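As a rough sketch of how these abbreviations might be used downstream, the snippet below maps each abbreviation to its description and totals the number of bases assigned to each class. The tuple layout `(chromosome, start, end, abbreviation)` with half-open coordinates and the function name `bases_per_class` are assumptions for illustration, not a format defined by this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Abbreviation -> description, copied from the table above.
SEGMENT_CLASSES = {
    "CTCF": "CTCF enriched element",
    "WE": "Predicted weak enhancer or open chromatin cis regulatory element",
    "T": "Predicted transcribed region",
    "E": "Predicted enhancer",
    "PF": "Predicted promoter flanking region",
    "R": "Predicted repressed or low activity region",
    "TSS": "Predicted promoter region including transcription start site",
}

# Hypothetical segment record: (chromosome, start, end, abbreviation),
# with half-open coordinates so that length == end - start.
Segment = Tuple[str, int, int, str]


def bases_per_class(segments: Iterable[Segment]) -> Dict[str, int]:
    """Sum the number of bases covered by each segment class."""
    totals: Dict[str, int] = defaultdict(int)
    for chrom, start, end, abbrev in segments:
        if abbrev not in SEGMENT_CLASSES:
            raise ValueError(f"Unknown segment class: {abbrev!r}")
        totals[abbrev] += end - start
    return dict(totals)


example_segments = [
    ("chr1", 0, 200, "TSS"),
    ("chr1", 200, 1000, "T"),
    ("chr2", 50, 150, "TSS"),
]
example_totals = bases_per_class(example_segments)
```

Rejecting unknown abbreviations early keeps a mistyped or unexpected label from being counted silently as a new class.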
The following graphic shows the clustering of informative features used to generate the different segment classifications. The x-axis shows the segment class and the y-axis shows different groups of experiments for a given feature type, e.g. DNase1, H3K4me2.
[Figure: clustering of informative features by segment class and feature type]
The 'Signal track' legend shows the signal distribution of a particular feature type within each segment class (1 being the highest signal). A summary of this clustering shows the following associations:
Regulatory Segmentation in the Browser
There are six segmentation tracks available, reflecting the cell lines used in the segmentation analysis. These tracks are on by default in most regulation views, and are also configurable in 'Location' view by accessing the 'Regulation' section of the configuration panel. The colours used for each of the segmentation classes follow the agreed ENCODE standard (see legend).