Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor Class Reference

Class Summary
Synopsis
Connecting to the database using the Registry:

  use Bio::EnsEMBL::Registry;
  my $reg = "Bio::EnsEMBL::Registry";
  $reg->load_registry_from_db(-host => "ensembldb.ensembl.org", -user => "anonymous");
  my $conservation_score_adaptor = $reg->get_adaptor("Multi", "compara", "ConservationScore");

Store data in the database:

  $conservation_score_adaptor->store($conservation_score);

To retrieve score data from the database using the default display_size:

  $conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($method_link_species_set, $slice);

To retrieve one score per base in the slice:

  $conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($method_link_species_set, $slice, $slice->end-$slice->start+1);

Print the scores:

  foreach my $score (@$conservation_scores) {
    printf("position %d observed %.4f expected %.4f difference %.4f\n", $score->position, $score->observed_score, $score->expected_score, $score->diff_score);
  }

A simple example script for extracting scores from a slice can be found in ensembl-compara/scripts/examples/getConservationScores.pl
Description
This module is used to access data in the conservation_score table. The scores are stored in the database as LITTLE ENDIAN. Each score is represented by a Bio::EnsEMBL::Compara::ConservationScore object. The position, an observed score, an expected score and a difference score (expected - observed) are stored for each column in a multiple alignment. Not all columns in an alignment have a score (for example, if there is insufficient coverage); such columns are termed here 'uncalled'.

To speed up processing of the scores over large regions, the scores are stored in the database averaged over window_sizes of 1 (no averaging), 10, 100 and 500. When retrieving the scores, the most appropriate window_size is estimated from the length of the alignment or slice and the number of scores requested, given by the display_size. There is no need to specify the window_size directly.

If the number of scores requested (display_size) is smaller than the alignment length or slice length, the scores are either averaged (display_type = "AVERAGE") or the maximum value is taken (display_type = "MAX"). Scores in uncalled regions are not returned. To return a score for each column in an alignment, set the display_size to the alignment length or slice length.
Definition at line 48 of file ConservationScoreAdaptor.pm.
Method Documentation
protected Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_add_to_bucket()
Arg 1 : string $display_type (either AVERAGE or MAX (plot average or max value))
Arg 2 : float $exp_score (expected score to be added to bucket)
Arg 3 : float $diff_score (difference score to be added to bucket)
Arg 4 : int $chr_pos (position in slice of reference species)
Arg 5 : int $start_slice (start position of slice)
Arg 6 : int $num_buckets (number of buckets used so far)
Arg 7 : int $genomic_align_block_id (genomic_align_block_id of alignment block)
Arg 8 : int $win_size (window size used)
Example : $aligned_score = _add_to_bucket($self, "AVERAGE", $exp_score, $diff_score, $chr_pos, $start_slice, scalar(@$aligned_scores), $genomic_align_block_id, $win_size);
Description: Add scores to the bucket until it is full (given by size), then average the called scores or take the maximum (given by display_type). Once the bucket is full, create a new conservation score object.
Returntype : ref to a Bio::EnsEMBL::Compara::ConservationScore object if the bucket is full, or 0 if it is not full yet
Exceptions : none
Caller : general
Status : At risk
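The bucketing idea described above can be illustrated with a minimal, standalone sketch. It is not the adaptor's code: `bucket_scores` is a hypothetical helper that works on a plain list of numbers, and the choice of a fixed bucket size (input length divided by display_size, rounded up) is an assumption made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical helper (not part of the Ensembl API): reduce a list of called
# scores to at most $display_size values by averaging each bucket or taking
# its maximum, depending on $display_type ("AVERAGE" or "MAX").
sub bucket_scores {
    my ($scores, $display_size, $display_type) = @_;
    my $n = scalar @$scores;
    return [@$scores] if $n <= $display_size;

    # Assumed rule: number of input scores per bucket, rounded up
    my $bucket_size = int(($n + $display_size - 1) / $display_size);
    my @out;
    for (my $i = 0; $i < $n; $i += $bucket_size) {
        my $last = $i + $bucket_size - 1;
        $last = $n - 1 if $last > $n - 1;
        my @bucket = @{$scores}[$i .. $last];
        push @out, $display_type eq 'MAX'
            ? max(@bucket)
            : sum(@bucket) / scalar(@bucket);
    }
    return \@out;
}

my $avg = bucket_scores([1, 3, 2, 6, 4], 2, 'AVERAGE');
my $max = bucket_scores([1, 3, 2, 6, 4], 2, 'MAX');
print "@$avg\n";   # prints "2 5"
print "@$max\n";   # prints "3 6"
```

Here five scores are reduced to two buckets of sizes three and two; the real method additionally tracks positions and skips uncalled columns.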

protected Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_fetch_all_by_GenomicAlignBlockId_WindowSize()
Arg 1 : integer $genomic_align_block_id
Arg 2 : integer $window_size
Arg 3 : (opt) boolean $packed (default 0)
Example : my $conservation_scores = $conservation_score_adaptor->_fetch_all_by_GenomicAlignBlockId_WindowSize(23134, 10);
Description: Retrieve the corresponding Bio::EnsEMBL::Compara::ConservationScore objects.
Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore objects. If $packed is true, the scores are returned in a packed format given by $_pack_size and $_pack_type.
Exceptions : none
Caller : general

protected Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_find_min_max_score()
Arg 1 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Example : my ($min, $max) = _find_min_max_score($scores);
Description: Find the min and max scores, used for y axis scaling
Returntype : (float, float)
Exceptions :
Caller : general
Status : At risk
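As a rough illustration of the min/max scan, the sketch below uses plain hashes as stand-ins for ConservationScore objects. Which score field the real method inspects is not stated here, so the use of `diff_score` is an assumption, and `find_min_max` is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Stand-ins for ConservationScore objects: plain hashes with a diff_score key.
my @scores = map { +{ diff_score => $_ } } (-1.5, 0.25, 2.75, 0.9);

# Hypothetical helper: return the smallest and largest diff_score in the list.
sub find_min_max {
    my ($scores) = @_;
    my @values = map { $_->{diff_score} } @$scores;
    return (min(@values), max(@values));
}

my ($min, $max) = find_min_max(\@scores);
printf "min %.2f max %.2f\n", $min, $max;   # prints "min -1.50 max 2.75"
```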

protected Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_find_score_index()
Arg 1 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Arg 2 : int $num_scores (number of scores in the array)
Arg 3 : int $score_lengths (number of scores in each row of the array)
Arg 4 : int $pos (position to find)
Arg 5 : int $win_size (window size)
Example : $index = _find_score_index($scores, $num_scores, $score_lengths, $pos, $win_size);
Description: Find the score index (row) that contains $pos in alignment coords
Returntype : int
Exceptions : none
Caller : general
Status : At risk

protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_aligned_scores_from_cigar_line()
Arg 1 : string $cigar_line (cigar string from current alignment block)
Arg 2 : int $start_region (start of genomic_align_block (chr coords))
Arg 3 : int $end_region (end of genomic_align_block (chr coords))
Arg 4 : int $start_slice (start of slice (chr coords))
Arg 5 : int $end_slice (end of slice (chr coords))
Arg 6 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Arg 7 : int $genomic_align_block_id (genomic align block id of current alignment block)
Arg 8 : int $genomic_align_block_length (length of current alignment block)
Arg 9 : string $display_type (either AVERAGE or MAX (plot average or max value))
Arg 10 : int $win_size (window size used)
Arg 11 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores in slice coords
Example : $scores = $self->_get_aligned_scores_from_cigar_line($genomic_align->cigar_line, $genomic_align->dnafrag_start, $genomic_align->dnafrag_end, $slice->start, $slice->end, $conservation_scores, $genomic_align_block->dbID, $genomic_align_block->length, $display_type, $window_size, $scores);
Description: Convert conservation scores from alignment coordinates into species-specific chromosome (slice) coordinates for a genomic_align_block
Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Exceptions : none
Caller : general
Status : At risk
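The coordinate conversion can be pictured with a small standalone sketch that walks a cigar line and maps alignment columns to chromosome positions. It is only an illustration under stated assumptions: the cigar line contains just M (base present) and D (gap) operations, a missing count means 1, and the sequence is on the forward strand. `align_to_chr_positions` is a hypothetical helper, not a method of this adaptor.

```perl
use strict;
use warnings;

# Hypothetical helper: map 1-based alignment columns to chromosome positions
# for one aligned sequence. Gap columns (D) have no chromosome position.
sub align_to_chr_positions {
    my ($cigar_line, $dnafrag_start) = @_;
    my %chr_pos_for;                       # alignment column => chromosome position
    my $column  = 0;
    my $chr_pos = $dnafrag_start - 1;
    while ($cigar_line =~ /(\d*)([MD])/g) {
        my ($count, $type) = ($1, $2);
        $count = 1 if $count eq '';        # assumed: a missing count means 1
        for (1 .. $count) {
            $column++;
            next if $type eq 'D';          # gap: no base in this sequence
            $chr_pos++;
            $chr_pos_for{$column} = $chr_pos;
        }
    }
    return \%chr_pos_for;
}

my $map = align_to_chr_positions('3M2D2M', 1000);
for my $column (sort { $a <=> $b } keys %$map) {
    print "alignment column $column -> chromosome position $map->{$column}\n";
}
```

With the cigar line 3M2D2M and a start of 1000, columns 1 to 3 map to 1000 to 1002, columns 4 and 5 are gaps, and columns 6 and 7 map to 1003 and 1004. A score stored at an alignment column would then be reported at the mapped chromosome position, and scores falling in gap columns would have no position in this species.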

protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_aligned_scores_from_cigar_line_fast()
Arg 1 : string $cigar_line (cigar string from current alignment block)
Arg 2 : int $start_region (start of genomic_align_block (chr coords))
Arg 3 : int $end_region (end of genomic_align_block (chr coords))
Arg 4 : int $start_slice (start of slice (chr coords))
Arg 5 : int $end_slice (end of slice (chr coords))
Arg 6 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Arg 7 : int $genomic_align_block_id (genomic align block id of current alignment block)
Arg 8 : int $genomic_align_block_length (length of current alignment block)
Arg 9 : string $display_type (either AVERAGE or MAX (plot average or max value))
Arg 10 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores in slice coords
Example : $scores = $self->_get_aligned_scores_from_cigar_line_fast($genomic_align->cigar_line, $genomic_align->dnafrag_start, $genomic_align->dnafrag_end, $slice->start, $slice->end, $conservation_scores, $genomic_align_block->dbID, $genomic_align_block->length, $display_type, $scores);
Description: Faster alternative to _get_aligned_scores_from_cigar_line. This method does not bin the scores and can be used if only one score per base in the alignment is required
Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Exceptions : none
Caller : general
Status : At risk

protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_alignment_scores()
Arg 1 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Arg 2 : int $align_start (start position in alignment coords)
Arg 3 : int $align_end (end position in alignment coords)
Arg 4 : string $display_type (either AVERAGE or MAX (plot average or max value))
Arg 5 : int $win_size (window size used)
Arg 6 : ref to Bio::EnsEMBL::Compara::GenomicAlignBlock object
Example : $scores = $self->_get_alignment_scores($conservation_scores, 1, 100000, "AVERAGE", 10, $genomic_align_block);
Description: Get scores for an alignment in alignment coordinates
Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores in alignment coordinates
Exceptions : none
Caller : general
Status : At risk

protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_all_ref_genomic_aligns()
Arg 1 : ref to Bio::EnsEMBL::Compara::MethodLinkSpeciesSet object
Arg 2 : ref to Bio::EnsEMBL::Slice object
Example : my $light_genomic_aligns = $self->_get_all_ref_genomic_aligns($ma_mlss, $slice);
Description: Retrieve from the database the genomic_align information that relates only to the slice species.
Returntype : listref of hashes containing a subset of genomic_align fields
Exceptions : none
Caller : general
Status : At risk

protected void Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_print_scores()
Arg 1 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Arg 2 : boolean $packed (0 if not packed, 1 if packed)
Example : _print_scores($conservation_scores, $packed);
Description: Print scores (unpack first if necessary)
Returntype : none
Exceptions : none
Caller : general
Status : At risk

protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_reverse()
Arg 1 : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
Arg 2 : int $genomic_align_block_length (number of scores)
Example : $conservation_scores = _reverse($conservation_scores, $genomic_align_block_length);
Description: Reverse the conservation scores for complemented sequences
Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects
Exceptions :
Caller : general
Status : At risk
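One way to picture reversing scores for a complemented sequence is to reverse the list order and mirror each position within the block. The sketch below assumes 1-based positions, one score per position, and plain hashes in place of ConservationScore objects; `reverse_scores` is a hypothetical name and the real method may handle windowed scores differently.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the order of the scores and mirror each
# 1-based position within a block of length $block_length.
sub reverse_scores {
    my ($scores, $block_length) = @_;
    my @reversed = map {
        +{ %$_, position => $block_length - $_->{position} + 1 }
    } reverse @$scores;
    return \@reversed;
}

my $scores = [
    { position => 1,  diff_score => 0.5 },
    { position => 2,  diff_score => 1.5 },
    { position => 10, diff_score => -0.3 },
];
my $reversed = reverse_scores($scores, 10);
print join(' ', map { $_->{position} } @$reversed), "\n";   # prints "1 9 10"
```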

protected Space Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_unpack_score()
Arg 1 : string $score
Example : $exp_scores = _unpack_score($score);
Description: Unpack score values retrieved from a database
Returntype : space delimited string of floats
Exceptions : none
Caller : general
Status : At risk

protected Space Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_unpack_scores()
Arg 1 : string $scores
Example : $exp_scores = _unpack_scores($scores);
Description: Unpack score values retrieved from a database
Returntype : space delimited string of floats
Exceptions : none
Caller : general
Status : At risk
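To make the packed, little-endian storage concrete, here is a minimal round-trip sketch using Perl's built-in pack and unpack. The exact pack template the adaptor uses is given by $_pack_type and is not shown in this documentation, so the "f<" template (32-bit little-endian float, Perl 5.10 or later) and the four-decimal output format are assumptions; `unpack_scores` here is a standalone stand-in, not the private method.

```perl
use strict;
use warnings;

# Assumed storage format: consecutive 32-bit little-endian floats.
my $pack_type = 'f<';

# Stand-in for the unpacking step: binary string in, space delimited
# string of floats out.
sub unpack_scores {
    my ($packed) = @_;
    my @values = unpack("${pack_type}*", $packed);
    return join ' ', map { sprintf '%.4f', $_ } @values;
}

# Values chosen to be exactly representable as 32-bit floats.
my $packed = pack("${pack_type}*", 0.5, -1.25, 2);
print length($packed), " bytes\n";      # prints "12 bytes"
print unpack_scores($packed), "\n";     # prints "0.5000 -1.2500 2.0000"
```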

public Int Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::delete_by_genomic_align_block_id()
Arg 1 : int $genomic_align_block_id
Example : $conservation_score_adaptor->delete_by_genomic_align_block_id(123);
Description: Delete all the scores related to this GenomicAlignBlock object
Returntype : int (number of deleted rows, not scores)
Exceptions : throws if no $genomic_align_block_id is given
Status : Stable

public Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::fetch_all_by_GenomicAlignBlock()
Arg 1 : Bio::EnsEMBL::Compara::GenomicAlignBlock $genomic_align_block
Arg 2 : (opt) integer $align_start (default 1)
Arg 3 : (opt) integer $align_end (default $genomic_align_block->length)
Arg 4 : (opt) integer $slice_length (default $genomic_align_block->length)
Arg 5 : (opt) integer $display_size (default 700)
Arg 6 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "AVERAGE")
Arg 7 : (opt) integer $window_size
Example : my $conservation_scores = $conservation_score_adaptor->fetch_all_by_GenomicAlignBlock($genomic_align_block, $align_start, $align_end, $slice_length, $slice_length);
Description: Retrieve the corresponding Bio::EnsEMBL::Compara::ConservationScore objects. Each conservation score object contains a position in alignment coordinates, the observed_score, the expected_score and the diff_score (conservation score) calculated as (expected_score - observed_score).
The $align_start and $align_end parameters give the start and end of a region within a genomic_align_block and should be in alignment coordinates. The $slice_length is the total length of the region to be displayed and may span several individual genomic align blocks. It is used to automatically calculate the window_size.
Display_size is the number of scores that will be returned. To return a score for each column in an alignment, the display_size should be set to the alignment length. If the $slice_length is larger than the $display_size, the scores will either be averaged if the display_type is "AVERAGE" or the maximum taken if the display_type is "MAX".
Window_size defines which set of pre-averaged scores to use. Valid values are 1, 10, 100 or 500. There is no need to define the window_size because the program will select the most appropriate window_size based on the slice_length and the display_size.
Alignment positions which have no scores are not returned. The min and max y axis values for the array of conservation score objects are set in the first conservation score object (index 0).
Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore objects.
Caller : object::methodname
Status : At risk

public Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::fetch_all_by_MethodLinkSpeciesSet_Slice()
Arg 1 : Bio::EnsEMBL::Compara::MethodLinkSpeciesSet $method_link_species_set
Arg 2 : Bio::EnsEMBL::Slice $slice
Arg 3 : (opt) integer $display_size (default 700)
Arg 4 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "AVERAGE")
Arg 5 : (opt) integer $window_size
Exceptions : warning if window_size is not valid
Example : my $conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($method_link_species_set, $slice, $slice->end-$slice->start+1);
Description: Retrieve the corresponding Bio::EnsEMBL::Compara::ConservationScore objects. Each conservation score object contains a position in slice coordinates, the observed_score, the expected_score and the diff_score (or conservation score) calculated as (expected_score - observed_score).
The method_link_species_set is obtained using the method_link type of "GERP_CONSERVATION_SCORE". For example, this could be obtained for the 10 way PECAN alignment using:
  my $mlss = $mlss_adaptor->fetch_by_method_link_type_registry_aliases("GERP_CONSERVATION_SCORE", ["human", "chimp", "rhesus", "cow", "dog", "mouse", "rat", "opossum", "platypus", "chicken"]);
Display_size defines the number of scores that will be returned. To return a score for each column in an alignment, the display_size should be set to the slice length, e.g. ($slice->end-$slice->start+1). If the slice length is larger than the display_size, the scores will either be averaged if the display_type is "AVERAGE" or the maximum taken if the display_type is "MAX".
Window_size defines which set of pre-averaged scores to use. Valid values are 1, 10, 100 or 500, although there is no need to define the window_size because the program will select the most appropriate window_size based on the slice length and the display_size. For example, a slice length of 1000000 and a display_size of 1000 will automatically use a window_size of 500.
Slice positions which have no scores are not returned. The min and max y axis values for the array of conservation score objects are set in the first conservation score object (index 0).
Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore objects.
Caller : object::methodname
Status : At risk
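The automatic window_size selection can be approximated with the short sketch below. The exact rule used by the adaptor is not documented here; this sketch assumes it picks the largest valid window_size that does not exceed the number of bases covered by each returned score, which reproduces the one documented case (slice length 1000000, display_size 1000, window_size 500). `choose_window_size` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the largest pre-averaged window size that is no
# bigger than the number of bases each returned score has to cover.
sub choose_window_size {
    my ($slice_length, $display_size) = @_;
    my @window_sizes    = (1, 10, 100, 500);   # valid values from the documentation
    my $bases_per_score = $slice_length / $display_size;
    my $chosen          = $window_sizes[0];    # fall back to no averaging
    for my $window_size (@window_sizes) {
        $chosen = $window_size if $window_size <= $bases_per_score;
    }
    return $chosen;
}

print choose_window_size(1_000_000, 1000), "\n";   # prints 500
print choose_window_size(50_000, 700), "\n";       # prints 10
print choose_window_size(1000, 1000), "\n";        # prints 1
```

Requesting one score per base (display_size equal to the slice length) always falls back to window_size 1 under this rule, which matches the advice to use unaveraged scores for per-column output.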

public void Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::store()
Arg [1] : Bio::EnsEMBL::Compara::ConservationScore $cs
Example : $csa->store($cs);
Description: Stores a conservation score object in the compara database if it has not been stored already.
Returntype : none
Exceptions : thrown if $genomic_align_block is not a Bio::EnsEMBL::Compara::GenomicAlignBlock object
Exceptions : thrown if the argument is not a Bio::EnsEMBL::Compara::ConservationScore
Caller : general
Status : At risk

The documentation for this class was generated from the following file:
- Bio/EnsEMBL/Compara/DBSQL/ConservationScoreAdaptor.pm