Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor Class Reference


Class Summary

Synopsis

  Connecting to the database using the Registry

     use Bio::EnsEMBL::Registry;

     my $reg = "Bio::EnsEMBL::Registry";

     $reg->load_registry_from_db(-host=>"ensembldb.ensembl.org", -user=>"anonymous");

     my $conservation_score_adaptor = $reg->get_adaptor(
         "Multi", "compara", "ConservationScore");

  Store data in the database

     $conservation_score_adaptor->store($conservation_score);

  To retrieve score data from the database using the default display_size

     $conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($method_link_species_set, $slice);

  To retrieve one score per base in the slice

     $conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($method_link_species_set, $slice, $slice->end-$slice->start+1);

  Print the scores

     foreach my $score (@$conservation_scores) {
        printf("position %d observed %.4f expected %.4f difference %.4f\n", $score->position, $score->observed_score, $score->expected_score, $score->diff_score);
     }

  A simple example script for extracting scores from a slice can be found in ensembl-compara/scripts/examples/getConservationScores.pl

Description

This module is used to access data in the conservation_score table. The scores are stored in the database as LITTLE ENDIAN.
Each score is represented by a Bio::EnsEMBL::Compara::ConservationScore object. A position, an observed score, an expected score and a difference score (expected - observed) are stored for each column in a multiple alignment. Not all columns in an alignment have a score (for example, if there is insufficient coverage); such columns are termed 'uncalled' here.
In order to speed up processing of the scores over large regions, the scores are stored in the database averaged over window_sizes of 1 (no averaging), 10, 100 and 500. When retrieving the scores, the most appropriate window_size is estimated from the length of the alignment or slice and the number of scores requested, given by the display_size. There is no need to specify the window_size directly. If the number of scores requested (display_size) is smaller than the alignment length or slice length, the scores will be either averaged if display_type = "AVERAGE" or the maximum value taken if display_type = "MAX". Scores in uncalled regions are not returned. To return a score for each column in an alignment, the display_size should be set to be the same size as the alignment length or slice length.
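
The window_size selection described above can be sketched as follows. This is only an illustration, assuming the rule is "use the largest stored window that does not exceed slice length / display_size"; the helper name choose_window_size is hypothetical and not part of the adaptor. Only the window sizes (1, 10, 100, 500) are taken from the description.

```perl
use strict;
use warnings;

# Window sizes stored in the database, per the description above.
my @WINDOW_SIZES = (1, 10, 100, 500);

# Hypothetical helper (not part of the adaptor): pick the largest stored
# window that is no bigger than the number of bases per requested score.
sub choose_window_size {
    my ($slice_length, $display_size) = @_;
    my $bases_per_score = $slice_length / $display_size;
    my $chosen = $WINDOW_SIZES[0];
    foreach my $win_size (@WINDOW_SIZES) {
        last if $win_size > $bases_per_score;
        $chosen = $win_size;
    }
    return $chosen;
}

printf "%d\n", choose_window_size(1_000_000, 1000);  # 1000 bases per score
printf "%d\n", choose_window_size(5_000, 5_000);     # one score per base
```

Under this assumed rule, a slice of 1000000 bases with a display_size of 1000 selects the 500 window, and requesting one score per base selects the unaveraged window of 1.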
 

Definition at line 48 of file ConservationScoreAdaptor.pm.

Available Methods

protected Ref _add_to_bucket ()
protected _columns ()
protected _default_where_clause ()
protected Ref _fetch_all_by_GenomicAlignBlockId_WindowSize ()
protected _final_clause ()
protected _find_min_max_score ()
protected _find_score_index ()
protected Listref _get_aligned_scores_from_cigar_line ()
protected Listref _get_aligned_scores_from_cigar_line_fast ()
protected Listref _get_alignment_scores ()
protected Listref _get_all_ref_genomic_aligns ()
protected _left_join ()
protected _list_dbIDs ()
protected _objs_from_sth ()
protected void _print_scores ()
protected Listref _reverse ()
protected _straight_join ()
protected _tables ()
protected Space _unpack_score ()
protected Space _unpack_scores ()
public Listref bind_param_generic_fetch ()
public Bio::EnsEMBL::DBSQL::DBAdaptor db ()
public Bio::EnsEMBL::DBSQL::DBConnection dbc ()
public Int delete_by_genomic_align_block_id ()
public dump_data ()
public fetch_all ()
public Listref fetch_all_by_dbID_list ()
public Ref fetch_all_by_GenomicAlignBlock ()
public Ref fetch_all_by_MethodLinkSpeciesSet_Slice ()
public Bio::EnsEMBL::Feature fetch_by_dbID ()
public Listref generic_fetch ()
public get_dumped_data ()
public Boolean is_multispecies ()
public Scalar last_insert_id ()
public Bio::EnsEMBL::DBSQL::BaseAdaptor new ()
public DBI::StatementHandle prepare ()
public Int species_id ()
public void store ()

Method Documentation

protected Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_add_to_bucket ( )
  Arg  1     : string $display_type (either AVERAGE or MAX (plot average or max value))
  Arg  2     : float $exp_score (expected score to be added to bucket)
  Arg  3     : float $diff_score (difference score to be added to bucket)
  Arg  4     : int $chr_pos (position in slice of reference species)
  Arg  5     : int $start_slice (start position of slice)
  Arg  6     : int $num_buckets (number of buckets used so far)
  Arg  7     : int $genomic_align_block_id (genomic_align_block_id of 
               alignment block)
  Arg  8     : int $win_size window size used
  Example    : $aligned_score = _add_to_bucket($self, "AVERAGE", $exp_score, $diff_score, $chr_pos, $start_slice, scalar(@$aligned_scores), $genomic_align_block_id, $win_size);
  Description: Add scores to bucket until it is full (given by size) and then 
               average the called scores or take the maximum (given by 
               display_type). Once the bucket is full, create a new 
               conservation score object
  Returntype : ref to Bio::EnsEMBL::Compara::ConservationScore object if the
               bucket is full or 0 if it isn't full yet
  Exceptions : none
  Caller     : general
  Status     : At risk
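
The bucketing idea can be sketched in a few lines. This is a simplified illustration and not the adaptor's code: the helper bucket_scores is hypothetical, it works on plain numbers rather than ConservationScore objects, and it drops a trailing partial bucket.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical helper: collapse a list of called diff scores into buckets of
# $bucket_size values, reducing each full bucket to its average or maximum.
# A trailing bucket that never fills up is discarded in this sketch.
sub bucket_scores {
    my ($display_type, $bucket_size, @diff_scores) = @_;
    my (@bucket, @reduced);
    foreach my $score (@diff_scores) {
        push @bucket, $score;
        next if @bucket < $bucket_size;      # bucket not full yet
        push @reduced, $display_type eq 'MAX'
            ? max(@bucket)
            : sum(@bucket) / @bucket;
        @bucket = ();                        # start a new bucket
    }
    return \@reduced;
}

my @diff = (1, 3, 2, 6, 4, 8);
print join(' ', @{ bucket_scores('AVERAGE', 2, @diff) }), "\n";
print join(' ', @{ bucket_scores('MAX',     2, @diff) }), "\n";
```

With a bucket size of 2, the six input values reduce to three scores per display_type.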
 
protected Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_fetch_all_by_GenomicAlignBlockId_WindowSize ( )
  Arg  1     : integer $genomic_align_block_id 
  Arg  2     : integer $window_size
  Arg  3     : (opt) boolean $packed (default 0)
  Example    : my $conservation_scores =
                    $conservation_score_adaptor->_fetch_all_by_GenomicAlignBlockId_WindowSize(23134, 10);
  Description: Retrieve the corresponding
               Bio::EnsEMBL::Compara::ConservationScore objects. 
  Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore objects. If $packed is true, return the scores in a packed format given by $_pack_size and $_pack_type.
  Exceptions : none
  Caller     : general
 
protected Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_find_min_max_score ( )
  Arg  1     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Example    : my ($min, $max) =  _find_min_max_score($scores);
  Description: find the min and max scores used for y axis scaling
  Returntype : (float, float)
  Exceptions :
  Caller     : general
  Status     : At risk
 
protected Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_find_score_index ( )
  Arg  1     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Arg  2     : int $num_scores (number of scores in the array)
  Arg  3     : int $score_lengths number of scores in each row of the array
  Arg  4     : int $pos (position to find)
  Arg  5     : int $win_size (window size)
  Example    : $index = _find_score_index($scores, $num_scores, $score_lengths, $pos, $win_size);
  Description: find the score index (row) that contains $pos in alignment coords
  Returntype : int
  Exceptions : none
  Caller     : general
  Status     : At risk
 
protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_aligned_scores_from_cigar_line ( )
  Arg  1     : string $cigar_line (cigar string from current alignment block)
  Arg  2     : int $start_region (start of genomic_align_block (chr coords))
  Arg  3     : int $end_region (end of genomic_align_block (chr coords))
  Arg  4     : int $start_slice (start of slice (chr coords))
  Arg  5     : int $end_slice (end of slice (chr coords))
  Arg  6     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Arg  7     : int $genomic_align_block_id (genomic align block id of current alignment block)
  Arg  8     : int $genomic_align_block_length (length of current alignment block)
  Arg  9     : string $display_type (either AVERAGE or MAX (plot average or max value))
  Arg 10     : int $win_size (window size used)
  Arg 11     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores in slice coords
  Example    : $scores = $self->_get_aligned_scores_from_cigar_line($genomic_align->cigar_line, $genomic_align->dnafrag_start, $genomic_align->dnafrag_end, $slice->start, $slice->end, $conservation_scores, $genomic_align_block->dbID, $genomic_align_block->length, $display_type, $window_size, $scores);
  Description: Convert conservation scores from alignment coordinates into species specific chromosome (slice) coordinates for an alignment genomic_align_block
  Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Exceptions : none
  Caller     : general
  Status     : At risk
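
The coordinate conversion can be illustrated with a small sketch. It is not the adaptor's implementation: the helper alignment_to_chr_positions is hypothetical, and it assumes a forward-strand cigar line built only from M runs (base present in the reference sequence) and D runs (gap in the reference sequence), where a missing count means 1.

```perl
use strict;
use warnings;

# Hypothetical helper: map each alignment column to a chromosome position.
# Assumes M = base present in the reference, D = gap in the reference,
# forward strand, and that a run without a number has length 1.
sub alignment_to_chr_positions {
    my ($cigar_line, $start_region) = @_;
    my %chr_pos_for_column;
    my $column  = 1;               # alignment coordinate (1-based)
    my $chr_pos = $start_region;   # chromosome coordinate
    while ($cigar_line =~ /(\d*)([MD])/g) {
        my ($count, $type) = ($1 eq '' ? 1 : $1, $2);
        for (1 .. $count) {
            # Only M columns correspond to a base on the chromosome.
            $chr_pos_for_column{$column} = $chr_pos++ if $type eq 'M';
            $column++;
        }
    }
    return \%chr_pos_for_column;
}

my $map = alignment_to_chr_positions('3M2D2M', 1000);
print "column $_ -> $map->{$_}\n" for sort { $a <=> $b } keys %$map;
```

Columns that fall in a gap of the reference sequence have no chromosome position, so a score in such a column cannot be reported in slice coordinates.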
 
protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_aligned_scores_from_cigar_line_fast ( )
  Arg  1     : string $cigar_line (cigar string from current alignment block)
  Arg  2     : int $start_region (start of genomic_align_block (chr coords))
  Arg  3     : int $end_region (end of genomic_align_block (chr coords))
  Arg  4     : int $start_slice (start of slice (chr coords))
  Arg  5     : int $end_slice (end of slice (chr coords))
  Arg  6     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Arg  7     : int $genomic_align_block_id (genomic align block id of current alignment block)
  Arg  8     : int $genomic_align_block_length (length of current alignment block)
  Arg  9     : string $display_type (either AVERAGE or MAX (plot average or max value))
  Arg 10     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores in slice coords
  Example    : $scores = $self->_get_aligned_scores_from_cigar_line_fast($genomic_align->cigar_line, $genomic_align->dnafrag_start, $genomic_align->dnafrag_end, $slice->start, $slice->end, $conservation_scores, $genomic_align_block->dbID, $genomic_align_block->length, $display_type, $scores);
  Description: Faster alternative to _get_aligned_scores_from_cigar_line. This
               method does not bin the scores and can be used if only
               one score per base in the alignment is required
  Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Exceptions : none
  Caller     : general
  Status     : At risk
 
protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_alignment_scores ( )
  Arg  1     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Arg  2     : int $align_start (start position in alignment coords)
  Arg  3     : int $align_end (end position in alignment coords)
  Arg  4     : string $display_type (either AVERAGE or MAX (plot average or max value))
  Arg  5     : int $win_size (window size used)
  Arg  6     : ref to Bio::EnsEMBL::Compara::GenomicAlignBlock object
  Example    : $scores = $self->_get_alignment_scores($conservation_scores, 
               1, 100000, "AVERAGE", 10, $genomic_align_block);
  Description: get scores for an alignment in alignment coordinates
  Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores in alignment coordinates
  Exceptions : none
  Caller     : general
  Status     : At risk
 
protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_get_all_ref_genomic_aligns ( )
  Arg  1     : ref to Bio::EnsEMBL::Compara::MethodLinkSpeciesSet object
  Arg  2     : ref to Bio::EnsEMBL::Slice object
  Example    :  my $light_genomic_aligns = $self->_get_all_ref_genomic_aligns($ma_mlss, $slice);
  Description: Retrieve from the database some genomic_align information 
               relating to only the slice species. 
  Returntype : listref of hash containing a subset of genomic_align fields
  Exceptions : none
  Caller     : general
  Status     : At risk
 
protected void Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_print_scores ( )
  Arg  1     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Arg  2     : boolean $packed (0 if not packed, 1 if packed)
  Example    : _print_scores($scores, 1);
  Description: print scores (unpack first if necessary)
  Returntype : none
  Exceptions : none
  Caller     : general
  Status     : At risk
 
protected Listref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_reverse ( )
  Arg  1     : listref of Bio::EnsEMBL::Compara::ConservationScore objects $scores
  Arg  2     : int $genomic_align_block_length (number of scores)
  Example    : $conservation_scores = _reverse($conservation_scores, $genomic_align_block_length);
  Description: reverse the conservation scores for complemented sequences
  Returntype : listref of Bio::EnsEMBL::Compara::ConservationScore objects
  Exceptions : 
  Caller     : general
  Status     : At risk
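
A rough sketch of what reversing means for positions, assuming one score per alignment column (window size 1). The helper reverse_scores and the hash layout are hypothetical stand-ins for ConservationScore objects, not the adaptor's own data structures.

```perl
use strict;
use warnings;

# Hypothetical stand-in: each score is a hash with position and diff_score.
# On the complemented strand, column p of a block of length L becomes
# column L - p + 1, and the list order is flipped to stay ascending.
sub reverse_scores {
    my ($scores, $block_length) = @_;
    return [
        map { +{ %$_, position => $block_length - $_->{position} + 1 } }
        reverse @$scores
    ];
}

my $scores = [
    { position => 1, diff_score => 0.1 },
    { position => 4, diff_score => 0.4 },
];
my $reversed = reverse_scores($scores, 5);
printf "position %d diff %.1f\n", $_->{position}, $_->{diff_score} for @$reversed;
```

The input list is left unchanged; the sketch returns new hashes.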
 
protected Space Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_unpack_score ( )
  Arg  1     : string $score
  Example    : $exp_scores = _unpack_score($score);
  Description: unpack score values retrieved from a database
  Returntype : space delimited string of floats
  Exceptions : none
  Caller     : general
  Status     : At risk
 
protected Space Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::_unpack_scores ( )
  Arg  1     : string $scores
  Example    : $exp_scores = _unpack_scores($scores);
  Description: unpack score values retrieved from a database
  Returntype : space delimited string of floats
  Exceptions : none
  Caller     : general
  Status     : At risk
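
Unpacking a little-endian score string can be sketched with Perl's built-in unpack. The helper unpack_scores_le is hypothetical, and the 4-byte single-precision layout (template "f<") is an assumption; the adaptor's actual format is given by $_pack_size and $_pack_type.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a packed run of scores into a space-delimited
# string. Assumes 4-byte little-endian floats (pack template "f<").
sub unpack_scores_le {
    my ($packed) = @_;
    return join ' ', unpack('f<*', $packed);
}

# Round trip with values that are exactly representable as 32-bit floats.
my $packed = pack('f<*', 0.5, -1.25, 2);
print unpack_scores_le($packed), "\n";
```

Using an explicit "<" modifier keeps the decoding independent of the byte order of the machine running the script.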
 
public Int Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::delete_by_genomic_align_block_id ( )
  Arg  1     : int $genomic_align_block_id
  Example    : $conservation_score_adaptor->delete_by_genomic_align_block_id(123);
  Description: Delete all the scores related to this GenomicAlignBlock object
  Returntype : int (number of deleted rows, not scores)
  Exceptions : throw if not $genomic_align_block_id
  Status     : Stable
 
public Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::fetch_all_by_GenomicAlignBlock ( )
  Arg  1     : Bio::EnsEMBL::Compara::GenomicAlignBlock $genomic_align_block
  Arg  2     : (opt) integer $align_start (default 1) 
  Arg  3     : (opt) integer $align_end (default $genomic_align_block->length)
  Arg  4     : (opt) integer $slice_length (default $genomic_align_block->length)
  Arg  5     : (opt) integer $display_size (default 700)
  Arg  6     : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "AVERAGE")
  Arg  7     : (opt) integer $window_size
  Example    : my $conservation_scores =
                    $conservation_score_adaptor->fetch_all_by_GenomicAlignBlock($genomic_align_block, $align_start, $align_end, $slice_length, $slice_length);
  Description: Retrieve the corresponding
               Bio::EnsEMBL::Compara::ConservationScore objects. 
               Each conservation score object contains a position in alignment
               coordinates, the observed_score, the expected_score and the 
               diff_score (conservation score) calculated as 
               (expected_score - observed_score).
               The $align_start and $align_end parameters give the start and 
               end of a region within a genomic_align_block and should be in 
               alignment coordinates.
               The $slice_length is the total length of the region to be 
               displayed and may span several individual genomic align blocks.
               It is used to automatically calculate the window_size.
               Display_size is the number of scores that will be returned.
               To return a score for each column in an alignment the display_size 
               should be set to be the same size as the alignment length. If 
               the $slice_length is larger than the $display_size, the scores 
               will either be averaged if the display_type is "AVERAGE" or the 
               maximum taken if display_type is "MAX". 
               Window_size defines which set of pre-averaged scores to use. 
               Valid values are 1, 10, 100 or 500. There is no need to define 
               the window_size because the program will select the most 
               appropriate window_size to use based on the slice_length and the
               display_size. 
               Alignment positions which have no scores are not returned.
               The min and max y axis values for 
               the array of conservation score objects are set in the first 
               conservation score object (index 0). 
  Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore 
               objects. 
  Caller     : object::methodname
  Status     : At risk
 
public Ref Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::fetch_all_by_MethodLinkSpeciesSet_Slice ( )
  Arg  1     : Bio::EnsEMBL::Compara::MethodLinkSpeciesSet $method_link_species_set 
  Arg  2     : Bio::EnsEMBL::Slice $slice
  Arg  3     : (opt) integer $display_size (default 700)
  Arg  4     : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "AVERAGE")
  Arg  5     : (opt) integer $window_size
  Exceptions : warning if window_size is not valid
  Example    : my $conservation_scores =
                    $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($method_link_species_set, $slice, $slice->end-$slice->start+1);
  Description: Retrieve the corresponding 
               Bio::EnsEMBL::Compara::ConservationScore objects. 
               Each conservation score object contains a position in slice 
               coordinates, the observed_score, the expected_score and the 
               diff_score (or conservation score) calculated as the 
               (expected_score - observed_score).
               The method_link_species_set is obtained
               using the method_link type of "GERP_CONSERVATION_SCORE". 
               For example, this could be obtained for the 10 way PECAN 
               alignment, using:
               my $mlss = $mlss_adaptor->fetch_by_method_link_type_registry_aliases("GERP_CONSERVATION_SCORE", ["human", "chimp", "rhesus", "cow", "dog", "mouse", "rat", "opossum", "platypus", "chicken"]);
               Display_size defines the number of scores that will be returned.
               To return a score for each column in an alignment the display_size 
               should be set to be the same size as the slice length, e.g. ($slice->end-$slice->start+1).
               If the slice length is larger than the display_size, the scores 
               will either be averaged if the display_type is "AVERAGE" or the 
               maximum taken if display_type is "MAX". 
               Window_size defines which set of pre-averaged scores to use. 
               Valid values are 1, 10, 100 or 500, although there is no need to 
               define the window_size because the program will select the most 
               appropriate window_size to use based on the slice length and the
               display_size. For example, a slice length of 1000000 and a 
               display_size of 1000 will automatically use a window_size of 500.
               Slice positions which have no scores are not returned.
               The min and max y axis values for the array of 
               conservation score objects are set in the first conservation 
               score object (index 0).
  Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore objects. 
  Caller     : object::methodname
  Status     : At risk
 
public void Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor::store ( )
  Arg [1]    : Bio::EnsEMBL::Compara::ConservationScore $cs
  Example    : $csa->store($cs);
  Description: Stores a conservation score object in the compara database if
               it has not been stored already.  
  Returntype : none
  Exceptions : thrown if $genomic_align_block is not a 
               Bio::EnsEMBL::Compara::GenomicAlignBlock object
  Exceptions : thrown if the argument is not a Bio::EnsEMBL::Compara::ConservationScore
  Caller     : general
  Status     : At risk
 

The documentation for this class was generated from the following file: