Archive Ensembl HomeArchive Ensembl Home
AlignSlice.pm
Go to the documentation of this file.
00001 =head1 LICENSE
00002 
00003   Copyright (c) 1999-2012 The European Bioinformatics Institute and
00004   Genome Research Limited.  All rights reserved.
00005 
00006   This software is distributed under a modified Apache license.
00007   For license details, please see
00008 
00009     http://www.ensembl.org/info/about/code_licence.html
00010 
00011 =head1 CONTACT
00012 
00013   Please email comments or questions to the public Ensembl
00014   developers list at <dev@ensembl.org>.
00015 
00016   Questions may also be sent to the Ensembl help desk at
00017   <helpdesk@ensembl.org>.
00018 
00019 =head1 NAME
00020 
00021 Bio::EnsEMBL::Compara::AlignSlice - An AlignSlice can be used to map genes and features from one species onto another one
00022 
00023 =head1 DESCRIPTION
00024 
00025 INTRODUCTION
00026 
00027 An AlignSlice is an object built with a reference Slice and the corresponding set of genomic alignments.
00028 The genomic alignments are used to map features from one species onto another and viceversa.
00029 
00030 STRUCTURE
00031 
00032 Every Bio::EnsEMBL::Compara::AlignSlice contains a set of Bio::EnsEMBL::Compara::AlignSlice::Slice
00033 objects, at least one by species involved in the alignments. For instance, if the reference Slice is a
00034 human slice and the set of alignments corresponds to human-mouse BLASTZ_NET alignments, there will be at
00035 least one Bio::EnsEMBL::Compara::AlignSlice::Slice for human and at least another one for mouse. The main
00036 Bio::EnsEMBL::Compara::AlignSlice::Slice for the reference species will contain a single genomic sequence
00037 whilst the other Bio::EnsEMBL::Compara::AlignSlice::Slice objects might be made of several pieces of
00038 genomic sequences, depending on the set of alignments. Here is a graphical representation:
00039 
00040   ref.Slice    **************************************************************
00041   
00042   alignments    11111111111
00043                                2222222222222222
00044                                                      33333333333333333
00045 
00046   resulting Bio::EnsEMBL::Compara::AlignSlice:
00047   
00048   AS::Slice 1  **************************************************************
00049   AS::Slice 2  .11111111111....2222222222222222......33333333333333333.......
00050 
00051 MODES
00052 
00053 Two modes are currently available: condensed and expanded mode. The default mode is "condensed". In
00054 condensed mode, no gaps are allowed in the reference Slice which means that information about deletions
00055 in the reference species (i.e. insertions in the other species) are lost. On the other hand, the
00056 first Bio::EnsEMBL::Compara::AlignSlice::Slice object corresponds to the original Bio::EnsEMBL::Slice.
00057 
00058 In the expanded mode, the first Bio::EnsEMBL::Compara::AlignSlice::Slice is expanded in order to
00059 accomodate the gaps corresponding to the deletions (insertions). Bear in mind that in expanded mode, the
00060 length of the resulting AlignSlice will be most probably larger than the length of the reference
00061 Bio::EnsEMBL::Slice.
00062 
00063 
00064 OVERLAPPING ALIGNMENTS
00065 
00066 No overlapping alignments are allowed by default. This means that if an alignment overlaps another one, the
00067 second alignment is ignored. This is due to lack of information needed to reconciliate both alignment.
00068 Here is a graphical example showing this problem:
00069 
00070   ALN 1:   Human (ref) CTGTGAAAA----CCCCATTAGG
00071            Mouse (1)     CTGAAAATTTTCCCC
00072   
00073   ALN 2:   Human (ref) CTGTGAAAA---CCCCATTAGG
00074            Mouse (1)         AAAGGGCCCCATTA
00075 
00076   Possible solution 1:
00077            Human (ref) CTGTGAAAA----CCCCATTAGG
00078            Mouse (1)     CTGAAAATTTTCCCC----
00079            Mouse (1)     ----AAA-GGGCCCCATTA
00080 
00081   Possible solution 2:
00082            Human (ref) CTGTGAAAA----CCCCATTAGG
00083            Mouse (1)     CTGAAAATTTTCCCC----
00084            Mouse (1)     ----AAAGGG-CCCCATTA
00085 
00086   Possible solution 3:
00087            Human (ref) CTGTGAAAA-------CCCCATTAGG
00088            Mouse (1)     CTGAAAATTTT---CCCC
00089            Mouse (1)         AAA----GGGCCCCATTA
00090   
00091 There is no easy way to find which of these possible solution is the best without trying to realign the
00092 three sequences together and this is far beyond the aim of this module. The best solution is to start with
00093 multiple alignments instead of overlapping pairwise ones.
00094 
00095 The third possibility is probably not the best alignment you can get although its implementation is
00096 systematic (insert as many gaps as needed in order to accommodate the insertions and never ever overlap
00097 them) and computationally cheap as no realignment is needed. You may ask this module to solve overlapping
00098 alignments in this way using the "solve_overlapping" option.
00099 
00100 RESULT
00101 
00102 The AlignSlice results in a set of Bio::EnsEMBL::Compara::AlignSlice::Slice which are objects really
00103 similar to the Bio::EnsEMBL::Slice. There are intended to be used just like the genuine
00104 Bio::EnsEMBL::Slice but some methods do not work though. Some examples of non-ported methods are: expand()
00105 and invert(). Some other methods work as expected (seq, subseq, get_all_Attributes,
00106 get_all_VariationFeatures, get_all_RepeatFeatures...). All these Bio::EnsEMBL::Compara::AlignSlice::Slice
00107 share the same fake coordinate system defined by the Bio::EnsEMBL::Compara::AlignSlice. This allows to
00108 map features from one species onto the others.
00109   
00110 =head1 SYNOPSIS
00111   
00112   use Bio::EnsEMBL::Compara::AlignSlice;
00113   
00114   ## You may create your own AlignSlice objects but if you are interested in
00115   ## getting AlignSlice built with data from an EnsEMBL Compara database you
00116   ## should consider using the Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor
00117   ## instead
00118   my $align_slice = new Bio::EnsEMBL::Compara::AlignSlice(
00119           -adaptor => $align_slice_adaptor,
00120           -reference_Slice => $reference_Slice,
00121           -method_link_species_set => $all_genomic_align_blocks,
00122           -genomicAlignBlocks => $all_genomic_align_blocks,
00123           -expanded => 1
00124           -solve_overlapping => 1
00125       );
00126 
00127   my $all_slices = $aling_slice->get_all_Slices();
00128   foreach my $this_slice (@$all_slices) {
00129     ## See also Bio::EnsEMBL::Compara::AlignSlice::Slice
00130     my $species_name = $this_slice->genome_db->name()
00131     my $all_mapped_genes = $this_slice->get_all_Genes();
00132   }
00133 
00134   my $simple_align = $align_slice->get_projected_SimpleAlign();
00135 
00136 SET VALUES
00137   $align_slice->adaptor($align_slice_adaptor);
00138   $align_slice->reference_Slice($reference_slice);
00139   $align_slice->add_GenomicAlignBlock($genomic_align_block_1);
00140   $align_slice->add_GenomicAlignBlock($genomic_align_block_2);
00141   $align_slice->add_GenomicAlignBlock($genomic_align_block_3);
00142 
00143 GET VALUES
00144   my $align_slice_adaptor = $align_slice->adaptor();
00145   my $reference_slice = $align_slice->reference_Slice();
00146   my $all_genomic_align_blocks = $align_slice->get_all_GenomicAlignBlock();
00147   my $mapped_genes = $align_slice->get_all_Genes_by_genome_db_id(3);
00148   my $simple_align = $align_slice->get_projected_SimpleAlign();
00149 
00150 =head1 OBJECT ATTRIBUTES
00151 
00152 =over
00153 
00154 =item adaptor
00155 
00156 Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object to access DB
00157 
00158 =item reference_slice
00159 
00160 Bio::EnsEMBL::Slice object used to create this Bio::EnsEBML::Compara::AlignSlice object
00161 
00162 =item all_genomic_align_blocks
00163 
00164 a listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects found using the reference_Slice
00165 
00166 =back
00167 
00168 =head1 APPENDIX
00169 
00170 The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
00171 
00172 =cut
00173 
00174 
00175 # Let the code begin...
00176 
00177 
00178 package Bio::EnsEMBL::Compara::AlignSlice;
00179 
00180 use strict;
00181 use Bio::EnsEMBL::Utils::Argument qw(rearrange);
00182 use Bio::EnsEMBL::Utils::Exception qw(throw warning info verbose);
00183 use Bio::EnsEMBL::Compara::AlignSlice::Exon;
00184 use Bio::EnsEMBL::Compara::AlignSlice::Slice;
00185 use Bio::EnsEMBL::Compara::GenomicAlignBlock;
00186 use Bio::EnsEMBL::Compara::GenomicAlign;
00187 use Bio::SimpleAlign;
00188 
00189 ## Creates a new coordinate system for creating empty Slices.
00190 
00191 my $aligngap_coord_system = new Bio::EnsEMBL::CoordSystem(
00192         -NAME => 'alignment',
00193         -VERSION => "none",
00194         -TOP_LEVEL => 0,
00195         -SEQUENCE_LEVEL => 1,
00196         -RANK => 1,
00197     );
00198 
00199 =head2 new (CONSTRUCTOR)
00200 
00201   Arg[1]     : a reference to a hash where keys can be:
00202                  -adaptor
00203                  -reference_slice
00204                  -genomic_align_blocks
00205                  -method_link_species_set
00206                  -expanded
00207                  -solve_overlapping
00208                  -preserve_blocks
00209     -adaptor:  the Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor
00210     -reference_slice:
00211                Bio::EnsEMBL::Slice, the guide slice for this align_slice
00212     -genomic_align_blocks:
00213                listref of Bio::EnsEMBL::Compara::GenomicAlignBlock
00214                objects containing to the alignments to be used for this
00215                align_slice
00216     -method_link_species_set;
00217                Bio::EnsEMBL::Compara::MethodLinkSpeciesSet object for
00218                all the previous genomic_align_blocks. At the moment all
00219                the blocks should correspond to the same MethodLinkSpeciesSet
00220     -expanded: boolean flag. If set to true, the AlignSlice will insert all
00221                the gaps requiered by the alignments in the reference_slice
00222                (see MODES elsewhere in this document)
00223     -solve_overlapping:
00224                boolean flag. If set to true, the AlignSlice will allow
00225                overlapping alginments and solve indeterminations according
00226                to the method described in OVERLAPPING ALIGNMENTS elsewhere
00227                in this document
00228     -preserve_blocks:
00229                boolean flag. By default the AlignSlice trim the alignments
00230                in order to fit the reference_slice. This flags tell the
00231                AlignSlice to use the alignment block as they are (usually
00232                this is only used by the AlignSliceAdaptor, use with care)
00233   Example    : my $align_slice =
00234                    new Bio::EnsEMBL::Compara::AlignSlice(
00235                        -adaptor => $align_slice_adaptor,
00236                        -reference_slice => $reference_slice,
00237                        -method_link_species_set => $method_link_species_set,
00238                        -genomic_align_blocks => [$gab1, $gab2],
00239                        -expanded => 1
00240                    );
00241   Description: Creates a new Bio::EnsEMBL::Compara::AlignSlice object
00242   Returntype : Bio::EnsEMBL::Compara::AlignSlice object
00243   Exceptions : none
00244   Caller     : general
00245 
00246 =cut
00247 
00248 sub new {
00249   my ($class, @args) = @_;
00250 
00251   my $self = {};
00252   bless $self,$class;
00253 
00254   my ($adaptor, $reference_slice, $genomic_align_blocks, $genomic_align_trees,
00255       $method_link_species_set, $expanded, $solve_overlapping, $preserve_blocks,
00256       $species_order) =
00257       rearrange([qw(
00258           ADAPTOR REFERENCE_SLICE GENOMIC_ALIGN_BLOCKS GENOMIC_ALIGN_TREES
00259           METHOD_LINK_SPECIES_SET EXPANDED SOLVE_OVERLAPPING PRESERVE_BLOCKS
00260           SPECIES_ORDER
00261       )], @args);
00262 
00263   $self->adaptor($adaptor) if (defined ($adaptor));
00264   $self->reference_Slice($reference_slice) if (defined ($reference_slice));
00265   if (defined($genomic_align_blocks)) {  
00266     foreach my $this_genomic_align_block (@$genomic_align_blocks) {
00267       $self->add_GenomicAlignBlock($this_genomic_align_block);
00268     }
00269   }
00270   $self->{_method_link_species_set} = $method_link_species_set if (defined($method_link_species_set));
00271 
00272   $self->{expanded} = 0;
00273   if ($expanded) {
00274     $self->{expanded} = 1;
00275   }
00276 
00277   $self->{solve_overlapping} = 0;
00278   if ($solve_overlapping) {
00279     $self->{solve_overlapping} = $solve_overlapping;
00280   }
00281 
00282   if ($genomic_align_trees) {
00283     $self->_create_underlying_Slices($genomic_align_trees, $self->{expanded},
00284         $self->{solve_overlapping}, $preserve_blocks, $species_order);
00285 
00286     #Awful hack to store the _alignslice_from and _alignslice_to on the 
00287     #GenomicAlignBlock for use in get_all_ConservationScores which uses 
00288     #GenomicAlignBlock and not GenomicAlignTree
00289     foreach my $tree (@$genomic_align_trees) {
00290         foreach my $block (@$genomic_align_blocks) {
00291 
00292             #my $gab_id = $tree->get_all_leaves->[0]->get_all_genomic_aligns_for_node->[0]->genomic_align_block_id;
00293         #Hope always have a reference_genomic_align. The above doesn't work because only the reference species
00294         #has the genomic_align_block_id set
00295             my $gab_id = $tree->reference_genomic_align->genomic_align_block_id;
00296         my $block_id = $block->dbID;
00297 
00298         #if the block has been restricted, need to look at original_dbID
00299         if (!defined $block_id) {
00300         $block_id = $block->{original_dbID};
00301         }
00302         my $tree_ref_ga = $tree->{reference_genomic_align};
00303         my $block_ref_ga = $block->{reference_genomic_align};
00304 
00305         #Need to check the ref_ga details not just the block id because
00306         #the original_dbID is not unique for 2x genome blocks
00307             if ($gab_id == $block_id && 
00308         $tree_ref_ga->dnafrag_start == $block_ref_ga->dnafrag_start && 
00309         $tree_ref_ga->dnafrag_end == $block_ref_ga->dnafrag_end && 
00310         $tree_ref_ga->dnafrag_strand == $block_ref_ga->dnafrag_strand) {
00311 
00312                 $block->{_alignslice_from} = $tree->{_alignslice_from};
00313                 $block->{_alignslice_to} = $tree->{_alignslice_to};
00314         }
00315         }
00316     }
00317   } else {
00318     $self->_create_underlying_Slices($genomic_align_blocks, $self->{expanded},
00319         $self->{solve_overlapping}, $preserve_blocks, $species_order);
00320   }
00321   return $self;
00322 }
00323 
00324 
00325 =head2 sub_AlignSlice
00326 
00327   Arg 1      : int $start
00328   Arg 2      : int $end
00329   Example    : my $sub_align_slice = $align_slice->sub_AlignSlice(10, 50);
00330   Description: Creates a new Bio::EnsEMBL::Compara::AlignSlice object
00331                corresponding to a sub region of this one
00332   Returntype : Bio::EnsEMBL::Compara::AlignSlice object
00333   Exceptions : return undef if no internal slices can be created (see
00334                Bio::EnsEMBL::Compara::AlignSlice::Slice->sub_Slice)
00335   Caller     : $align_slice
00336   Status     : Testing
00337 
00338 =cut
00339 
00340 sub sub_AlignSlice {
00341   my ($self, $start, $end) = @_;
00342   my $sub_align_slice = {};
00343 
00344   throw("Must provide START argument") if (!defined($start));
00345   throw("Must provide END argument") if (!defined($end));
00346 
00347   bless $sub_align_slice, ref($self);
00348 
00349   $sub_align_slice->adaptor($self->adaptor);
00350   $sub_align_slice->reference_Slice($self->reference_Slice);
00351   foreach my $this_genomic_align_block (@{$self->get_all_GenomicAlignBlocks()}) {
00352     $sub_align_slice->add_GenomicAlignBlock($this_genomic_align_block);
00353   }
00354   $sub_align_slice->{_method_link_species_set} = $self->{_method_link_species_set};
00355 
00356   $sub_align_slice->{expanded} = $self->{expanded};
00357 
00358   $sub_align_slice->{solve_overlapping} = $self->{solve_overlapping};
00359 
00360   foreach my $this_slice (@{$self->get_all_Slices}) {
00361     my $new_slice = $this_slice->sub_Slice($start, $end);
00362     push(@{$sub_align_slice->{_slices}}, $new_slice) if ($new_slice);
00363   }
00364   return undef if (!$sub_align_slice->{_slices});
00365 
00366   return $sub_align_slice;
00367 }
00368 
00369 
00370 =head2 adaptor
00371 
00372   Arg[1]     : (optional) Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor $align_slice_adaptor
00373   Example    : my $align_slice_adaptor = $align_slice->adaptor
00374   Example    : $align_slice->adaptor($align_slice_adaptor)
00375   Description: getter/setter for the adaptor attribute
00376   Returntype : Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object
00377   Exceptions : throw if arg is not a Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor
00378   Caller     : $object->methodname
00379 
00380 =cut
00381 
00382 sub adaptor {
00383   my ($self, $adaptor) = @_;
00384 
00385   if (defined($adaptor)) {
00386     throw "[$adaptor] must be a Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object"
00387         unless ($adaptor and $adaptor->isa("Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor"));
00388     $self->{'adaptor'} = $adaptor;
00389   }
00390 
00391   return $self->{'adaptor'};
00392 }
00393 
00394 
00395 =head2 get_all_Slices (experimental)
00396 
00397   Arg[1]     : [optional] string $species_name1
00398   Arg[2]     : [optional] string $species_name2
00399   Arg[...]   : [optional] string $species_nameN
00400   Example    : my $slices = $align_slice->get_all_Slices
00401   Description: getter for all the Slices in this AlignSlice. If a list
00402                of species is specified, returns only the slices for
00403                these species. The slices are returned in a "smart"
00404                order, i.e. the slice corresponding to the reference
00405                species is returned first and then the remaining slices
00406                depending on their phylogenetic distance to the first
00407                one.
00408                NB: You can use underscores instead of whitespaces for
00409                the name of the species, i.e. Homo_sapiens will be
00410                understood as "Homo sapiens". However if the GenomeDB is found
00411                to already have _ defined in the name then this behaviour is
00412                disabled.
00413   Returntype : listref of Bio::EnsEMBL::Compara::AlignSlice::Slice
00414                objects.
00415   Exceptions : 
00416   Caller     : $object->methodname
00417 
00418 =cut
00419 
00420 sub get_all_Slices {
00421   my ( $self, @species_names ) = @_;
00422   my $slices = [];
00423 
00424   if (@species_names) {
00425     foreach my $slice ( @{ $self->{_slices} } ) {
00426       #Substitute _ for spaces & check if the current GenomeDB matches with 
00427       #or without them
00428       foreach my $this_species_name (@species_names) {
00429         ( my $space_species_name = $this_species_name ) =~ s/_/ /g;
00430         push( @$slices, $slice )
00431           if ( ( $this_species_name eq $slice->genome_db->name )
00432           || ( $space_species_name eq $slice->genome_db->name ) );
00433       }
00434     }
00435   }
00436   else {
00437     $slices = $self->{_slices};
00438   }
00439 
00440   return $slices;
00441 }
00442 
00443 
00444 =head2 reference_Slice
00445 
00446   Arg[1]     : (optional) Bio::EnsEMBL::Slice $slice
00447   Example    : my $reference_slice = $align_slice->reference_slice
00448   Example    : $align_slice->reference_Slice($reference_slice)
00449   Description: getter/setter for the attribute reference_slice
00450   Returntype : Bio::EnsEMBL::Slice object
00451   Exceptions : throw if arg is not a Bio::EnsEMBL::Slice object
00452   Caller     : $object->methodname
00453 
00454 =cut
00455 
00456 sub reference_Slice {
00457   my ($self, $reference_slice) = @_;
00458 
00459   if (defined($reference_slice)) {
00460     throw "[$reference_slice] must be a Bio::EnsEMBL::Slice object"
00461         unless ($reference_slice and $reference_slice->isa("Bio::EnsEMBL::Slice"));
00462     $self->{'reference_slice'} = $reference_slice;
00463   }
00464 
00465   return $self->{'reference_slice'};
00466 }
00467 
00468 
00469 =head2 add_GenomicAlignBlock
00470 
00471   Arg[1]     : Bio::EnsEMBL::Compara::GenomicAlignBlock $genomicAlignBlock
00472   Example    : $align_slice->add_GenomicAlignBlock($genomicAlignBlock)
00473   Description: add a Bio::EnsEMBL::Compara::GenomicAlignBlock object to the array
00474                stored in the attribute all_genomic_align_blocks
00475   Returntype : none
00476   Exceptions : throw if arg is not a Bio::EnsEMBL::Compara::GenomicAlignBlock
00477   Caller     : $object->methodname
00478 
00479 =cut
00480 
00481 sub add_GenomicAlignBlock {
00482   my ($self, $genomic_align_block) = @_;
00483 
00484   if (!defined($genomic_align_block)) {
00485     throw "Too few arguments for Bio::EnsEMBL::Compara::AlignSlice->add_GenomicAlignBlock()";
00486   }
00487   if (!$genomic_align_block or !$genomic_align_block->isa("Bio::EnsEMBL::Compara::GenomicAlignBlock")) {
00488     throw "[$genomic_align_block] must be a Bio::EnsEMBL::Compara::GenomicAlignBlock object";
00489   }
00490 
00491   push(@{$self->{'all_genomic_align_blocks'}}, $genomic_align_block);
00492 }
00493 
00494 
00495 =head2 get_all_GenomicAlignBlocks
00496 
00497   Arg[1]     : none
00498   Example    : my $all_genomic_align_blocks = $align_slice->get_all_GenomicAlignBlocks
00499   Description: getter for the attribute all_genomic_align_blocks
00500   Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
00501   Exceptions : none
00502   Caller     : $object->methodname
00503 
00504 =cut
00505 
00506 sub get_all_GenomicAlignBlocks {
00507   my ($self) = @_;
00508 
00509   return ($self->{'all_genomic_align_blocks'} || []);
00510 }
00511 
00512 
00513 =head2 get_MethodLinkSpeciesSet
00514 
00515   Arg[1]     : none
00516   Example    : my $method_link_species_set = $align_slice->get_MethodLinkSpeciesSet
00517   Description: getter for the Bio::EnsEMBL::Compara::MethodLinkSpeciesSet
00518                used to create this object
00519   Returntype : Bio::EnsEMBL::Compara::MethodLinkSpeciesSet
00520   Exceptions : none
00521   Caller     : $object->methodname
00522 
00523 =cut
00524 
00525 sub get_MethodLinkSpeciesSet {
00526   my ($self) = @_;
00527 
00528   return $self->{'_method_link_species_set'};
00529 }
00530 
00531 
00532 =head2 get_SimpleAlign
00533 
00534   Arg[1]      : none
00535   Example     : use Bio::AlignIO;
00536                 my $out = Bio::AlignIO->newFh(-fh=>\*STDOUT, -format=> "clustalw");
00537                 print $out $align_slice->get_SimpleAlign();
00538   Description : This method creates a Bio::SimpleAlign object using the
00539                 Bio::EnsEMBL::Compara::AlignSlice::Slices underlying this
00540                 Bio::EnsEMBL::Compara::AlignSlice object. The SimpleAlign
00541                 describes the alignment where the first sequence
00542                 corresponds to the reference Slice and the remaining
00543                 correspond to the other species.
00544   Returntype  : Bio::SimpleAlign object
00545   Exceptions  : 
00546   Caller      : $object->methodname
00547 
00548 =cut
00549 
00550 sub get_SimpleAlign {
00551   my ($self, @species) = @_;
00552   my $simple_align;
00553 
00554   ## Create a single Bio::SimpleAlign for the projection  
00555   $simple_align = Bio::SimpleAlign->new();
00556   $simple_align->id("ProjectedMultiAlign");
00557 
00558   my $genome_db_name_counter;
00559   foreach my $slice (@{$self->get_all_Slices(@species)}) {
00560     my $seq = Bio::LocatableSeq->new(
00561             -SEQ    => $slice->seq,
00562             -START  => $slice->start,
00563             -END    => $slice->end,
00564             -ID     => $slice->genome_db->name.($genome_db_name_counter->{$slice->genome_db->name} or ""),
00565             -STRAND => $slice->strand
00566         );
00567     ## This allows to have several sequences for the same species. Bio::SimpleAlign complains
00568     ## about having the same ID, START and END for two sequences...
00569     if (!defined($genome_db_name_counter->{$slice->genome_db->name})) {
00570       $genome_db_name_counter->{$slice->genome_db->name} = 2;
00571     } else {
00572       $genome_db_name_counter->{$slice->genome_db->name}++;
00573     }
00574 
00575     $simple_align->add_seq($seq);
00576   }
00577 
00578   return $simple_align;
00579 }
00580 
00581 
00582 =head2 get_all_ConservationScores
00583 
00584   Arg  1     : (opt) integer $display_size (default 700)
00585   Arg  2     : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX")
00586   Arg  3     : (opt) integer $window_size
00587   Example    : my $conservation_scores =
00588                     $align_slice->get_all_ConservationScores(1000, "AVERAGE", 10);
00589   Description: Retrieve the corresponding
00590                Bio::EnsEMBL::Compara::ConservationScore objects for the
00591                Bio::EnsEMBL::Compara::AlignSlice object. It calls either
00592                _get_expanded_conservation_scores if the AlignSlice has 
00593                "expanded" set or _get_condensed_conservation_scores for 
00594                condensed mode.
00595                It sets up the align_start, align_end and slice_length and map 
00596                the resulting objects onto the AlignSlice. 
00597                Please refer to the documentation in
00598                Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
00599                for more details.
00600   Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore 
00601                objects.
00602   Caller     : object::methodname
00603   Status     : At risk
00604 
00605 =cut
00606 
00607 sub get_all_ConservationScores {
00608   my ($self, $display_size, $display_type, $window_size) = @_;
00609   my $all_conservation_scores = [];
00610   my $y_axis_min;
00611   my $y_axis_max;
00612 
00613   my $conservation_score_adaptor = $self->adaptor->db->get_ConservationScoreAdaptor();
00614    
00615   #Get scores in either expanded or condensed mode
00616   if ($self->{expanded}) {
00617       $all_conservation_scores = $self->_get_expanded_conservation_scores($conservation_score_adaptor, $display_size, $display_type, $window_size);
00618   } else {
00619       $all_conservation_scores = $self->_get_condensed_conservation_scores($conservation_score_adaptor, $display_size, $display_type, $window_size);
00620   }
00621 
00622   return $all_conservation_scores;
00623 }
00624 
00625 =head2 _get_expanded_conservation_scores
00626 
00627   Arg  1     : Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
00628   Arg  2     : (opt) integer $display_size (default 700)
00629   Arg  3     : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX")
00630   Arg  4     : (opt) integer $window_size
00631   Example    : my $conservation_scores =
00632                     $self->_get_expanded_conservation_scores($cs_adaptor, 1000, "AVERAGE", 10);
00633   Description: Retrieve the corresponding
00634                Bio::EnsEMBL::Compara::ConservationScore objects for the
00635                Bio::EnsEMBL::Compara::GenomicAlignBlock objects underlying
00636                this Bio::EnsEMBL::Compara::AlignSlice object. This method
00637                calls the Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor->
00638                fetch_all_by_GenomicAlignBlock() method. It sets up the align_start,
00639                align_end and slice_length and map the resulting objects onto
00640                the AlignSlice. $diaplay_slize, $display_type and $window_size
00641                are passed to the fetch_all_by_GenomicAlignBlock() method.
00642                Please refer to the documentation in
00643                Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
00644                for more details.
00645   Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore 
00646                objects.
00647   Caller     : object::methodname
00648   Status     : At risk
00649 
00650 =cut
00651 sub _get_expanded_conservation_scores {
00652     my ($self, $conservation_score_adaptor, $display_size, $display_type, $window_size) = @_;
00653     my $y_axis_min;
00654     my $y_axis_max;
00655 
00656     my $all_conservation_scores = [];
00657     foreach my $this_genomic_align_block (@{$self->get_all_GenomicAlignBlocks()}) {
00658     $this_genomic_align_block->{restricted_aln_start} = $this_genomic_align_block->{_alignslice_from};
00659     $this_genomic_align_block->{restricted_aln_end} = $this_genomic_align_block->{_alignslice_to};
00660 
00661     my $all_these_conservation_scores = $conservation_score_adaptor->fetch_all_by_GenomicAlignBlock(
00662                                                     $this_genomic_align_block, 1,$this_genomic_align_block->length, $self->get_all_Slices()->[0]->length, 
00663                                                     $display_size, $display_type, $window_size);
00664     
00665     #initialise y axis min and max
00666     if (!defined $y_axis_max) {
00667         $y_axis_max = $all_these_conservation_scores->[0]->y_axis_max;
00668         $y_axis_min = $all_these_conservation_scores->[0]->y_axis_min;
00669     }
00670     #find overall min and max 
00671     if ($y_axis_min > $all_these_conservation_scores->[0]->y_axis_min) {
00672         $y_axis_min = $all_these_conservation_scores->[0]->y_axis_min;
00673     }
00674     if ($y_axis_max < $all_these_conservation_scores->[0]->y_axis_max) {
00675         $y_axis_max = $all_these_conservation_scores->[0]->y_axis_max;
00676     }
00677     push (@$all_conservation_scores, @$all_these_conservation_scores);
00678     }
00679     #set overall min and max
00680     $all_conservation_scores->[0]->y_axis_min($y_axis_min);
00681     $all_conservation_scores->[0]->y_axis_max($y_axis_max);
00682 
00683     return $all_conservation_scores;
00684 }
00685 
00686 =head2 _get_condensed_conservation_scores
00687 
00688   Arg  1     : Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
00689   Arg  2     : (opt) integer $display_size (default 700)
00690   Arg  3     : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX")
00691   Arg  4     : (opt) integer $window_size
00692   Example    : my $conservation_scores =
00693                     $self->_get_expanded_conservation_scores($cs_adaptor, 1000, "AVERAGE", 10);
00694   Description: Retrieve the corresponding
00695                Bio::EnsEMBL::Compara::ConservationScore objects for the
00696                reference Bio::EnsEMBL::Slice object of 
00697                this Bio::EnsEMBL::Compara::AlignSlice object. This method
00698                calls the Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor->
00699                fetch_all_by_MethodLinkSpeciesSet_Slice() method. It sets up 
00700                the align_start, align_end and slice_length and map the 
00701                resulting objects onto the AlignSlice. $display_slize, 
00702                $display_type and $window_size are passed to the 
00703                fetch_all_by_MethodLinkSpeciesSet_Slice() method.
00704                Please refer to the documentation in
00705                Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
00706                for more details.
00707   Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore 
00708                objects.
00709   Caller     : object::methodname
00710   Status     : At risk
00711 
00712 =cut
00713 sub _get_condensed_conservation_scores {
00714     my ($self, $conservation_score_adaptor, $display_size, $display_type, $window_size) = @_;
00715 
00716     my $all_conservation_scores = [];
00717 
00718     throw ("Must have method_link_species_set defined to retrieve conservation scores for a condensed AlignSlice") if (!defined $self->{_method_link_species_set});
00719     throw ("Must have reference slice defined to retrieve conservation scores for a condensed AlignSlice") if (!defined $self->{'reference_slice'});
00720 
00721     #really hacky to get the mlss for the conservation score (which then uses 
00722     #it to get the multiple alignment mlss, which is what I started with!
00723     my $key = "gerp_%";
00724     my $sql = "SELECT meta_key FROM meta WHERE meta_key like ? AND meta_value = ?";
00725     my $sth = $conservation_score_adaptor->prepare($sql);
00726     $sth->execute($key, $self->{_method_link_species_set}->dbID);
00727     
00728     my $cs_mlss_id;
00729     while (my $arrRef = $sth->fetchrow_arrayref) {
00730     ($cs_mlss_id) = $arrRef->[0] =~ /gerp_(\d+)/;
00731     }
00732     $sth->finish;
00733 
00734     throw ("Unable to find conservation score method_link_species_set for this multiple alignment " . $self->{_method_link_species_set}->dbID) if (!defined $cs_mlss_id);
00735     my $mlss_adaptor = $self->adaptor->db->get_MethodLinkSpeciesSetAdaptor();
00736 
00737     my $cs_mlss = $mlss_adaptor->fetch_by_dbID($cs_mlss_id);
00738 
00739     $all_conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($cs_mlss, $self->{'reference_slice'}, $display_size, $display_type, $window_size);
00740     
00741     return $all_conservation_scores;
00742 }
00743 
00744 
00745 =head2 get_all_ConstrainedElements
00746 
00747   Arg  1     : (opt) string $method_link_type (default = GERP_CONSTRAINED_ELEMENT)
00748   Arg  2     : (opt) listref Bio::EnsEMBL::Compara::GenomeDB $species_set
00749                (default, the set of species from the MethodLinkSpeciesSet used
00750                to build this AlignSlice)
00751   Example    : my $constrained_elements =
00752                     $align_slice->get_all_ConstrainedElements();
00753   Description: Retrieve the corresponding constrained elements for these alignments.
00754                Objects will be located on this AlignSlice, i.e. the
00755                reference_slice, reference_slice_start, reference_slice_end
00756                and reference_slice_strand will refer to this AlignSlice
00757                object
00758   Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConstrainedElement
00759                objects.
00760   Caller     : object::methodname
00761   Status     : At risk
00762 
00763 =cut
00764 
00765 sub get_all_ConstrainedElements {
00766   my ($self, $method_link_type, $species_set) = @_;
00767   my $all_constrained_elements = [];
00768 
00769   $method_link_type ||= "GERP_CONSTRAINED_ELEMENT";
00770   my $key_cache = "_constrained_elements_".$method_link_type;
00771   if ($species_set) {
00772     $key_cache .= "::" . join("-", sort map {s/\W/_/g} map {$_->name} @$species_set);
00773   } else {
00774     $species_set = $self->{_method_link_species_set}->species_set;
00775   }
00776 
00777   if (!defined($self->{$key_cache})) {
00778     my $method_link_species_set_adaptor = $self->adaptor->db->get_MethodLinkSpeciesSetAdaptor();
00779     my $method_link_species_set = $method_link_species_set_adaptor->fetch_by_method_link_type_GenomeDBs(
00780         $method_link_type, $self->{_method_link_species_set}->species_set);
00781 
00782     if ($method_link_species_set) {
00783       my $constrained_element_adaptor = $self->adaptor->db->get_ConstrainedElementAdaptor();
00784       $all_constrained_elements = $constrained_element_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice(
00785           $method_link_species_set, $self->reference_Slice);
00786       my $big_mapper = $self->{_reference_Mapper};
00787       foreach my $this_constrained_element (@{$all_constrained_elements}) {
00788         my $reference_slice_start;
00789         my $reference_slice_end;
00790         my $reference_slice_strand;
00791 
00792         my @alignment_coords = $big_mapper->map_coordinates(
00793             "sequence", # $self->genomic_align->dbID,
00794             $this_constrained_element->start + $this_constrained_element->slice->start - 1,
00795             $this_constrained_element->end + $this_constrained_element->slice->start - 1,
00796             $this_constrained_element->strand,
00797             "sequence" # $from_mapper->from
00798         );
00799         foreach my $alignment_coord (@alignment_coords) {
00800           next if (!$alignment_coord->isa("Bio::EnsEMBL::Mapper::Coordinate"));
00801           if (!defined($reference_slice_strand)) {
00802             $reference_slice_start = $alignment_coord->start;
00803             $reference_slice_end = $alignment_coord->end;
00804             $reference_slice_strand = $alignment_coord->strand;
00805           } else {
00806             if ($alignment_coord->start < $reference_slice_start) {
00807               $reference_slice_start = $alignment_coord->start;
00808             }
00809             if ($alignment_coord->end > $reference_slice_end) {
00810               $reference_slice_end = $alignment_coord->end;
00811             }
00812           }
00813         }
00814         $this_constrained_element->slice($self);
00815         $this_constrained_element->start($reference_slice_start);
00816         $this_constrained_element->end($reference_slice_end);
00817         $this_constrained_element->strand($reference_slice_strand);
00818       }
00819     }
00820     $self->{$key_cache} = $all_constrained_elements;
00821   }
00822 
00823   return $self->{$key_cache};
00824 }
00825 
00826 
00827 =head2 _create_underlying_Slices (experimental)
00828 
00829   Arg[1]     : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks
00830                $genomic_align_blocks
00831   Arg[2]     : [optional] boolean $expanded (default = FALSE)
00832   Arg[3]     : [optional] boolean $solve_overlapping (default = FALSE)
00833   Arg[4]     : [optional] boolean $preserve_blocks (default = FALSE)
00834   Example    : 
00835   Description: Creates a set of Bio::EnsEMBL::Compara::AlignSlice::Slices
00836                and attach it to this object. 
00837   Returntype : 
00838   Exceptions : warns about overlapping GenomicAlignBlocks
00839   Caller     : 
00840 
00841 =cut
00842 
00843 sub _create_underlying_Slices {
00844   my ($self, $genomic_align_blocks, $expanded, $solve_overlapping, $preserve_blocks, $species_order) = @_;
00845   my $strand = $self->reference_Slice->strand;
00846   
00847   my $align_slice_length = 0;
00848   my $last_ref_pos;
00849   if ($strand == 1) {
00850     $last_ref_pos = $self->reference_Slice->start;
00851   } else {
00852     $last_ref_pos = $self->reference_Slice->end;
00853   }
00854 
00855   my $ref_genome_db = $self->adaptor->db->get_GenomeDBAdaptor->fetch_by_Slice($self->reference_Slice);
00856   my $big_mapper = Bio::EnsEMBL::Mapper->new("sequence", "alignment");
00857 
00858   my $sorted_genomic_align_blocks;
00859 
00860   if ($solve_overlapping eq "restrict") {
00861     $sorted_genomic_align_blocks = _sort_and_restrict_GenomicAlignBlocks($genomic_align_blocks);
00862   } elsif ($solve_overlapping) {
00863     $sorted_genomic_align_blocks = _sort_and_compile_GenomicAlignBlocks($genomic_align_blocks);
00864   } else {
00865     $sorted_genomic_align_blocks = _sort_GenomicAlignBlocks($genomic_align_blocks);
00866   }
00867   @$sorted_genomic_align_blocks = reverse(@$sorted_genomic_align_blocks) if ($strand == -1);
00868   foreach my $this_genomic_align_block (@$sorted_genomic_align_blocks) {
00869     my $original_genomic_align_block = $this_genomic_align_block;
00870     my ($from, $to);
00871 
00872     if ($preserve_blocks) {
00873       ## Don't restrict the block. Set from and to to 1 and length respectively
00874       $from = 1;
00875       $to = $this_genomic_align_block->length;
00876     } else {
00877     #need to check that the block is still overlapping the slice - it may
00878     #have already been restricted by the options above.
00879         if ($this_genomic_align_block->reference_genomic_align->dnafrag_start > $self->reference_Slice->end || $this_genomic_align_block->reference_genomic_align->dnafrag_end < $self->reference_Slice->start) {
00880             next;
00881         }
00882       ($this_genomic_align_block, $from, $to) = $this_genomic_align_block->restrict_between_reference_positions(
00883           $self->reference_Slice->start, $self->reference_Slice->end);
00884     }
00885 
00886     $original_genomic_align_block->{_alignslice_from} = $from;
00887     $original_genomic_align_block->{_alignslice_to} = $to;
00888 
00889     my $reference_genomic_align = $this_genomic_align_block->reference_genomic_align;
00890 
00891     #If I haven't needed to restrict, I don't gain this link so add it here
00892     if (!defined $reference_genomic_align->genomic_align_block->reference_genomic_align) {
00893     $reference_genomic_align->genomic_align_block($this_genomic_align_block);
00894     }
00895 
00896     my ($this_pos, $this_gap_between_genomic_align_blocks);
00897     if ($strand == 1) {
00898       $this_pos = $reference_genomic_align->dnafrag_start;
00899       $this_gap_between_genomic_align_blocks = $this_pos - $last_ref_pos;
00900     } else {
00901       $this_pos = $reference_genomic_align->dnafrag_end;
00902       $this_gap_between_genomic_align_blocks = $last_ref_pos - $this_pos;
00903     }
00904 
00905     if ($this_gap_between_genomic_align_blocks > 0) {
00906       ## Add mapper info for inter-genomic_align_block space
00907       if ($strand == 1) {
00908         $big_mapper->add_map_coordinates(
00909                 'sequence',
00910                 $last_ref_pos,
00911                 $this_pos - 1,
00912                 $strand,
00913                 'alignment',
00914                 $align_slice_length + 1,
00915                 $align_slice_length + $this_gap_between_genomic_align_blocks,
00916             );
00917       } else {
00918         $big_mapper->add_map_coordinates(
00919                 'sequence',
00920                 $this_pos + 1,
00921                 $last_ref_pos,
00922                 $strand,
00923                 'alignment',
00924                 $align_slice_length + 1,
00925                 $align_slice_length + $this_gap_between_genomic_align_blocks,
00926             );
00927       }
00928       $align_slice_length += $this_gap_between_genomic_align_blocks;
00929     }
00930     $reference_genomic_align->genomic_align_block->start($align_slice_length + 1);
00931     $original_genomic_align_block->{_alignslice_start} = $align_slice_length;
00932     if ($expanded) {
00933       $align_slice_length += CORE::length($reference_genomic_align->aligned_sequence("+FAKE_SEQ"));
00934       $reference_genomic_align->genomic_align_block->end($align_slice_length);
00935       $big_mapper->add_Mapper($reference_genomic_align->get_Mapper(0));
00936     } else {
00937       $align_slice_length += $reference_genomic_align->dnafrag_end - $reference_genomic_align->dnafrag_start + 1;
00938       $reference_genomic_align->genomic_align_block->end($align_slice_length);
00939       $big_mapper->add_Mapper($reference_genomic_align->get_Mapper(0,1));
00940     }
00941     $reference_genomic_align->genomic_align_block->slice($self);
00942 
00943     if ($strand == 1) {
00944       $last_ref_pos = $reference_genomic_align->dnafrag_end + 1;
00945     } else {
00946       $last_ref_pos = $reference_genomic_align->dnafrag_start - 1;
00947     }
00948   }
00949   my ($this_pos, $this_gap_between_genomic_align_blocks);
00950   if ($strand == 1) {
00951     $this_pos = $self->reference_Slice->end;
00952     $this_gap_between_genomic_align_blocks = $this_pos - ($last_ref_pos - 1);
00953   } else {
00954     $this_pos = $self->reference_Slice->start;
00955     $this_gap_between_genomic_align_blocks = $last_ref_pos + 1 - $this_pos;
00956   }
00957 
00958   ## $last_ref_pos is the next nucleotide position after the last mapped one.
00959   if ($this_gap_between_genomic_align_blocks > 0) {
00960     if ($strand == 1) {
00961       $big_mapper->add_map_coordinates(
00962               'sequence',
00963               $last_ref_pos,
00964               $this_pos,
00965               $strand,
00966               'alignment',
00967               $align_slice_length + 1,
00968               $align_slice_length + $this_gap_between_genomic_align_blocks,
00969           );
00970     } else {
00971       $big_mapper->add_map_coordinates(
00972               'sequence',
00973               $this_pos,
00974               $last_ref_pos,
00975               $strand,
00976               'alignment',
00977               $align_slice_length + 1,
00978               $align_slice_length + $this_gap_between_genomic_align_blocks,
00979           );
00980     }
00981     $align_slice_length += $this_gap_between_genomic_align_blocks;
00982   }
00983 
00984   if ($species_order) {
00985     foreach my $species_def (@$species_order) {
00986       my $genome_db_name = $species_def->{genome_db}->name;
00987 # print STDERR "SPECIES:: ", $genome_db_name, "\n";
00988       my $new_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice(
00989               -length => $align_slice_length,
00990               -requesting_slice => $self->reference_Slice,
00991               -align_slice => $self,
00992               -method_link_species_set => $self->{_method_link_species_set},
00993               -genome_db => $species_def->{genome_db},
00994               -expanded => $expanded,
00995           );
00996       $new_slice->{genomic_align_ids} = $species_def->{genomic_align_ids};
00997       push(@{$self->{slices}->{lc($genome_db_name)}}, $new_slice);
00998       push(@{$self->{_slices}}, $new_slice);
00999     }
01000   } else {
01001 # print STDERR "SPECIES:: ", $ref_genome_db->name, "\n";
01002     $self->{slices}->{lc($ref_genome_db->name)} = [new Bio::EnsEMBL::Compara::AlignSlice::Slice(
01003             -length => $align_slice_length,
01004             -requesting_slice => $self->reference_Slice,
01005             -align_slice => $self,
01006             -method_link_species_set => $self->{_method_link_species_set},
01007             -genome_db => $ref_genome_db,
01008             -expanded => $expanded,
01009         )];
01010     $self->{_slices} = [$self->{slices}->{lc($ref_genome_db->name)}->[0]];
01011   }
01012 
01013   $self->{slices}->{lc($ref_genome_db->name)}->[0]->add_Slice_Mapper_pair(
01014           $self->reference_Slice,
01015           $big_mapper,
01016           1,
01017           $align_slice_length,
01018           $self->reference_Slice->strand
01019       );
01020   $self->{_reference_Mapper} = $big_mapper;
01021 
01022   foreach my $this_genomic_align_block (@$sorted_genomic_align_blocks) {
01023     if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) {
01024       ## For trees, loop through all nodes (internal and leaves) to add the GenomicAligns
01025       foreach my $this_genomic_align_node (@{$this_genomic_align_block->get_all_nodes}) {
01026         # but we have to skip the reference node as this has already been added to the guide Slice
01027         next if ($this_genomic_align_node eq $this_genomic_align_block->reference_genomic_align_node);
01028 
01029         # For composite segments (2X genomes), the node will link to several GenomicAligns.
01030         # Add each of them to one of the AS:Slice objects
01031         foreach my $this_genomic_align (@{$this_genomic_align_node->get_all_genomic_aligns_for_node}) {
01032           # Link to genomic_align_block may have been lost during tree minimization
01033           $this_genomic_align->genomic_align_block_id(0);
01034           $this_genomic_align->genomic_align_block($this_genomic_align_block);
01035           $self->_add_GenomicAlign_to_a_Slice($this_genomic_align, $this_genomic_align_block,
01036               $species_order, $align_slice_length);
01037         }
01038       }
01039     } else {
01040       ## For plain alignments, just use all non-reference GenomicAlign objects
01041       foreach my $this_genomic_align
01042               (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) {
01043         $self->_add_GenomicAlign_to_a_Slice($this_genomic_align, $this_genomic_align_block,
01044             $species_order, $align_slice_length);
01045       }
01046     }
01047   }
01048   #It is possible for the same region in the ref species (eg Gibbon) to align to 2 different blocks (pairwise to human in the case of the EPO low coverage alignments). In this case, although the incoming  $genomic_align_blocks will have 2 blocks, the $sorted_genomic_align_blocks will only contain 1 of the blocks. It may happen that one species occurs in one block (eg gorilla) but not in the other. However, the $species_order will contain gorilla but the $sorted_genomic_align_blocks may not. This results in a slice being created for gorilla but it has no slice_mapper_pairs. Must check the slices and remove any that have {'slice_mapper_pairs'} as undef (no alignment comes out as a GAP).
01049 
01050   if ($species_order) {
01051       my $slices = $self->{_slices};
01052       for (my $i = (@$slices-1); $i >= 0; --$i) {
01053       if (@{$slices->[$i]->{'slice_mapper_pairs'}} == 0) {
01054           #remove from {slices}
01055           delete $self->{slices}->{$slices->[$i]->genome_db->name};
01056           #remove from {_slices}
01057           splice @$slices, $i, 1;
01058       }
01059       }
01060   }
01061 
01062 
01063   
01064   return $self;
01065 }
01066 
01067 =head2 _add_GenomicAlign_to_a_Slice
01068 
01069 =cut
01070 
01071 sub _add_GenomicAlign_to_a_Slice {
01072   my ($self, $this_genomic_align, $this_genomic_align_block, $species_order, $align_slice_length) = @_;
01073 
01074   my $expanded = $self->{expanded};
01075   my $species = $this_genomic_align->dnafrag->genome_db->name;
01076 
01077   if (!defined($self->{slices}->{lc($species)})) {
01078     $self->{slices}->{lc($species)} = [new Bio::EnsEMBL::Compara::AlignSlice::Slice(
01079             -length => $align_slice_length,
01080             -requesting_slice => $self->reference_Slice,
01081             -align_slice => $self,
01082             -method_link_species_set => $self->{_method_link_species_set},
01083             -genome_db => $this_genomic_align->dnafrag->genome_db,
01084             -expanded => $expanded,
01085         )];
01086     push(@{$self->{_slices}}, $self->{slices}->{lc($species)}->[0]);
01087   }
01088 
01089   my $this_block_start = $this_genomic_align_block->start;
01090   my $this_block_end = $this_genomic_align_block->end;
01091   my $this_core_slice = $this_genomic_align->get_Slice();
01092   if (!$this_core_slice) {
01093     $this_core_slice = new Bio::EnsEMBL::Slice(
01094           -coord_system => $aligngap_coord_system,
01095           -seq_region_name => "GAP",
01096           -start => $this_block_start,
01097           -end => $this_block_end,
01098           -strand => 0
01099         );
01100     $this_core_slice->{seq} = "." x ($this_block_end - $this_block_start + 1);
01101   }
01102   return if (!$this_core_slice); ## The restriction of the GenomicAlignBlock may return a void GenomicAlign
01103 
01104   ## This creates a link between the slice and the tree node. This is required to display
01105   ## the tree on the web interface.
01106   if ($this_genomic_align->genome_db->name eq "ancestral_sequences") {
01107     foreach my $genomic_align_node (@{$this_genomic_align_block->get_all_sorted_genomic_align_nodes}) {
01108       my $genomic_align_group = $genomic_align_node->genomic_align_group;
01109       next if (!$genomic_align_group);
01110 
01111       foreach my $genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) {
01112         if ($this_genomic_align == $genomic_align) {
01113           my $simple_tree = $genomic_align_node->newick_simple_format();
01114           $simple_tree =~ s/\_[^\_]+\_\d+\_\d+\[[\+\-]\]//g;
01115           $simple_tree =~ s/\:[\d\.]+//g;
01116           $this_core_slice->{_tree} = $simple_tree;
01117           last;
01118         }
01119       }
01120     }
01121   }
01122 
01123   my $this_mapper = $this_genomic_align->get_Mapper(0, !$expanded);
01124   # Fix block start and block end for composite segments (2X genomes)
01125   if ($this_genomic_align->cigar_line =~ /^(\d*)X/ or $this_genomic_align->cigar_line =~ /(\d*)X$/) {
01126     $this_block_start = undef;
01127     $this_block_end = undef;
01128     my @blocks = $this_mapper->map_coordinates("sequence", $this_genomic_align->dnafrag_start,
01129           $this_genomic_align->dnafrag_end, $this_genomic_align->dnafrag_strand, "sequence");
01130     foreach my $this_block (@blocks) {
01131       next if ($this_block->isa("Bio::EnsEMBL::Mapper::Gap"));
01132       $this_block_start = $this_block->start if (!defined($this_block_start) or $this_block->start < $this_block_start);
01133       $this_block_end = $this_block->end if (!defined($this_block_end) or $this_block->end > $this_block_end);
01134     }
01135   }
01136 
01137   # Choose the appropriate AS::Slice for adding this bit of the alignment
01138   my $this_underlying_slice = $self->_choose_underlying_Slice($this_genomic_align, $this_block_start,
01139       $this_block_end, $align_slice_length, $species_order);
01140 
01141   # Add a Slice, Mapper, and start-end-strand coordinates to an underlying AS::Slice
01142   $this_underlying_slice->add_Slice_Mapper_pair(
01143           $this_core_slice,
01144           $this_mapper,
01145           $this_block_start,
01146           $this_block_end,
01147           $this_genomic_align->dnafrag_strand
01148       );
01149   return;
01150 }
01151 
01152 
01153 sub _choose_underlying_Slice {
01154   my ($self, $this_genomic_align, $this_block_start, $this_block_end, $align_slice_length, $species_order) = @_;
01155   my $underlying_slice = undef;
01156 
01157   my $expanded = $self->{expanded};
01158   my $species = $this_genomic_align->dnafrag->genome_db->name;
01159 
01160   if (defined($this_genomic_align->{_temporary_AS_underlying_Slice})) {
01161     my $preset_underlying_slice = $this_genomic_align->{_temporary_AS_underlying_Slice};
01162     delete($this_genomic_align->{_temporary_AS_underlying_Slice});
01163     return $preset_underlying_slice;
01164   }
01165 
01166   if (!defined($self->{slices}->{lc($species)})) {
01167     ## No slice for this species yet. Create, store and return it
01168     $underlying_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice(
01169             -length => $align_slice_length,
01170             -requesting_slice => $self->reference_Slice,
01171             -align_slice => $self,
01172             -method_link_species_set => $self->{_method_link_species_set},
01173             -genome_db => $this_genomic_align->dnafrag->genome_db,
01174             -expanded => $expanded,
01175         );
01176     push(@{$self->{_slices}}, $underlying_slice);
01177     push(@{$self->{slices}->{lc($species)}}, $underlying_slice);
01178     return $underlying_slice;
01179   }
01180 
01181   if ($species_order) {
01182     my $preset_underlying_slice = undef;
01183     foreach my $this_underlying_slice (@{$self->{_slices}}) {
01184       if (!$this_genomic_align->{original_dbID} and $this_genomic_align->dbID) {
01185         $this_genomic_align->{original_dbID} = $this_genomic_align->dbID;
01186       }
01187       if (grep {$_ == $this_genomic_align->{original_dbID}}
01188           @{$this_underlying_slice->{genomic_align_ids}}) {
01189         $preset_underlying_slice = $this_underlying_slice;
01190       }
01191     }
01192     if ($preset_underlying_slice) {
01193       my $overlap = 0;
01194       my $slice_mapper_pairs = $preset_underlying_slice->get_all_Slice_Mapper_pairs();
01195       foreach my $slice_mapper_pair (@$slice_mapper_pairs) {
01196         my $block_start = $slice_mapper_pair->{start};
01197         my $block_end = $slice_mapper_pair->{end};
01198     #a block may not have a start and end if there is no sequence
01199     #eg the cigar_line looks like 139D17186X
01200     next if (!defined $this_block_start || !defined $this_block_end);
01201         if ($this_block_start <= $block_end and $this_block_end >= $block_start) {
01202           $overlap = 1;
01203           last;
01204         }
01205       }
01206       if (!$overlap) {
01207         ## This block does not overlap any previous block: add it!
01208         $underlying_slice = $preset_underlying_slice;
01209       }
01210     }
01211   }
01212 
01213   if (!$underlying_slice) {
01214     ## Try to add this alignment to an existing underlying Bio::EnsEMBL::Compara::AlignSlice::Slice
01215     SLICE: foreach my $this_underlying_slice (@{$self->{slices}->{lc($species)}}) {
01216       my $slice_mapper_pairs = $this_underlying_slice->get_all_Slice_Mapper_pairs();
01217       PAIRS: foreach my $slice_mapper_pair (@$slice_mapper_pairs) {
01218         my $block_start = $slice_mapper_pair->{start};
01219         my $block_end = $slice_mapper_pair->{end};
01220         if ($this_block_start <= $block_end and $this_block_end >= $block_start) {
01221           next SLICE; ## This block overlaps a previous block
01222         }
01223       }
01224       ## This block does not overlap any previous block: add it!
01225       $underlying_slice = $this_underlying_slice;
01226     }
01227   }
01228 
01229   if (!$underlying_slice) {
01230     ## This block overlaps at least one block in every available underlying
01231     ## Bio::EnsEMBL::Compara::AlignSlice::Slice. Create a new one!
01232     $underlying_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice(
01233             -length => $align_slice_length,
01234             -requesting_slice => $self->reference_Slice,
01235             -align_slice => $self,
01236             -method_link_species_set => $self->{_method_link_species_set},
01237             -genome_db => $this_genomic_align->dnafrag->genome_db,
01238             -expanded => $expanded,
01239         );
01240     push(@{$self->{_slices}}, $underlying_slice);
01241     push(@{$self->{slices}->{lc($species)}}, $underlying_slice);
01242   }
01243 
01244 #   if ($this_genomic_align->cigar_line =~ /X/) {
01245 #     ## This GenomicAlign is part of a composite alignment
01246 #     my $genomic_align_group = $this_genomic_align->genomic_align_group_by_type("composite");
01247 #     foreach my $this_genomic_align (@{$genomic_align_group->genomic_align_array}) {
01248 #     #  next if ($this_genomic_align 
01249 #     }
01250 #   }
01251 
01252   return $underlying_slice;
01253 }
01254 
01255 
01256 =head2 _sort_and_restrict_GenomicAlignBlocks
01257 
01258   Arg[1]      : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs
01259   Example     : $sorted_gabs = _sort_GenomicAlignBlocks($gabs);
01260   Description : This method returns the original list of
01261                 Bio::EnsEMBL::Compara::GenomicAlignBlock objects in order
01262   Returntype  : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
01263   Exceptions  : 
01264   Caller      : methodname()
01265 
01266 =cut
01267 
01268 sub _sort_and_restrict_GenomicAlignBlocks {
01269   my ($genomic_align_blocks) = @_;
01270   my $sorted_genomic_align_blocks = [];
01271   return $sorted_genomic_align_blocks if (!$genomic_align_blocks);
01272 
01273   my $last_end;
01274   foreach my $this_genomic_align_block (sort _sort_gabs @{$genomic_align_blocks}) {
01275     if (defined($last_end) and
01276         $this_genomic_align_block->reference_genomic_align->dnafrag_start <= $last_end) {
01277       if ($this_genomic_align_block->reference_genomic_align->dnafrag_end > $last_end) {
01278         $this_genomic_align_block = $this_genomic_align_block->restrict_between_reference_positions($last_end + 1, undef);
01279       } else {
01280       warning("Ignoring GenomicAlignBlock because it overlaps".
01281                 " previous GenomicAlignBlock");
01282         next;
01283       }
01284     }
01285     $last_end = $this_genomic_align_block->reference_genomic_align->dnafrag_end;
01286     push(@$sorted_genomic_align_blocks, $this_genomic_align_block);
01287   }
01288 
01289   return $sorted_genomic_align_blocks;
01290 }
01291 
01292 =head2 _sort_GenomicAlignBlocks
01293 
01294   Arg[1]      : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs
01295   Example     : $sorted_gabs = _sort_GenomicAlignBlocks($gabs);
01296   Description : This method returns the original list of
01297                 Bio::EnsEMBL::Compara::GenomicAlignBlock objects in order
01298   Returntype  : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
01299   Exceptions  : 
01300   Caller      : methodname()
01301 
01302 =cut
01303 
01304 sub _sort_GenomicAlignBlocks {
01305   my ($genomic_align_blocks) = @_;
01306   my $sorted_genomic_align_blocks = [];
01307   return $sorted_genomic_align_blocks if (!$genomic_align_blocks);
01308 
01309   my $last_end;
01310   foreach my $this_genomic_align_block (sort _sort_gabs @{$genomic_align_blocks}) {
01311     if (!defined($last_end) or
01312         $this_genomic_align_block->reference_genomic_align->dnafrag_start > $last_end) {
01313       push(@$sorted_genomic_align_blocks, $this_genomic_align_block);
01314       $last_end = $this_genomic_align_block->reference_genomic_align->dnafrag_end;
01315     } else {
01316     my $block_id;
01317     if (UNIVERSAL::isa($a, "Bio::EnsEMBL::Compara::GenomicAlignBlock")) {
01318         warning("Ignoring Bio::EnsEMBL::Compara::GenomicAlignBlock #".
01319             ($this_genomic_align_block->dbID or "-unknown")." because it overlaps".
01320             " previous Bio::EnsEMBL::Compara::GenomicAlignBlock");
01321     } else {
01322         warning("Ignoring Bio::EnsEMBL::Compara::GenomicAlignTree #".
01323             ($this_genomic_align_block->node_id or "-unknown")." because it overlaps".
01324             " previous Bio::EnsEMBL::Compara::GenomicAlignTree");
01325     }
01326 
01327     }
01328   }
01329 
01330   return $sorted_genomic_align_blocks;
01331 }
01332 
01333 sub _sort_gabs {
01334 
01335     if (UNIVERSAL::isa($a, "Bio::EnsEMBL::Compara::GenomicAlignBlock")) {
01336     _sort_genomic_align_block();
01337     } else {
01338     _sort_genomic_align_tree();
01339     }
01340 }
01341 
01342 sub _sort_genomic_align_block {
01343     
01344     if ($a->reference_genomic_align->dnafrag_start == $b->reference_genomic_align->dnafrag_start) {
01345     ## This may happen when a block has been split into small pieces and some of them contain
01346     ## gaps only for the reference species. In this case, use another species for sorting these
01347     ## genomic_align_blocks
01348     for (my $i = 0; $i<@{$a->get_all_non_reference_genomic_aligns()}; $i++) {
01349         for (my $j = 0; $j<@{$b->get_all_non_reference_genomic_aligns()}; $j++) {
01350         next if ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_id !=
01351              $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_id);
01352         if (($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start !=
01353              $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start) and
01354             ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_strand ==
01355              $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_strand)) {
01356             ## This other genomic_align is not a full gap and ca be used to sort these blocks
01357             if ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_strand == 1) {
01358             return $a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start <=> 
01359               $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start;
01360             } else {
01361             return $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start <=> 
01362               $a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start;
01363             }
01364         }
01365         }
01366     }
01367     } else {
01368     return $a->reference_genomic_align->dnafrag_start <=> $b->reference_genomic_align->dnafrag_start
01369     }
01370 }
01371 
01372 sub _sort_genomic_align_tree {
01373 
01374     if ($a->reference_genomic_align->dnafrag_start == $b->reference_genomic_align->dnafrag_start) {
01375     ## This may happen when a block has been split into small pieces and some of them contain
01376     ## gaps only for the reference species. In this case, use another species for sorting these
01377     ## genomic_align_blocks
01378     my $a_leaves = $a->get_all_leaves;
01379     my $b_leaves = $b->get_all_leaves;
01380     
01381     for (my $i = 0; $i < @$a_leaves; $i++) {
01382         for (my $j = 0; $j < @$b_leaves; $j++) {
01383         #look at high coverage sequences only
01384         my $a_gas = $a_leaves->[$i]->get_all_genomic_aligns_for_node;
01385         next if (@$a_gas > 1);
01386         my $a_ga = $a_gas->[0];
01387         
01388         my $b_gas = $b_leaves->[$j]->get_all_genomic_aligns_for_node;
01389         next if (@$b_gas > 1);
01390         my $b_ga = $b_gas->[0];
01391         
01392         next if ($a_ga->dnafrag_id != $b_ga->dnafrag_id);
01393         if (($a_ga->dnafrag_start != $b_ga->dnafrag_start) and ($a_ga->dnafrag_strand == $b_ga->dnafrag_strand)) {
01394             ## This other genomic_align is not a full gap and ca be used to sort these blocks
01395             if ($a_ga->dnafrag_strand == 1) {
01396             return $a_ga->dnafrag_start <=> $b_ga->dnafrag_start;
01397             } else {
01398             return $b_ga->dnafrag_start <=> $a_ga->dnafrag_start;
01399             }
01400         }
01401         }
01402     }
01403     } else {
01404     return $a->reference_genomic_align->dnafrag_start <=> $b->reference_genomic_align->dnafrag_start
01405     }
01406 }
01407 
01408 =head2 _sort_and_compile_GenomicAlignBlocks
01409 
01410   Arg[1]      : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs
01411   Example     : $sorted_fake_gabs = _sort_and_compile_GenomicAlignBlocks($gabs);
01412   Description : This method returns a list of
01413                 Bio::EnsEMBL::Compara::GenomicAlignBlock objects sorted by
01414                 position on the reference Bio::EnsEMBL::Compara::DnaFrag. If two
01415                 or more Bio::EnsEMBL::Compara::GenomicAlignBlock objects
01416                 overlap, it compile them, using the _compile_GenomicAlignBlocks
01417                 method.
01418   Returntype  : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
01419   Exceptions  : 
01420   Caller      : methodname()
01421 
01422 =cut
01423 
01424 sub _sort_and_compile_GenomicAlignBlocks {
01425   my ($genomic_align_blocks) = @_;
01426   my $sorted_genomic_align_blocks = [];
01427   return $sorted_genomic_align_blocks if (!$genomic_align_blocks);
01428 
01429   ##############################################################################################
01430   ##
01431   ## Compile GenomicAlignBlocks in group of GenomicAlignBlocks based on reference coordinates
01432   ##
01433   my $sets_of_genomic_align_blocks = [];
01434   my $start_pos;
01435   my $end_pos;
01436   my $this_set_of_genomic_align_blocks = [];
01437   foreach my $this_genomic_align_block (sort _sort_gabs @$genomic_align_blocks) {
01438     my $this_start_pos = $this_genomic_align_block->reference_genomic_align->dnafrag_start;
01439     my $this_end_pos = $this_genomic_align_block->reference_genomic_align->dnafrag_end;
01440     if (defined($end_pos) and ($this_start_pos <= $end_pos)) {
01441       # this genomic_align_block overlaps previous one. Extend this set_of_coordinates
01442       $end_pos = $this_end_pos if ($this_end_pos > $end_pos);
01443     } else {
01444       # there is a gap between this genomic_align_block and the previous one. Close and save
01445       # this set_of_genomic_align_blocks (if it exists) and start a new one.
01446       push(@{$sets_of_genomic_align_blocks}, [$start_pos, $end_pos, $this_set_of_genomic_align_blocks])
01447           if (defined(@$this_set_of_genomic_align_blocks));
01448       $start_pos = $this_start_pos;
01449       $end_pos = $this_end_pos;
01450       $this_set_of_genomic_align_blocks = [];
01451     }
01452     push(@$this_set_of_genomic_align_blocks, $this_genomic_align_block);
01453   }
01454   push(@{$sets_of_genomic_align_blocks}, [$start_pos, $end_pos, $this_set_of_genomic_align_blocks])
01455         if (defined(@$this_set_of_genomic_align_blocks));
01456   ##
01457   ##############################################################################################
01458 
01459   foreach my $this_set_of_genomic_align_blocks (@$sets_of_genomic_align_blocks) {
01460     my $this_compiled_genomic_align_block;
01461     if (@$this_set_of_genomic_align_blocks == 1) {
01462       $this_compiled_genomic_align_block = $this_set_of_genomic_align_blocks->[0];
01463     } else {
01464       $this_compiled_genomic_align_block =
01465           _compile_GenomicAlignBlocks(@$this_set_of_genomic_align_blocks);
01466     }
01467     push(@{$sorted_genomic_align_blocks}, $this_compiled_genomic_align_block);
01468   }
01469 
01470   return $sorted_genomic_align_blocks;
01471 }
01472 
01473 =head2 _compile_GenomicAlignBlocks
01474 
01475   Arg [1]     : integer $start_pos (the start of the fake genomic_align)
01476   Arg [2]     : integer $end_pos (the end of the fake genomic_align)
01477   Arg [3]     : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $set_of_genomic_align_blocks
01478                 $all_genomic_align_blocks (the pairwise genomic_align_blocks used for
01479                 this fake multiple genomic_aling_block)
01480   Example     : 
01481   Description : 
01482   Returntype  : Bio::EnsEMBL::Compara::GenomicAlignBlock object
01483   Exceptions  : 
01484   Caller      : methodname
01485 
01486 =cut
01487 
01488 sub _compile_GenomicAlignBlocks {
01489   my ($start_pos, $end_pos, $all_genomic_align_blocks) = @_;
01490 
01491   ############################################################################################
01492   ##
01493   ## Change strands in order to have all reference genomic aligns on the forward strand
01494   ##
01495   my $strand;
01496   foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
01497     my $this_genomic_align = $this_genomic_align_block->reference_genomic_align;
01498     if (!defined($strand)) {
01499       $strand = $this_genomic_align->dnafrag_strand;
01500     } elsif ($strand != $this_genomic_align->dnafrag_strand) {
01501       $strand = 0;
01502     }
01503     if ($this_genomic_align->dnafrag_strand == -1) {
01504 
01505     if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) {
01506         foreach my $this_node (@{$this_genomic_align_block->get_all_nodes}) {
01507         my $genomic_align_group = $this_node->genomic_align_group;
01508         next if (!$genomic_align_group);
01509         foreach my $genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) {
01510             $genomic_align->reverse_complement;
01511         }
01512         }
01513     } else {
01514         foreach my $genomic_align (@{$this_genomic_align_block->genomic_align_array}) {
01515         $genomic_align->reverse_complement;
01516         }
01517     }
01518     }
01519   }
01520   ##
01521   ############################################################################################
01522 
01523   ## Nothing has to be compiled if there is one single GenomicAlignBlock!
01524   $all_genomic_align_blocks->[0]->reverse_complement;
01525   return $all_genomic_align_blocks->[0] if (scalar(@$all_genomic_align_blocks) == 1);
01526 
01527   ############################################################################################
01528   ##
01529   ## Fix all sequences
01530   ##
01531   foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
01532     my $this_genomic_align = $this_genomic_align_block->reference_genomic_align;
01533     my $this_start_pos = $this_genomic_align->dnafrag_start;
01534     my $this_end_pos = $this_genomic_align->dnafrag_end;
01535     my $starting_gap = $this_start_pos - $start_pos;
01536     my $ending_gap = $end_pos - $this_end_pos;
01537 
01538     my $this_cigar_line = $this_genomic_align->cigar_line;
01539     my $this_original_sequence = $this_genomic_align->original_sequence;
01540     $this_genomic_align->aligned_sequence("");
01541     if ($starting_gap) {
01542       $this_cigar_line = $starting_gap."M".$this_cigar_line;
01543       $this_original_sequence = ("N" x $starting_gap).$this_original_sequence;
01544     }
01545     if ($ending_gap) {
01546       $this_cigar_line .= $ending_gap."M";
01547       $this_original_sequence .= ("N" x $ending_gap);
01548     }
01549     $this_genomic_align->cigar_line($this_cigar_line);
01550     $this_genomic_align->original_sequence($this_original_sequence);
01551     
01552     foreach my $this_genomic_align (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) {
01553       $this_genomic_align->aligned_sequence("");
01554       my $this_cigar_line = $this_genomic_align->cigar_line;
01555       $this_cigar_line = $starting_gap."D".$this_cigar_line if ($starting_gap);
01556       $this_cigar_line .= $ending_gap."D" if ($ending_gap);
01557       $this_genomic_align->cigar_line($this_cigar_line);
01558       $this_genomic_align->aligned_sequence(); # compute aligned_sequence using cigar_line
01559     }
01560   }
01561   ##
01562   ############################################################################################
01563 
01564   ############################################################################################
01565   ##
01566   ## Distribute gaps
01567   ##
01568   my $aln_pos = 0;
01569   my $gap;
01570   do {
01571     my $gap_pos;
01572     my $genomic_align_block;
01573     $gap = undef;
01574 
01575     ## Get the (next) first gap from all the alignments (sets: $gap_pos, $gap and $genomic_align_block_id)
01576     foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
01577       my $this_gap_pos = index($this_genomic_align_block->reference_genomic_align->aligned_sequence,
01578           "-", $aln_pos);
01579       if ($this_gap_pos > 0 and (!defined($gap_pos) or $this_gap_pos < $gap_pos)) {
01580         $gap_pos = $this_gap_pos;
01581         my $gap_string = substr($this_genomic_align_block->reference_genomic_align->aligned_sequence,
01582             $gap_pos);
01583         ($gap) = $gap_string =~ /^(\-+)/;
01584         $genomic_align_block = $this_genomic_align_block;
01585       }
01586     }
01587 
01588     ## If a gap has been found, apply it to the other GAB
01589     if ($gap) {
01590       $aln_pos = $gap_pos + length($gap);
01591       foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
01592         next if ($genomic_align_block eq $this_genomic_align_block); # Do not add gap to itself!!
01593     if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) {
01594         foreach my $this_node (@{$this_genomic_align_block->get_all_nodes}) {
01595         my $genomic_align_group = $this_node->genomic_align_group;
01596         next if (!$genomic_align_group);
01597         foreach my $this_genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) {
01598             # insert gap in the aligned_sequence
01599             my $aligned_sequence = $this_genomic_align->aligned_sequence;
01600             substr($aligned_sequence, $gap_pos, 0, $gap);
01601             $this_genomic_align->aligned_sequence($aligned_sequence);
01602         }
01603         }
01604     } else {
01605         foreach my $this_genomic_align (@{$this_genomic_align_block->genomic_align_array}) {
01606         # insert gap in the aligned_sequence
01607         my $aligned_sequence = $this_genomic_align->aligned_sequence;
01608         substr($aligned_sequence, $gap_pos, 0, $gap);
01609         $this_genomic_align->aligned_sequence($aligned_sequence);
01610         }
01611     }
01612       }
01613     }
01614     
01615   } while ($gap); # exit loop if no gap has been found
01616 
01617   ## Fix all cigar_lines in order to match new aligned_sequences
01618   foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
01619     foreach my $this_genomic_align (@{$this_genomic_align_block->genomic_align_array}) {
01620       $this_genomic_align->cigar_line(""); # undef old cigar_line
01621       $this_genomic_align->cigar_line(); # compute cigar_line from aligned_sequence
01622     }
01623   }
01624   ##
01625   ############################################################################################
01626 
01627   ############################################################################################
01628   ##
01629   ##  Create the reference_genomic_align for this fake genomic_align_block
01630   ##
01631   ##  All the blocks have been edited and all the reference genomic_aling
01632   ##  should be equivalent. Here, we create a new one with no fixed sequence.
01633   ##  This permits to retrieve the real sequence when needed
01634   ##
01635   my $reference_genomic_align;
01636   if (@$all_genomic_align_blocks) {
01637     my $this_genomic_align = $all_genomic_align_blocks->[0]->reference_genomic_align;
01638     $reference_genomic_align = new Bio::EnsEMBL::Compara::GenomicAlign(
01639             -dbID => -1,
01640             -dnafrag => $this_genomic_align->dnafrag,
01641             -dnafrag_start => $start_pos,
01642             -dnafrag_end => $end_pos,
01643             -dnafrag_strand => 1,
01644             -cigar_line => $this_genomic_align->cigar_line,
01645             -method_link_species_set => $this_genomic_align->method_link_species_set,
01646             -visible => 1
01647         );
01648   }
01649   ##
01650   ############################################################################################
01651 
01652   ## Create the genomic_align_array (the list of genomic_aling for this fake gab
01653   my $genomic_align_array = [$reference_genomic_align];
01654   foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
01655     foreach my $this_genomic_align (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) {
01656       $this_genomic_align->genomic_align_block_id(0); # undef old genomic_align_block_id
01657       push(@$genomic_align_array, $this_genomic_align);
01658     }
01659   }
01660   
01661   ## Create the fake multiple Bio::EnsEMBL::Compara::GenomicAlignBlock
01662   my $fake_genomic_align_block = new Bio::EnsEMBL::Compara::GenomicAlignBlock(
01663           -length => ($end_pos - $start_pos + 1),
01664           -genomic_align_array => $genomic_align_array,
01665           -reference_genomic_align => $reference_genomic_align,
01666           -level_id => 0
01667       );
01668 
01669   if ($strand == -1) {
01670     $fake_genomic_align_block->reverse_complement;
01671   }
01672 
01673   return $fake_genomic_align_block;
01674 }
01675 
01676 
01677 sub DESTROY {
01678   my $self = shift;
01679   ## Remove circular reference in order to allow Perl to clear the object
01680   $self->{all_genomic_align_blocks} = undef;
01681 }
01682 
01683 1;