AlignSlice.pm
Go to the documentation of this file.
00001 =head1 LICENSE 00002 00003 Copyright (c) 1999-2012 The European Bioinformatics Institute and 00004 Genome Research Limited. All rights reserved. 00005 00006 This software is distributed under a modified Apache license. 00007 For license details, please see 00008 00009 http://www.ensembl.org/info/about/code_licence.html 00010 00011 =head1 CONTACT 00012 00013 Please email comments or questions to the public Ensembl 00014 developers list at <dev@ensembl.org>. 00015 00016 Questions may also be sent to the Ensembl help desk at 00017 <helpdesk@ensembl.org>. 00018 00019 =head1 NAME 00020 00021 Bio::EnsEMBL::Compara::AlignSlice - An AlignSlice can be used to map genes and features from one species onto another one 00022 00023 =head1 DESCRIPTION 00024 00025 INTRODUCTION 00026 00027 An AlignSlice is an object built with a reference Slice and the corresponding set of genomic alignments. 00028 The genomic alignments are used to map features from one species onto another and viceversa. 00029 00030 STRUCTURE 00031 00032 Every Bio::EnsEMBL::Compara::AlignSlice contains a set of Bio::EnsEMBL::Compara::AlignSlice::Slice 00033 objects, at least one by species involved in the alignments. For instance, if the reference Slice is a 00034 human slice and the set of alignments corresponds to human-mouse BLASTZ_NET alignments, there will be at 00035 least one Bio::EnsEMBL::Compara::AlignSlice::Slice for human and at least another one for mouse. The main 00036 Bio::EnsEMBL::Compara::AlignSlice::Slice for the reference species will contain a single genomic sequence 00037 whilst the other Bio::EnsEMBL::Compara::AlignSlice::Slice objects might be made of several pieces of 00038 genomic sequences, depending on the set of alignments. Here is a graphical representation: 00039 00040 ref.Slice ************************************************************** 00041 00042 alignments 11111111111 00043 2222222222222222 00044 33333333333333333 00045 00046 resulting Bio::EnsEMBL::Compara::AlignSlice: 00047 00048 AS::Slice 1 ************************************************************** 00049 AS::Slice 2 .11111111111....2222222222222222......33333333333333333....... 00050 00051 MODES 00052 00053 Two modes are currently available: condensed and expanded mode. The default mode is "condensed". In 00054 condensed mode, no gaps are allowed in the reference Slice which means that information about deletions 00055 in the reference species (i.e. insertions in the other species) are lost. On the other hand, the 00056 first Bio::EnsEMBL::Compara::AlignSlice::Slice object corresponds to the original Bio::EnsEMBL::Slice. 00057 00058 In the expanded mode, the first Bio::EnsEMBL::Compara::AlignSlice::Slice is expanded in order to 00059 accomodate the gaps corresponding to the deletions (insertions). Bear in mind that in expanded mode, the 00060 length of the resulting AlignSlice will be most probably larger than the length of the reference 00061 Bio::EnsEMBL::Slice. 00062 00063 00064 OVERLAPPING ALIGNMENTS 00065 00066 No overlapping alignments are allowed by default. This means that if an alignment overlaps another one, the 00067 second alignment is ignored. This is due to lack of information needed to reconciliate both alignment. 00068 Here is a graphical example showing this problem: 00069 00070 ALN 1: Human (ref) CTGTGAAAA----CCCCATTAGG 00071 Mouse (1) CTGAAAATTTTCCCC 00072 00073 ALN 2: Human (ref) CTGTGAAAA---CCCCATTAGG 00074 Mouse (1) AAAGGGCCCCATTA 00075 00076 Possible solution 1: 00077 Human (ref) CTGTGAAAA----CCCCATTAGG 00078 Mouse (1) CTGAAAATTTTCCCC---- 00079 Mouse (1) ----AAA-GGGCCCCATTA 00080 00081 Possible solution 2: 00082 Human (ref) CTGTGAAAA----CCCCATTAGG 00083 Mouse (1) CTGAAAATTTTCCCC---- 00084 Mouse (1) ----AAAGGG-CCCCATTA 00085 00086 Possible solution 3: 00087 Human (ref) CTGTGAAAA-------CCCCATTAGG 00088 Mouse (1) CTGAAAATTTT---CCCC 00089 Mouse (1) AAA----GGGCCCCATTA 00090 00091 There is no easy way to find which of these possible solution is the best without trying to realign the 00092 three sequences together and this is far beyond the aim of this module. The best solution is to start with 00093 multiple alignments instead of overlapping pairwise ones. 00094 00095 The third possibility is probably not the best alignment you can get although its implementation is 00096 systematic (insert as many gaps as needed in order to accommodate the insertions and never ever overlap 00097 them) and computationally cheap as no realignment is needed. You may ask this module to solve overlapping 00098 alignments in this way using the "solve_overlapping" option. 00099 00100 RESULT 00101 00102 The AlignSlice results in a set of Bio::EnsEMBL::Compara::AlignSlice::Slice which are objects really 00103 similar to the Bio::EnsEMBL::Slice. There are intended to be used just like the genuine 00104 Bio::EnsEMBL::Slice but some methods do not work though. Some examples of non-ported methods are: expand() 00105 and invert(). Some other methods work as expected (seq, subseq, get_all_Attributes, 00106 get_all_VariationFeatures, get_all_RepeatFeatures...). All these Bio::EnsEMBL::Compara::AlignSlice::Slice 00107 share the same fake coordinate system defined by the Bio::EnsEMBL::Compara::AlignSlice. This allows to 00108 map features from one species onto the others. 00109 00110 =head1 SYNOPSIS 00111 00112 use Bio::EnsEMBL::Compara::AlignSlice; 00113 00114 ## You may create your own AlignSlice objects but if you are interested in 00115 ## getting AlignSlice built with data from an EnsEMBL Compara database you 00116 ## should consider using the Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor 00117 ## instead 00118 my $align_slice = new Bio::EnsEMBL::Compara::AlignSlice( 00119 -adaptor => $align_slice_adaptor, 00120 -reference_Slice => $reference_Slice, 00121 -method_link_species_set => $all_genomic_align_blocks, 00122 -genomicAlignBlocks => $all_genomic_align_blocks, 00123 -expanded => 1 00124 -solve_overlapping => 1 00125 ); 00126 00127 my $all_slices = $aling_slice->get_all_Slices(); 00128 foreach my $this_slice (@$all_slices) { 00129 ## See also Bio::EnsEMBL::Compara::AlignSlice::Slice 00130 my $species_name = $this_slice->genome_db->name() 00131 my $all_mapped_genes = $this_slice->get_all_Genes(); 00132 } 00133 00134 my $simple_align = $align_slice->get_projected_SimpleAlign(); 00135 00136 SET VALUES 00137 $align_slice->adaptor($align_slice_adaptor); 00138 $align_slice->reference_Slice($reference_slice); 00139 $align_slice->add_GenomicAlignBlock($genomic_align_block_1); 00140 $align_slice->add_GenomicAlignBlock($genomic_align_block_2); 00141 $align_slice->add_GenomicAlignBlock($genomic_align_block_3); 00142 00143 GET VALUES 00144 my $align_slice_adaptor = $align_slice->adaptor(); 00145 my $reference_slice = $align_slice->reference_Slice(); 00146 my $all_genomic_align_blocks = $align_slice->get_all_GenomicAlignBlock(); 00147 my $mapped_genes = $align_slice->get_all_Genes_by_genome_db_id(3); 00148 my $simple_align = $align_slice->get_projected_SimpleAlign(); 00149 00150 =head1 OBJECT ATTRIBUTES 00151 00152 =over 00153 00154 =item adaptor 00155 00156 Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object to access DB 00157 00158 =item reference_slice 00159 00160 Bio::EnsEMBL::Slice object used to create this Bio::EnsEBML::Compara::AlignSlice object 00161 00162 =item all_genomic_align_blocks 00163 00164 a listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects found using the reference_Slice 00165 00166 =back 00167 00168 =head1 APPENDIX 00169 00170 The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ 00171 00172 =cut 00173 00174 00175 # Let the code begin... 00176 00177 00178 package Bio::EnsEMBL::Compara::AlignSlice; 00179 00180 use strict; 00181 use Bio::EnsEMBL::Utils::Argument qw(rearrange); 00182 use Bio::EnsEMBL::Utils::Exception qw(throw warning info verbose); 00183 use Bio::EnsEMBL::Compara::AlignSlice::Exon; 00184 use Bio::EnsEMBL::Compara::AlignSlice::Slice; 00185 use Bio::EnsEMBL::Compara::GenomicAlignBlock; 00186 use Bio::EnsEMBL::Compara::GenomicAlign; 00187 use Bio::SimpleAlign; 00188 00189 ## Creates a new coordinate system for creating empty Slices. 00190 00191 my $aligngap_coord_system = new Bio::EnsEMBL::CoordSystem( 00192 -NAME => 'alignment', 00193 -VERSION => "none", 00194 -TOP_LEVEL => 0, 00195 -SEQUENCE_LEVEL => 1, 00196 -RANK => 1, 00197 ); 00198 00199 =head2 new (CONSTRUCTOR) 00200 00201 Arg[1] : a reference to a hash where keys can be: 00202 -adaptor 00203 -reference_slice 00204 -genomic_align_blocks 00205 -method_link_species_set 00206 -expanded 00207 -solve_overlapping 00208 -preserve_blocks 00209 -adaptor: the Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor 00210 -reference_slice: 00211 Bio::EnsEMBL::Slice, the guide slice for this align_slice 00212 -genomic_align_blocks: 00213 listref of Bio::EnsEMBL::Compara::GenomicAlignBlock 00214 objects containing to the alignments to be used for this 00215 align_slice 00216 -method_link_species_set; 00217 Bio::EnsEMBL::Compara::MethodLinkSpeciesSet object for 00218 all the previous genomic_align_blocks. At the moment all 00219 the blocks should correspond to the same MethodLinkSpeciesSet 00220 -expanded: boolean flag. If set to true, the AlignSlice will insert all 00221 the gaps requiered by the alignments in the reference_slice 00222 (see MODES elsewhere in this document) 00223 -solve_overlapping: 00224 boolean flag. If set to true, the AlignSlice will allow 00225 overlapping alginments and solve indeterminations according 00226 to the method described in OVERLAPPING ALIGNMENTS elsewhere 00227 in this document 00228 -preserve_blocks: 00229 boolean flag. By default the AlignSlice trim the alignments 00230 in order to fit the reference_slice. This flags tell the 00231 AlignSlice to use the alignment block as they are (usually 00232 this is only used by the AlignSliceAdaptor, use with care) 00233 Example : my $align_slice = 00234 new Bio::EnsEMBL::Compara::AlignSlice( 00235 -adaptor => $align_slice_adaptor, 00236 -reference_slice => $reference_slice, 00237 -method_link_species_set => $method_link_species_set, 00238 -genomic_align_blocks => [$gab1, $gab2], 00239 -expanded => 1 00240 ); 00241 Description: Creates a new Bio::EnsEMBL::Compara::AlignSlice object 00242 Returntype : Bio::EnsEMBL::Compara::AlignSlice object 00243 Exceptions : none 00244 Caller : general 00245 00246 =cut 00247 00248 sub new { 00249 my ($class, @args) = @_; 00250 00251 my $self = {}; 00252 bless $self,$class; 00253 00254 my ($adaptor, $reference_slice, $genomic_align_blocks, $genomic_align_trees, 00255 $method_link_species_set, $expanded, $solve_overlapping, $preserve_blocks, 00256 $species_order) = 00257 rearrange([qw( 00258 ADAPTOR REFERENCE_SLICE GENOMIC_ALIGN_BLOCKS GENOMIC_ALIGN_TREES 00259 METHOD_LINK_SPECIES_SET EXPANDED SOLVE_OVERLAPPING PRESERVE_BLOCKS 00260 SPECIES_ORDER 00261 )], @args); 00262 00263 $self->adaptor($adaptor) if (defined ($adaptor)); 00264 $self->reference_Slice($reference_slice) if (defined ($reference_slice)); 00265 if (defined($genomic_align_blocks)) { 00266 foreach my $this_genomic_align_block (@$genomic_align_blocks) { 00267 $self->add_GenomicAlignBlock($this_genomic_align_block); 00268 } 00269 } 00270 $self->{_method_link_species_set} = $method_link_species_set if (defined($method_link_species_set)); 00271 00272 $self->{expanded} = 0; 00273 if ($expanded) { 00274 $self->{expanded} = 1; 00275 } 00276 00277 $self->{solve_overlapping} = 0; 00278 if ($solve_overlapping) { 00279 $self->{solve_overlapping} = $solve_overlapping; 00280 } 00281 00282 if ($genomic_align_trees) { 00283 $self->_create_underlying_Slices($genomic_align_trees, $self->{expanded}, 00284 $self->{solve_overlapping}, $preserve_blocks, $species_order); 00285 00286 #Awful hack to store the _alignslice_from and _alignslice_to on the 00287 #GenomicAlignBlock for use in get_all_ConservationScores which uses 00288 #GenomicAlignBlock and not GenomicAlignTree 00289 foreach my $tree (@$genomic_align_trees) { 00290 foreach my $block (@$genomic_align_blocks) { 00291 00292 #my $gab_id = $tree->get_all_leaves->[0]->get_all_genomic_aligns_for_node->[0]->genomic_align_block_id; 00293 #Hope always have a reference_genomic_align. The above doesn't work because only the reference species 00294 #has the genomic_align_block_id set 00295 my $gab_id = $tree->reference_genomic_align->genomic_align_block_id; 00296 my $block_id = $block->dbID; 00297 00298 #if the block has been restricted, need to look at original_dbID 00299 if (!defined $block_id) { 00300 $block_id = $block->{original_dbID}; 00301 } 00302 my $tree_ref_ga = $tree->{reference_genomic_align}; 00303 my $block_ref_ga = $block->{reference_genomic_align}; 00304 00305 #Need to check the ref_ga details not just the block id because 00306 #the original_dbID is not unique for 2x genome blocks 00307 if ($gab_id == $block_id && 00308 $tree_ref_ga->dnafrag_start == $block_ref_ga->dnafrag_start && 00309 $tree_ref_ga->dnafrag_end == $block_ref_ga->dnafrag_end && 00310 $tree_ref_ga->dnafrag_strand == $block_ref_ga->dnafrag_strand) { 00311 00312 $block->{_alignslice_from} = $tree->{_alignslice_from}; 00313 $block->{_alignslice_to} = $tree->{_alignslice_to}; 00314 } 00315 } 00316 } 00317 } else { 00318 $self->_create_underlying_Slices($genomic_align_blocks, $self->{expanded}, 00319 $self->{solve_overlapping}, $preserve_blocks, $species_order); 00320 } 00321 return $self; 00322 } 00323 00324 00325 =head2 sub_AlignSlice 00326 00327 Arg 1 : int $start 00328 Arg 2 : int $end 00329 Example : my $sub_align_slice = $align_slice->sub_AlignSlice(10, 50); 00330 Description: Creates a new Bio::EnsEMBL::Compara::AlignSlice object 00331 corresponding to a sub region of this one 00332 Returntype : Bio::EnsEMBL::Compara::AlignSlice object 00333 Exceptions : return undef if no internal slices can be created (see 00334 Bio::EnsEMBL::Compara::AlignSlice::Slice->sub_Slice) 00335 Caller : $align_slice 00336 Status : Testing 00337 00338 =cut 00339 00340 sub sub_AlignSlice { 00341 my ($self, $start, $end) = @_; 00342 my $sub_align_slice = {}; 00343 00344 throw("Must provide START argument") if (!defined($start)); 00345 throw("Must provide END argument") if (!defined($end)); 00346 00347 bless $sub_align_slice, ref($self); 00348 00349 $sub_align_slice->adaptor($self->adaptor); 00350 $sub_align_slice->reference_Slice($self->reference_Slice); 00351 foreach my $this_genomic_align_block (@{$self->get_all_GenomicAlignBlocks()}) { 00352 $sub_align_slice->add_GenomicAlignBlock($this_genomic_align_block); 00353 } 00354 $sub_align_slice->{_method_link_species_set} = $self->{_method_link_species_set}; 00355 00356 $sub_align_slice->{expanded} = $self->{expanded}; 00357 00358 $sub_align_slice->{solve_overlapping} = $self->{solve_overlapping}; 00359 00360 foreach my $this_slice (@{$self->get_all_Slices}) { 00361 my $new_slice = $this_slice->sub_Slice($start, $end); 00362 push(@{$sub_align_slice->{_slices}}, $new_slice) if ($new_slice); 00363 } 00364 return undef if (!$sub_align_slice->{_slices}); 00365 00366 return $sub_align_slice; 00367 } 00368 00369 00370 =head2 adaptor 00371 00372 Arg[1] : (optional) Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor $align_slice_adaptor 00373 Example : my $align_slice_adaptor = $align_slice->adaptor 00374 Example : $align_slice->adaptor($align_slice_adaptor) 00375 Description: getter/setter for the adaptor attribute 00376 Returntype : Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object 00377 Exceptions : throw if arg is not a Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor 00378 Caller : $object->methodname 00379 00380 =cut 00381 00382 sub adaptor { 00383 my ($self, $adaptor) = @_; 00384 00385 if (defined($adaptor)) { 00386 throw "[$adaptor] must be a Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object" 00387 unless ($adaptor and $adaptor->isa("Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor")); 00388 $self->{'adaptor'} = $adaptor; 00389 } 00390 00391 return $self->{'adaptor'}; 00392 } 00393 00394 00395 =head2 get_all_Slices (experimental) 00396 00397 Arg[1] : [optional] string $species_name1 00398 Arg[2] : [optional] string $species_name2 00399 Arg[...] : [optional] string $species_nameN 00400 Example : my $slices = $align_slice->get_all_Slices 00401 Description: getter for all the Slices in this AlignSlice. If a list 00402 of species is specified, returns only the slices for 00403 these species. The slices are returned in a "smart" 00404 order, i.e. the slice corresponding to the reference 00405 species is returned first and then the remaining slices 00406 depending on their phylogenetic distance to the first 00407 one. 00408 NB: You can use underscores instead of whitespaces for 00409 the name of the species, i.e. Homo_sapiens will be 00410 understood as "Homo sapiens". However if the GenomeDB is found 00411 to already have _ defined in the name then this behaviour is 00412 disabled. 00413 Returntype : listref of Bio::EnsEMBL::Compara::AlignSlice::Slice 00414 objects. 00415 Exceptions : 00416 Caller : $object->methodname 00417 00418 =cut 00419 00420 sub get_all_Slices { 00421 my ( $self, @species_names ) = @_; 00422 my $slices = []; 00423 00424 if (@species_names) { 00425 foreach my $slice ( @{ $self->{_slices} } ) { 00426 #Substitute _ for spaces & check if the current GenomeDB matches with 00427 #or without them 00428 foreach my $this_species_name (@species_names) { 00429 ( my $space_species_name = $this_species_name ) =~ s/_/ /g; 00430 push( @$slices, $slice ) 00431 if ( ( $this_species_name eq $slice->genome_db->name ) 00432 || ( $space_species_name eq $slice->genome_db->name ) ); 00433 } 00434 } 00435 } 00436 else { 00437 $slices = $self->{_slices}; 00438 } 00439 00440 return $slices; 00441 } 00442 00443 00444 =head2 reference_Slice 00445 00446 Arg[1] : (optional) Bio::EnsEMBL::Slice $slice 00447 Example : my $reference_slice = $align_slice->reference_slice 00448 Example : $align_slice->reference_Slice($reference_slice) 00449 Description: getter/setter for the attribute reference_slice 00450 Returntype : Bio::EnsEMBL::Slice object 00451 Exceptions : throw if arg is not a Bio::EnsEMBL::Slice object 00452 Caller : $object->methodname 00453 00454 =cut 00455 00456 sub reference_Slice { 00457 my ($self, $reference_slice) = @_; 00458 00459 if (defined($reference_slice)) { 00460 throw "[$reference_slice] must be a Bio::EnsEMBL::Slice object" 00461 unless ($reference_slice and $reference_slice->isa("Bio::EnsEMBL::Slice")); 00462 $self->{'reference_slice'} = $reference_slice; 00463 } 00464 00465 return $self->{'reference_slice'}; 00466 } 00467 00468 00469 =head2 add_GenomicAlignBlock 00470 00471 Arg[1] : Bio::EnsEMBL::Compara::GenomicAlignBlock $genomicAlignBlock 00472 Example : $align_slice->add_GenomicAlignBlock($genomicAlignBlock) 00473 Description: add a Bio::EnsEMBL::Compara::GenomicAlignBlock object to the array 00474 stored in the attribute all_genomic_align_blocks 00475 Returntype : none 00476 Exceptions : throw if arg is not a Bio::EnsEMBL::Compara::GenomicAlignBlock 00477 Caller : $object->methodname 00478 00479 =cut 00480 00481 sub add_GenomicAlignBlock { 00482 my ($self, $genomic_align_block) = @_; 00483 00484 if (!defined($genomic_align_block)) { 00485 throw "Too few arguments for Bio::EnsEMBL::Compara::AlignSlice->add_GenomicAlignBlock()"; 00486 } 00487 if (!$genomic_align_block or !$genomic_align_block->isa("Bio::EnsEMBL::Compara::GenomicAlignBlock")) { 00488 throw "[$genomic_align_block] must be a Bio::EnsEMBL::Compara::GenomicAlignBlock object"; 00489 } 00490 00491 push(@{$self->{'all_genomic_align_blocks'}}, $genomic_align_block); 00492 } 00493 00494 00495 =head2 get_all_GenomicAlignBlocks 00496 00497 Arg[1] : none 00498 Example : my $all_genomic_align_blocks = $align_slice->get_all_GenomicAlignBlocks 00499 Description: getter for the attribute all_genomic_align_blocks 00500 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects 00501 Exceptions : none 00502 Caller : $object->methodname 00503 00504 =cut 00505 00506 sub get_all_GenomicAlignBlocks { 00507 my ($self) = @_; 00508 00509 return ($self->{'all_genomic_align_blocks'} || []); 00510 } 00511 00512 00513 =head2 get_MethodLinkSpeciesSet 00514 00515 Arg[1] : none 00516 Example : my $method_link_species_set = $align_slice->get_MethodLinkSpeciesSet 00517 Description: getter for the Bio::EnsEMBL::Compara::MethodLinkSpeciesSet 00518 used to create this object 00519 Returntype : Bio::EnsEMBL::Compara::MethodLinkSpeciesSet 00520 Exceptions : none 00521 Caller : $object->methodname 00522 00523 =cut 00524 00525 sub get_MethodLinkSpeciesSet { 00526 my ($self) = @_; 00527 00528 return $self->{'_method_link_species_set'}; 00529 } 00530 00531 00532 =head2 get_SimpleAlign 00533 00534 Arg[1] : none 00535 Example : use Bio::AlignIO; 00536 my $out = Bio::AlignIO->newFh(-fh=>\*STDOUT, -format=> "clustalw"); 00537 print $out $align_slice->get_SimpleAlign(); 00538 Description : This method creates a Bio::SimpleAlign object using the 00539 Bio::EnsEMBL::Compara::AlignSlice::Slices underlying this 00540 Bio::EnsEMBL::Compara::AlignSlice object. The SimpleAlign 00541 describes the alignment where the first sequence 00542 corresponds to the reference Slice and the remaining 00543 correspond to the other species. 00544 Returntype : Bio::SimpleAlign object 00545 Exceptions : 00546 Caller : $object->methodname 00547 00548 =cut 00549 00550 sub get_SimpleAlign { 00551 my ($self, @species) = @_; 00552 my $simple_align; 00553 00554 ## Create a single Bio::SimpleAlign for the projection 00555 $simple_align = Bio::SimpleAlign->new(); 00556 $simple_align->id("ProjectedMultiAlign"); 00557 00558 my $genome_db_name_counter; 00559 foreach my $slice (@{$self->get_all_Slices(@species)}) { 00560 my $seq = Bio::LocatableSeq->new( 00561 -SEQ => $slice->seq, 00562 -START => $slice->start, 00563 -END => $slice->end, 00564 -ID => $slice->genome_db->name.($genome_db_name_counter->{$slice->genome_db->name} or ""), 00565 -STRAND => $slice->strand 00566 ); 00567 ## This allows to have several sequences for the same species. Bio::SimpleAlign complains 00568 ## about having the same ID, START and END for two sequences... 00569 if (!defined($genome_db_name_counter->{$slice->genome_db->name})) { 00570 $genome_db_name_counter->{$slice->genome_db->name} = 2; 00571 } else { 00572 $genome_db_name_counter->{$slice->genome_db->name}++; 00573 } 00574 00575 $simple_align->add_seq($seq); 00576 } 00577 00578 return $simple_align; 00579 } 00580 00581 00582 =head2 get_all_ConservationScores 00583 00584 Arg 1 : (opt) integer $display_size (default 700) 00585 Arg 2 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX") 00586 Arg 3 : (opt) integer $window_size 00587 Example : my $conservation_scores = 00588 $align_slice->get_all_ConservationScores(1000, "AVERAGE", 10); 00589 Description: Retrieve the corresponding 00590 Bio::EnsEMBL::Compara::ConservationScore objects for the 00591 Bio::EnsEMBL::Compara::AlignSlice object. It calls either 00592 _get_expanded_conservation_scores if the AlignSlice has 00593 "expanded" set or _get_condensed_conservation_scores for 00594 condensed mode. 00595 It sets up the align_start, align_end and slice_length and map 00596 the resulting objects onto the AlignSlice. 00597 Please refer to the documentation in 00598 Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor 00599 for more details. 00600 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore 00601 objects. 00602 Caller : object::methodname 00603 Status : At risk 00604 00605 =cut 00606 00607 sub get_all_ConservationScores { 00608 my ($self, $display_size, $display_type, $window_size) = @_; 00609 my $all_conservation_scores = []; 00610 my $y_axis_min; 00611 my $y_axis_max; 00612 00613 my $conservation_score_adaptor = $self->adaptor->db->get_ConservationScoreAdaptor(); 00614 00615 #Get scores in either expanded or condensed mode 00616 if ($self->{expanded}) { 00617 $all_conservation_scores = $self->_get_expanded_conservation_scores($conservation_score_adaptor, $display_size, $display_type, $window_size); 00618 } else { 00619 $all_conservation_scores = $self->_get_condensed_conservation_scores($conservation_score_adaptor, $display_size, $display_type, $window_size); 00620 } 00621 00622 return $all_conservation_scores; 00623 } 00624 00625 =head2 _get_expanded_conservation_scores 00626 00627 Arg 1 : Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor 00628 Arg 2 : (opt) integer $display_size (default 700) 00629 Arg 3 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX") 00630 Arg 4 : (opt) integer $window_size 00631 Example : my $conservation_scores = 00632 $self->_get_expanded_conservation_scores($cs_adaptor, 1000, "AVERAGE", 10); 00633 Description: Retrieve the corresponding 00634 Bio::EnsEMBL::Compara::ConservationScore objects for the 00635 Bio::EnsEMBL::Compara::GenomicAlignBlock objects underlying 00636 this Bio::EnsEMBL::Compara::AlignSlice object. This method 00637 calls the Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor-> 00638 fetch_all_by_GenomicAlignBlock() method. It sets up the align_start, 00639 align_end and slice_length and map the resulting objects onto 00640 the AlignSlice. $diaplay_slize, $display_type and $window_size 00641 are passed to the fetch_all_by_GenomicAlignBlock() method. 00642 Please refer to the documentation in 00643 Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor 00644 for more details. 00645 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore 00646 objects. 00647 Caller : object::methodname 00648 Status : At risk 00649 00650 =cut 00651 sub _get_expanded_conservation_scores { 00652 my ($self, $conservation_score_adaptor, $display_size, $display_type, $window_size) = @_; 00653 my $y_axis_min; 00654 my $y_axis_max; 00655 00656 my $all_conservation_scores = []; 00657 foreach my $this_genomic_align_block (@{$self->get_all_GenomicAlignBlocks()}) { 00658 $this_genomic_align_block->{restricted_aln_start} = $this_genomic_align_block->{_alignslice_from}; 00659 $this_genomic_align_block->{restricted_aln_end} = $this_genomic_align_block->{_alignslice_to}; 00660 00661 my $all_these_conservation_scores = $conservation_score_adaptor->fetch_all_by_GenomicAlignBlock( 00662 $this_genomic_align_block, 1,$this_genomic_align_block->length, $self->get_all_Slices()->[0]->length, 00663 $display_size, $display_type, $window_size); 00664 00665 #initialise y axis min and max 00666 if (!defined $y_axis_max) { 00667 $y_axis_max = $all_these_conservation_scores->[0]->y_axis_max; 00668 $y_axis_min = $all_these_conservation_scores->[0]->y_axis_min; 00669 } 00670 #find overall min and max 00671 if ($y_axis_min > $all_these_conservation_scores->[0]->y_axis_min) { 00672 $y_axis_min = $all_these_conservation_scores->[0]->y_axis_min; 00673 } 00674 if ($y_axis_max < $all_these_conservation_scores->[0]->y_axis_max) { 00675 $y_axis_max = $all_these_conservation_scores->[0]->y_axis_max; 00676 } 00677 push (@$all_conservation_scores, @$all_these_conservation_scores); 00678 } 00679 #set overall min and max 00680 $all_conservation_scores->[0]->y_axis_min($y_axis_min); 00681 $all_conservation_scores->[0]->y_axis_max($y_axis_max); 00682 00683 return $all_conservation_scores; 00684 } 00685 00686 =head2 _get_condensed_conservation_scores 00687 00688 Arg 1 : Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor 00689 Arg 2 : (opt) integer $display_size (default 700) 00690 Arg 3 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX") 00691 Arg 4 : (opt) integer $window_size 00692 Example : my $conservation_scores = 00693 $self->_get_expanded_conservation_scores($cs_adaptor, 1000, "AVERAGE", 10); 00694 Description: Retrieve the corresponding 00695 Bio::EnsEMBL::Compara::ConservationScore objects for the 00696 reference Bio::EnsEMBL::Slice object of 00697 this Bio::EnsEMBL::Compara::AlignSlice object. This method 00698 calls the Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor-> 00699 fetch_all_by_MethodLinkSpeciesSet_Slice() method. It sets up 00700 the align_start, align_end and slice_length and map the 00701 resulting objects onto the AlignSlice. $display_slize, 00702 $display_type and $window_size are passed to the 00703 fetch_all_by_MethodLinkSpeciesSet_Slice() method. 00704 Please refer to the documentation in 00705 Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor 00706 for more details. 00707 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore 00708 objects. 00709 Caller : object::methodname 00710 Status : At risk 00711 00712 =cut 00713 sub _get_condensed_conservation_scores { 00714 my ($self, $conservation_score_adaptor, $display_size, $display_type, $window_size) = @_; 00715 00716 my $all_conservation_scores = []; 00717 00718 throw ("Must have method_link_species_set defined to retrieve conservation scores for a condensed AlignSlice") if (!defined $self->{_method_link_species_set}); 00719 throw ("Must have reference slice defined to retrieve conservation scores for a condensed AlignSlice") if (!defined $self->{'reference_slice'}); 00720 00721 #really hacky to get the mlss for the conservation score (which then uses 00722 #it to get the multiple alignment mlss, which is what I started with! 00723 my $key = "gerp_%"; 00724 my $sql = "SELECT meta_key FROM meta WHERE meta_key like ? AND meta_value = ?"; 00725 my $sth = $conservation_score_adaptor->prepare($sql); 00726 $sth->execute($key, $self->{_method_link_species_set}->dbID); 00727 00728 my $cs_mlss_id; 00729 while (my $arrRef = $sth->fetchrow_arrayref) { 00730 ($cs_mlss_id) = $arrRef->[0] =~ /gerp_(\d+)/; 00731 } 00732 $sth->finish; 00733 00734 throw ("Unable to find conservation score method_link_species_set for this multiple alignment " . $self->{_method_link_species_set}->dbID) if (!defined $cs_mlss_id); 00735 my $mlss_adaptor = $self->adaptor->db->get_MethodLinkSpeciesSetAdaptor(); 00736 00737 my $cs_mlss = $mlss_adaptor->fetch_by_dbID($cs_mlss_id); 00738 00739 $all_conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($cs_mlss, $self->{'reference_slice'}, $display_size, $display_type, $window_size); 00740 00741 return $all_conservation_scores; 00742 } 00743 00744 00745 =head2 get_all_ConstrainedElements 00746 00747 Arg 1 : (opt) string $method_link_type (default = GERP_CONSTRAINED_ELEMENT) 00748 Arg 2 : (opt) listref Bio::EnsEMBL::Compara::GenomeDB $species_set 00749 (default, the set of species from the MethodLinkSpeciesSet used 00750 to build this AlignSlice) 00751 Example : my $constrained_elements = 00752 $align_slice->get_all_ConstrainedElements(); 00753 Description: Retrieve the corresponding constrained elements for these alignments. 00754 Objects will be located on this AlignSlice, i.e. the 00755 reference_slice, reference_slice_start, reference_slice_end 00756 and reference_slice_strand will refer to this AlignSlice 00757 object 00758 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConstrainedElement 00759 objects. 00760 Caller : object::methodname 00761 Status : At risk 00762 00763 =cut 00764 00765 sub get_all_ConstrainedElements { 00766 my ($self, $method_link_type, $species_set) = @_; 00767 my $all_constrained_elements = []; 00768 00769 $method_link_type ||= "GERP_CONSTRAINED_ELEMENT"; 00770 my $key_cache = "_constrained_elements_".$method_link_type; 00771 if ($species_set) { 00772 $key_cache .= "::" . join("-", sort map {s/\W/_/g} map {$_->name} @$species_set); 00773 } else { 00774 $species_set = $self->{_method_link_species_set}->species_set; 00775 } 00776 00777 if (!defined($self->{$key_cache})) { 00778 my $method_link_species_set_adaptor = $self->adaptor->db->get_MethodLinkSpeciesSetAdaptor(); 00779 my $method_link_species_set = $method_link_species_set_adaptor->fetch_by_method_link_type_GenomeDBs( 00780 $method_link_type, $self->{_method_link_species_set}->species_set); 00781 00782 if ($method_link_species_set) { 00783 my $constrained_element_adaptor = $self->adaptor->db->get_ConstrainedElementAdaptor(); 00784 $all_constrained_elements = $constrained_element_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice( 00785 $method_link_species_set, $self->reference_Slice); 00786 my $big_mapper = $self->{_reference_Mapper}; 00787 foreach my $this_constrained_element (@{$all_constrained_elements}) { 00788 my $reference_slice_start; 00789 my $reference_slice_end; 00790 my $reference_slice_strand; 00791 00792 my @alignment_coords = $big_mapper->map_coordinates( 00793 "sequence", # $self->genomic_align->dbID, 00794 $this_constrained_element->start + $this_constrained_element->slice->start - 1, 00795 $this_constrained_element->end + $this_constrained_element->slice->start - 1, 00796 $this_constrained_element->strand, 00797 "sequence" # $from_mapper->from 00798 ); 00799 foreach my $alignment_coord (@alignment_coords) { 00800 next if (!$alignment_coord->isa("Bio::EnsEMBL::Mapper::Coordinate")); 00801 if (!defined($reference_slice_strand)) { 00802 $reference_slice_start = $alignment_coord->start; 00803 $reference_slice_end = $alignment_coord->end; 00804 $reference_slice_strand = $alignment_coord->strand; 00805 } else { 00806 if ($alignment_coord->start < $reference_slice_start) { 00807 $reference_slice_start = $alignment_coord->start; 00808 } 00809 if ($alignment_coord->end > $reference_slice_end) { 00810 $reference_slice_end = $alignment_coord->end; 00811 } 00812 } 00813 } 00814 $this_constrained_element->slice($self); 00815 $this_constrained_element->start($reference_slice_start); 00816 $this_constrained_element->end($reference_slice_end); 00817 $this_constrained_element->strand($reference_slice_strand); 00818 } 00819 } 00820 $self->{$key_cache} = $all_constrained_elements; 00821 } 00822 00823 return $self->{$key_cache}; 00824 } 00825 00826 00827 =head2 _create_underlying_Slices (experimental) 00828 00829 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks 00830 $genomic_align_blocks 00831 Arg[2] : [optional] boolean $expanded (default = FALSE) 00832 Arg[3] : [optional] boolean $solve_overlapping (default = FALSE) 00833 Arg[4] : [optional] boolean $preserve_blocks (default = FALSE) 00834 Example : 00835 Description: Creates a set of Bio::EnsEMBL::Compara::AlignSlice::Slices 00836 and attach it to this object. 00837 Returntype : 00838 Exceptions : warns about overlapping GenomicAlignBlocks 00839 Caller : 00840 00841 =cut 00842 00843 sub _create_underlying_Slices { 00844 my ($self, $genomic_align_blocks, $expanded, $solve_overlapping, $preserve_blocks, $species_order) = @_; 00845 my $strand = $self->reference_Slice->strand; 00846 00847 my $align_slice_length = 0; 00848 my $last_ref_pos; 00849 if ($strand == 1) { 00850 $last_ref_pos = $self->reference_Slice->start; 00851 } else { 00852 $last_ref_pos = $self->reference_Slice->end; 00853 } 00854 00855 my $ref_genome_db = $self->adaptor->db->get_GenomeDBAdaptor->fetch_by_Slice($self->reference_Slice); 00856 my $big_mapper = Bio::EnsEMBL::Mapper->new("sequence", "alignment"); 00857 00858 my $sorted_genomic_align_blocks; 00859 00860 if ($solve_overlapping eq "restrict") { 00861 $sorted_genomic_align_blocks = _sort_and_restrict_GenomicAlignBlocks($genomic_align_blocks); 00862 } elsif ($solve_overlapping) { 00863 $sorted_genomic_align_blocks = _sort_and_compile_GenomicAlignBlocks($genomic_align_blocks); 00864 } else { 00865 $sorted_genomic_align_blocks = _sort_GenomicAlignBlocks($genomic_align_blocks); 00866 } 00867 @$sorted_genomic_align_blocks = reverse(@$sorted_genomic_align_blocks) if ($strand == -1); 00868 foreach my $this_genomic_align_block (@$sorted_genomic_align_blocks) { 00869 my $original_genomic_align_block = $this_genomic_align_block; 00870 my ($from, $to); 00871 00872 if ($preserve_blocks) { 00873 ## Don't restrict the block. Set from and to to 1 and length respectively 00874 $from = 1; 00875 $to = $this_genomic_align_block->length; 00876 } else { 00877 #need to check that the block is still overlapping the slice - it may 00878 #have already been restricted by the options above. 00879 if ($this_genomic_align_block->reference_genomic_align->dnafrag_start > $self->reference_Slice->end || $this_genomic_align_block->reference_genomic_align->dnafrag_end < $self->reference_Slice->start) { 00880 next; 00881 } 00882 ($this_genomic_align_block, $from, $to) = $this_genomic_align_block->restrict_between_reference_positions( 00883 $self->reference_Slice->start, $self->reference_Slice->end); 00884 } 00885 00886 $original_genomic_align_block->{_alignslice_from} = $from; 00887 $original_genomic_align_block->{_alignslice_to} = $to; 00888 00889 my $reference_genomic_align = $this_genomic_align_block->reference_genomic_align; 00890 00891 #If I haven't needed to restrict, I don't gain this link so add it here 00892 if (!defined $reference_genomic_align->genomic_align_block->reference_genomic_align) { 00893 $reference_genomic_align->genomic_align_block($this_genomic_align_block); 00894 } 00895 00896 my ($this_pos, $this_gap_between_genomic_align_blocks); 00897 if ($strand == 1) { 00898 $this_pos = $reference_genomic_align->dnafrag_start; 00899 $this_gap_between_genomic_align_blocks = $this_pos - $last_ref_pos; 00900 } else { 00901 $this_pos = $reference_genomic_align->dnafrag_end; 00902 $this_gap_between_genomic_align_blocks = $last_ref_pos - $this_pos; 00903 } 00904 00905 if ($this_gap_between_genomic_align_blocks > 0) { 00906 ## Add mapper info for inter-genomic_align_block space 00907 if ($strand == 1) { 00908 $big_mapper->add_map_coordinates( 00909 'sequence', 00910 $last_ref_pos, 00911 $this_pos - 1, 00912 $strand, 00913 'alignment', 00914 $align_slice_length + 1, 00915 $align_slice_length + $this_gap_between_genomic_align_blocks, 00916 ); 00917 } else { 00918 $big_mapper->add_map_coordinates( 00919 'sequence', 00920 $this_pos + 1, 00921 $last_ref_pos, 00922 $strand, 00923 'alignment', 00924 $align_slice_length + 1, 00925 $align_slice_length + $this_gap_between_genomic_align_blocks, 00926 ); 00927 } 00928 $align_slice_length += $this_gap_between_genomic_align_blocks; 00929 } 00930 $reference_genomic_align->genomic_align_block->start($align_slice_length + 1); 00931 $original_genomic_align_block->{_alignslice_start} = $align_slice_length; 00932 if ($expanded) { 00933 $align_slice_length += CORE::length($reference_genomic_align->aligned_sequence("+FAKE_SEQ")); 00934 $reference_genomic_align->genomic_align_block->end($align_slice_length); 00935 $big_mapper->add_Mapper($reference_genomic_align->get_Mapper(0)); 00936 } else { 00937 $align_slice_length += $reference_genomic_align->dnafrag_end - $reference_genomic_align->dnafrag_start + 1; 00938 $reference_genomic_align->genomic_align_block->end($align_slice_length); 00939 $big_mapper->add_Mapper($reference_genomic_align->get_Mapper(0,1)); 00940 } 00941 $reference_genomic_align->genomic_align_block->slice($self); 00942 00943 if ($strand == 1) { 00944 $last_ref_pos = $reference_genomic_align->dnafrag_end + 1; 00945 } else { 00946 $last_ref_pos = $reference_genomic_align->dnafrag_start - 1; 00947 } 00948 } 00949 my ($this_pos, $this_gap_between_genomic_align_blocks); 00950 if ($strand == 1) { 00951 $this_pos = $self->reference_Slice->end; 00952 $this_gap_between_genomic_align_blocks = $this_pos - ($last_ref_pos - 1); 00953 } else { 00954 $this_pos = $self->reference_Slice->start; 00955 $this_gap_between_genomic_align_blocks = $last_ref_pos + 1 - $this_pos; 00956 } 00957 00958 ## $last_ref_pos is the next nucleotide position after the last mapped one. 00959 if ($this_gap_between_genomic_align_blocks > 0) { 00960 if ($strand == 1) { 00961 $big_mapper->add_map_coordinates( 00962 'sequence', 00963 $last_ref_pos, 00964 $this_pos, 00965 $strand, 00966 'alignment', 00967 $align_slice_length + 1, 00968 $align_slice_length + $this_gap_between_genomic_align_blocks, 00969 ); 00970 } else { 00971 $big_mapper->add_map_coordinates( 00972 'sequence', 00973 $this_pos, 00974 $last_ref_pos, 00975 $strand, 00976 'alignment', 00977 $align_slice_length + 1, 00978 $align_slice_length + $this_gap_between_genomic_align_blocks, 00979 ); 00980 } 00981 $align_slice_length += $this_gap_between_genomic_align_blocks; 00982 } 00983 00984 if ($species_order) { 00985 foreach my $species_def (@$species_order) { 00986 my $genome_db_name = $species_def->{genome_db}->name; 00987 # print STDERR "SPECIES:: ", $genome_db_name, "\n"; 00988 my $new_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice( 00989 -length => $align_slice_length, 00990 -requesting_slice => $self->reference_Slice, 00991 -align_slice => $self, 00992 -method_link_species_set => $self->{_method_link_species_set}, 00993 -genome_db => $species_def->{genome_db}, 00994 -expanded => $expanded, 00995 ); 00996 $new_slice->{genomic_align_ids} = $species_def->{genomic_align_ids}; 00997 push(@{$self->{slices}->{lc($genome_db_name)}}, $new_slice); 00998 push(@{$self->{_slices}}, $new_slice); 00999 } 01000 } else { 01001 # print STDERR "SPECIES:: ", $ref_genome_db->name, "\n"; 01002 $self->{slices}->{lc($ref_genome_db->name)} = [new Bio::EnsEMBL::Compara::AlignSlice::Slice( 01003 -length => $align_slice_length, 01004 -requesting_slice => $self->reference_Slice, 01005 -align_slice => $self, 01006 -method_link_species_set => $self->{_method_link_species_set}, 01007 -genome_db => $ref_genome_db, 01008 -expanded => $expanded, 01009 )]; 01010 $self->{_slices} = [$self->{slices}->{lc($ref_genome_db->name)}->[0]]; 01011 } 01012 01013 $self->{slices}->{lc($ref_genome_db->name)}->[0]->add_Slice_Mapper_pair( 01014 $self->reference_Slice, 01015 $big_mapper, 01016 1, 01017 $align_slice_length, 01018 $self->reference_Slice->strand 01019 ); 01020 $self->{_reference_Mapper} = $big_mapper; 01021 01022 foreach my $this_genomic_align_block (@$sorted_genomic_align_blocks) { 01023 if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) { 01024 ## For trees, loop through all nodes (internal and leaves) to add the GenomicAligns 01025 foreach my $this_genomic_align_node (@{$this_genomic_align_block->get_all_nodes}) { 01026 # but we have to skip the reference node as this has already been added to the guide Slice 01027 next if ($this_genomic_align_node eq $this_genomic_align_block->reference_genomic_align_node); 01028 01029 # For composite segments (2X genomes), the node will link to several GenomicAligns. 01030 # Add each of them to one of the AS:Slice objects 01031 foreach my $this_genomic_align (@{$this_genomic_align_node->get_all_genomic_aligns_for_node}) { 01032 # Link to genomic_align_block may have been lost during tree minimization 01033 $this_genomic_align->genomic_align_block_id(0); 01034 $this_genomic_align->genomic_align_block($this_genomic_align_block); 01035 $self->_add_GenomicAlign_to_a_Slice($this_genomic_align, $this_genomic_align_block, 01036 $species_order, $align_slice_length); 01037 } 01038 } 01039 } else { 01040 ## For plain alignments, just use all non-reference GenomicAlign objects 01041 foreach my $this_genomic_align 01042 (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) { 01043 $self->_add_GenomicAlign_to_a_Slice($this_genomic_align, $this_genomic_align_block, 01044 $species_order, $align_slice_length); 01045 } 01046 } 01047 } 01048 #It is possible for the same region in the ref species (eg Gibbon) to align to 2 different blocks (pairwise to human in the case of the EPO low coverage alignments). In this case, although the incoming $genomic_align_blocks will have 2 blocks, the $sorted_genomic_align_blocks will only contain 1 of the blocks. It may happen that one species occurs in one block (eg gorilla) but not in the other. However, the $species_order will contain gorilla but the $sorted_genomic_align_blocks may not. This results in a slice being created for gorilla but it has no slice_mapper_pairs. Must check the slices and remove any that have {'slice_mapper_pairs'} as undef (no alignment comes out as a GAP). 01049 01050 if ($species_order) { 01051 my $slices = $self->{_slices}; 01052 for (my $i = (@$slices-1); $i >= 0; --$i) { 01053 if (@{$slices->[$i]->{'slice_mapper_pairs'}} == 0) { 01054 #remove from {slices} 01055 delete $self->{slices}->{$slices->[$i]->genome_db->name}; 01056 #remove from {_slices} 01057 splice @$slices, $i, 1; 01058 } 01059 } 01060 } 01061 01062 01063 01064 return $self; 01065 } 01066 01067 =head2 _add_GenomicAlign_to_a_Slice 01068 01069 =cut 01070 01071 sub _add_GenomicAlign_to_a_Slice { 01072 my ($self, $this_genomic_align, $this_genomic_align_block, $species_order, $align_slice_length) = @_; 01073 01074 my $expanded = $self->{expanded}; 01075 my $species = $this_genomic_align->dnafrag->genome_db->name; 01076 01077 if (!defined($self->{slices}->{lc($species)})) { 01078 $self->{slices}->{lc($species)} = [new Bio::EnsEMBL::Compara::AlignSlice::Slice( 01079 -length => $align_slice_length, 01080 -requesting_slice => $self->reference_Slice, 01081 -align_slice => $self, 01082 -method_link_species_set => $self->{_method_link_species_set}, 01083 -genome_db => $this_genomic_align->dnafrag->genome_db, 01084 -expanded => $expanded, 01085 )]; 01086 push(@{$self->{_slices}}, $self->{slices}->{lc($species)}->[0]); 01087 } 01088 01089 my $this_block_start = $this_genomic_align_block->start; 01090 my $this_block_end = $this_genomic_align_block->end; 01091 my $this_core_slice = $this_genomic_align->get_Slice(); 01092 if (!$this_core_slice) { 01093 $this_core_slice = new Bio::EnsEMBL::Slice( 01094 -coord_system => $aligngap_coord_system, 01095 -seq_region_name => "GAP", 01096 -start => $this_block_start, 01097 -end => $this_block_end, 01098 -strand => 0 01099 ); 01100 $this_core_slice->{seq} = "." x ($this_block_end - $this_block_start + 1); 01101 } 01102 return if (!$this_core_slice); ## The restriction of the GenomicAlignBlock may return a void GenomicAlign 01103 01104 ## This creates a link between the slice and the tree node. This is required to display 01105 ## the tree on the web interface. 01106 if ($this_genomic_align->genome_db->name eq "ancestral_sequences") { 01107 foreach my $genomic_align_node (@{$this_genomic_align_block->get_all_sorted_genomic_align_nodes}) { 01108 my $genomic_align_group = $genomic_align_node->genomic_align_group; 01109 next if (!$genomic_align_group); 01110 01111 foreach my $genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) { 01112 if ($this_genomic_align == $genomic_align) { 01113 my $simple_tree = $genomic_align_node->newick_simple_format(); 01114 $simple_tree =~ s/\_[^\_]+\_\d+\_\d+\[[\+\-]\]//g; 01115 $simple_tree =~ s/\:[\d\.]+//g; 01116 $this_core_slice->{_tree} = $simple_tree; 01117 last; 01118 } 01119 } 01120 } 01121 } 01122 01123 my $this_mapper = $this_genomic_align->get_Mapper(0, !$expanded); 01124 # Fix block start and block end for composite segments (2X genomes) 01125 if ($this_genomic_align->cigar_line =~ /^(\d*)X/ or $this_genomic_align->cigar_line =~ /(\d*)X$/) { 01126 $this_block_start = undef; 01127 $this_block_end = undef; 01128 my @blocks = $this_mapper->map_coordinates("sequence", $this_genomic_align->dnafrag_start, 01129 $this_genomic_align->dnafrag_end, $this_genomic_align->dnafrag_strand, "sequence"); 01130 foreach my $this_block (@blocks) { 01131 next if ($this_block->isa("Bio::EnsEMBL::Mapper::Gap")); 01132 $this_block_start = $this_block->start if (!defined($this_block_start) or $this_block->start < $this_block_start); 01133 $this_block_end = $this_block->end if (!defined($this_block_end) or $this_block->end > $this_block_end); 01134 } 01135 } 01136 01137 # Choose the appropriate AS::Slice for adding this bit of the alignment 01138 my $this_underlying_slice = $self->_choose_underlying_Slice($this_genomic_align, $this_block_start, 01139 $this_block_end, $align_slice_length, $species_order); 01140 01141 # Add a Slice, Mapper, and start-end-strand coordinates to an underlying AS::Slice 01142 $this_underlying_slice->add_Slice_Mapper_pair( 01143 $this_core_slice, 01144 $this_mapper, 01145 $this_block_start, 01146 $this_block_end, 01147 $this_genomic_align->dnafrag_strand 01148 ); 01149 return; 01150 } 01151 01152 01153 sub _choose_underlying_Slice { 01154 my ($self, $this_genomic_align, $this_block_start, $this_block_end, $align_slice_length, $species_order) = @_; 01155 my $underlying_slice = undef; 01156 01157 my $expanded = $self->{expanded}; 01158 my $species = $this_genomic_align->dnafrag->genome_db->name; 01159 01160 if (defined($this_genomic_align->{_temporary_AS_underlying_Slice})) { 01161 my $preset_underlying_slice = $this_genomic_align->{_temporary_AS_underlying_Slice}; 01162 delete($this_genomic_align->{_temporary_AS_underlying_Slice}); 01163 return $preset_underlying_slice; 01164 } 01165 01166 if (!defined($self->{slices}->{lc($species)})) { 01167 ## No slice for this species yet. Create, store and return it 01168 $underlying_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice( 01169 -length => $align_slice_length, 01170 -requesting_slice => $self->reference_Slice, 01171 -align_slice => $self, 01172 -method_link_species_set => $self->{_method_link_species_set}, 01173 -genome_db => $this_genomic_align->dnafrag->genome_db, 01174 -expanded => $expanded, 01175 ); 01176 push(@{$self->{_slices}}, $underlying_slice); 01177 push(@{$self->{slices}->{lc($species)}}, $underlying_slice); 01178 return $underlying_slice; 01179 } 01180 01181 if ($species_order) { 01182 my $preset_underlying_slice = undef; 01183 foreach my $this_underlying_slice (@{$self->{_slices}}) { 01184 if (!$this_genomic_align->{original_dbID} and $this_genomic_align->dbID) { 01185 $this_genomic_align->{original_dbID} = $this_genomic_align->dbID; 01186 } 01187 if (grep {$_ == $this_genomic_align->{original_dbID}} 01188 @{$this_underlying_slice->{genomic_align_ids}}) { 01189 $preset_underlying_slice = $this_underlying_slice; 01190 } 01191 } 01192 if ($preset_underlying_slice) { 01193 my $overlap = 0; 01194 my $slice_mapper_pairs = $preset_underlying_slice->get_all_Slice_Mapper_pairs(); 01195 foreach my $slice_mapper_pair (@$slice_mapper_pairs) { 01196 my $block_start = $slice_mapper_pair->{start}; 01197 my $block_end = $slice_mapper_pair->{end}; 01198 #a block may not have a start and end if there is no sequence 01199 #eg the cigar_line looks like 139D17186X 01200 next if (!defined $this_block_start || !defined $this_block_end); 01201 if ($this_block_start <= $block_end and $this_block_end >= $block_start) { 01202 $overlap = 1; 01203 last; 01204 } 01205 } 01206 if (!$overlap) { 01207 ## This block does not overlap any previous block: add it! 01208 $underlying_slice = $preset_underlying_slice; 01209 } 01210 } 01211 } 01212 01213 if (!$underlying_slice) { 01214 ## Try to add this alignment to an existing underlying Bio::EnsEMBL::Compara::AlignSlice::Slice 01215 SLICE: foreach my $this_underlying_slice (@{$self->{slices}->{lc($species)}}) { 01216 my $slice_mapper_pairs = $this_underlying_slice->get_all_Slice_Mapper_pairs(); 01217 PAIRS: foreach my $slice_mapper_pair (@$slice_mapper_pairs) { 01218 my $block_start = $slice_mapper_pair->{start}; 01219 my $block_end = $slice_mapper_pair->{end}; 01220 if ($this_block_start <= $block_end and $this_block_end >= $block_start) { 01221 next SLICE; ## This block overlaps a previous block 01222 } 01223 } 01224 ## This block does not overlap any previous block: add it! 01225 $underlying_slice = $this_underlying_slice; 01226 } 01227 } 01228 01229 if (!$underlying_slice) { 01230 ## This block overlaps at least one block in every available underlying 01231 ## Bio::EnsEMBL::Compara::AlignSlice::Slice. Create a new one! 01232 $underlying_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice( 01233 -length => $align_slice_length, 01234 -requesting_slice => $self->reference_Slice, 01235 -align_slice => $self, 01236 -method_link_species_set => $self->{_method_link_species_set}, 01237 -genome_db => $this_genomic_align->dnafrag->genome_db, 01238 -expanded => $expanded, 01239 ); 01240 push(@{$self->{_slices}}, $underlying_slice); 01241 push(@{$self->{slices}->{lc($species)}}, $underlying_slice); 01242 } 01243 01244 # if ($this_genomic_align->cigar_line =~ /X/) { 01245 # ## This GenomicAlign is part of a composite alignment 01246 # my $genomic_align_group = $this_genomic_align->genomic_align_group_by_type("composite"); 01247 # foreach my $this_genomic_align (@{$genomic_align_group->genomic_align_array}) { 01248 # # next if ($this_genomic_align 01249 # } 01250 # } 01251 01252 return $underlying_slice; 01253 } 01254 01255 01256 =head2 _sort_and_restrict_GenomicAlignBlocks 01257 01258 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs 01259 Example : $sorted_gabs = _sort_GenomicAlignBlocks($gabs); 01260 Description : This method returns the original list of 01261 Bio::EnsEMBL::Compara::GenomicAlignBlock objects in order 01262 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects 01263 Exceptions : 01264 Caller : methodname() 01265 01266 =cut 01267 01268 sub _sort_and_restrict_GenomicAlignBlocks { 01269 my ($genomic_align_blocks) = @_; 01270 my $sorted_genomic_align_blocks = []; 01271 return $sorted_genomic_align_blocks if (!$genomic_align_blocks); 01272 01273 my $last_end; 01274 foreach my $this_genomic_align_block (sort _sort_gabs @{$genomic_align_blocks}) { 01275 if (defined($last_end) and 01276 $this_genomic_align_block->reference_genomic_align->dnafrag_start <= $last_end) { 01277 if ($this_genomic_align_block->reference_genomic_align->dnafrag_end > $last_end) { 01278 $this_genomic_align_block = $this_genomic_align_block->restrict_between_reference_positions($last_end + 1, undef); 01279 } else { 01280 warning("Ignoring GenomicAlignBlock because it overlaps". 01281 " previous GenomicAlignBlock"); 01282 next; 01283 } 01284 } 01285 $last_end = $this_genomic_align_block->reference_genomic_align->dnafrag_end; 01286 push(@$sorted_genomic_align_blocks, $this_genomic_align_block); 01287 } 01288 01289 return $sorted_genomic_align_blocks; 01290 } 01291 01292 =head2 _sort_GenomicAlignBlocks 01293 01294 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs 01295 Example : $sorted_gabs = _sort_GenomicAlignBlocks($gabs); 01296 Description : This method returns the original list of 01297 Bio::EnsEMBL::Compara::GenomicAlignBlock objects in order 01298 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects 01299 Exceptions : 01300 Caller : methodname() 01301 01302 =cut 01303 01304 sub _sort_GenomicAlignBlocks { 01305 my ($genomic_align_blocks) = @_; 01306 my $sorted_genomic_align_blocks = []; 01307 return $sorted_genomic_align_blocks if (!$genomic_align_blocks); 01308 01309 my $last_end; 01310 foreach my $this_genomic_align_block (sort _sort_gabs @{$genomic_align_blocks}) { 01311 if (!defined($last_end) or 01312 $this_genomic_align_block->reference_genomic_align->dnafrag_start > $last_end) { 01313 push(@$sorted_genomic_align_blocks, $this_genomic_align_block); 01314 $last_end = $this_genomic_align_block->reference_genomic_align->dnafrag_end; 01315 } else { 01316 my $block_id; 01317 if (UNIVERSAL::isa($a, "Bio::EnsEMBL::Compara::GenomicAlignBlock")) { 01318 warning("Ignoring Bio::EnsEMBL::Compara::GenomicAlignBlock #". 01319 ($this_genomic_align_block->dbID or "-unknown")." because it overlaps". 01320 " previous Bio::EnsEMBL::Compara::GenomicAlignBlock"); 01321 } else { 01322 warning("Ignoring Bio::EnsEMBL::Compara::GenomicAlignTree #". 01323 ($this_genomic_align_block->node_id or "-unknown")." because it overlaps". 01324 " previous Bio::EnsEMBL::Compara::GenomicAlignTree"); 01325 } 01326 01327 } 01328 } 01329 01330 return $sorted_genomic_align_blocks; 01331 } 01332 01333 sub _sort_gabs { 01334 01335 if (UNIVERSAL::isa($a, "Bio::EnsEMBL::Compara::GenomicAlignBlock")) { 01336 _sort_genomic_align_block(); 01337 } else { 01338 _sort_genomic_align_tree(); 01339 } 01340 } 01341 01342 sub _sort_genomic_align_block { 01343 01344 if ($a->reference_genomic_align->dnafrag_start == $b->reference_genomic_align->dnafrag_start) { 01345 ## This may happen when a block has been split into small pieces and some of them contain 01346 ## gaps only for the reference species. In this case, use another species for sorting these 01347 ## genomic_align_blocks 01348 for (my $i = 0; $i<@{$a->get_all_non_reference_genomic_aligns()}; $i++) { 01349 for (my $j = 0; $j<@{$b->get_all_non_reference_genomic_aligns()}; $j++) { 01350 next if ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_id != 01351 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_id); 01352 if (($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start != 01353 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start) and 01354 ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_strand == 01355 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_strand)) { 01356 ## This other genomic_align is not a full gap and ca be used to sort these blocks 01357 if ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_strand == 1) { 01358 return $a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start <=> 01359 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start; 01360 } else { 01361 return $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start <=> 01362 $a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start; 01363 } 01364 } 01365 } 01366 } 01367 } else { 01368 return $a->reference_genomic_align->dnafrag_start <=> $b->reference_genomic_align->dnafrag_start 01369 } 01370 } 01371 01372 sub _sort_genomic_align_tree { 01373 01374 if ($a->reference_genomic_align->dnafrag_start == $b->reference_genomic_align->dnafrag_start) { 01375 ## This may happen when a block has been split into small pieces and some of them contain 01376 ## gaps only for the reference species. In this case, use another species for sorting these 01377 ## genomic_align_blocks 01378 my $a_leaves = $a->get_all_leaves; 01379 my $b_leaves = $b->get_all_leaves; 01380 01381 for (my $i = 0; $i < @$a_leaves; $i++) { 01382 for (my $j = 0; $j < @$b_leaves; $j++) { 01383 #look at high coverage sequences only 01384 my $a_gas = $a_leaves->[$i]->get_all_genomic_aligns_for_node; 01385 next if (@$a_gas > 1); 01386 my $a_ga = $a_gas->[0]; 01387 01388 my $b_gas = $b_leaves->[$j]->get_all_genomic_aligns_for_node; 01389 next if (@$b_gas > 1); 01390 my $b_ga = $b_gas->[0]; 01391 01392 next if ($a_ga->dnafrag_id != $b_ga->dnafrag_id); 01393 if (($a_ga->dnafrag_start != $b_ga->dnafrag_start) and ($a_ga->dnafrag_strand == $b_ga->dnafrag_strand)) { 01394 ## This other genomic_align is not a full gap and ca be used to sort these blocks 01395 if ($a_ga->dnafrag_strand == 1) { 01396 return $a_ga->dnafrag_start <=> $b_ga->dnafrag_start; 01397 } else { 01398 return $b_ga->dnafrag_start <=> $a_ga->dnafrag_start; 01399 } 01400 } 01401 } 01402 } 01403 } else { 01404 return $a->reference_genomic_align->dnafrag_start <=> $b->reference_genomic_align->dnafrag_start 01405 } 01406 } 01407 01408 =head2 _sort_and_compile_GenomicAlignBlocks 01409 01410 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs 01411 Example : $sorted_fake_gabs = _sort_and_compile_GenomicAlignBlocks($gabs); 01412 Description : This method returns a list of 01413 Bio::EnsEMBL::Compara::GenomicAlignBlock objects sorted by 01414 position on the reference Bio::EnsEMBL::Compara::DnaFrag. If two 01415 or more Bio::EnsEMBL::Compara::GenomicAlignBlock objects 01416 overlap, it compile them, using the _compile_GenomicAlignBlocks 01417 method. 01418 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects 01419 Exceptions : 01420 Caller : methodname() 01421 01422 =cut 01423 01424 sub _sort_and_compile_GenomicAlignBlocks { 01425 my ($genomic_align_blocks) = @_; 01426 my $sorted_genomic_align_blocks = []; 01427 return $sorted_genomic_align_blocks if (!$genomic_align_blocks); 01428 01429 ############################################################################################## 01430 ## 01431 ## Compile GenomicAlignBlocks in group of GenomicAlignBlocks based on reference coordinates 01432 ## 01433 my $sets_of_genomic_align_blocks = []; 01434 my $start_pos; 01435 my $end_pos; 01436 my $this_set_of_genomic_align_blocks = []; 01437 foreach my $this_genomic_align_block (sort _sort_gabs @$genomic_align_blocks) { 01438 my $this_start_pos = $this_genomic_align_block->reference_genomic_align->dnafrag_start; 01439 my $this_end_pos = $this_genomic_align_block->reference_genomic_align->dnafrag_end; 01440 if (defined($end_pos) and ($this_start_pos <= $end_pos)) { 01441 # this genomic_align_block overlaps previous one. Extend this set_of_coordinates 01442 $end_pos = $this_end_pos if ($this_end_pos > $end_pos); 01443 } else { 01444 # there is a gap between this genomic_align_block and the previous one. Close and save 01445 # this set_of_genomic_align_blocks (if it exists) and start a new one. 01446 push(@{$sets_of_genomic_align_blocks}, [$start_pos, $end_pos, $this_set_of_genomic_align_blocks]) 01447 if (defined(@$this_set_of_genomic_align_blocks)); 01448 $start_pos = $this_start_pos; 01449 $end_pos = $this_end_pos; 01450 $this_set_of_genomic_align_blocks = []; 01451 } 01452 push(@$this_set_of_genomic_align_blocks, $this_genomic_align_block); 01453 } 01454 push(@{$sets_of_genomic_align_blocks}, [$start_pos, $end_pos, $this_set_of_genomic_align_blocks]) 01455 if (defined(@$this_set_of_genomic_align_blocks)); 01456 ## 01457 ############################################################################################## 01458 01459 foreach my $this_set_of_genomic_align_blocks (@$sets_of_genomic_align_blocks) { 01460 my $this_compiled_genomic_align_block; 01461 if (@$this_set_of_genomic_align_blocks == 1) { 01462 $this_compiled_genomic_align_block = $this_set_of_genomic_align_blocks->[0]; 01463 } else { 01464 $this_compiled_genomic_align_block = 01465 _compile_GenomicAlignBlocks(@$this_set_of_genomic_align_blocks); 01466 } 01467 push(@{$sorted_genomic_align_blocks}, $this_compiled_genomic_align_block); 01468 } 01469 01470 return $sorted_genomic_align_blocks; 01471 } 01472 01473 =head2 _compile_GenomicAlignBlocks 01474 01475 Arg [1] : integer $start_pos (the start of the fake genomic_align) 01476 Arg [2] : integer $end_pos (the end of the fake genomic_align) 01477 Arg [3] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $set_of_genomic_align_blocks 01478 $all_genomic_align_blocks (the pairwise genomic_align_blocks used for 01479 this fake multiple genomic_aling_block) 01480 Example : 01481 Description : 01482 Returntype : Bio::EnsEMBL::Compara::GenomicAlignBlock object 01483 Exceptions : 01484 Caller : methodname 01485 01486 =cut 01487 01488 sub _compile_GenomicAlignBlocks { 01489 my ($start_pos, $end_pos, $all_genomic_align_blocks) = @_; 01490 01491 ############################################################################################ 01492 ## 01493 ## Change strands in order to have all reference genomic aligns on the forward strand 01494 ## 01495 my $strand; 01496 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) { 01497 my $this_genomic_align = $this_genomic_align_block->reference_genomic_align; 01498 if (!defined($strand)) { 01499 $strand = $this_genomic_align->dnafrag_strand; 01500 } elsif ($strand != $this_genomic_align->dnafrag_strand) { 01501 $strand = 0; 01502 } 01503 if ($this_genomic_align->dnafrag_strand == -1) { 01504 01505 if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) { 01506 foreach my $this_node (@{$this_genomic_align_block->get_all_nodes}) { 01507 my $genomic_align_group = $this_node->genomic_align_group; 01508 next if (!$genomic_align_group); 01509 foreach my $genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) { 01510 $genomic_align->reverse_complement; 01511 } 01512 } 01513 } else { 01514 foreach my $genomic_align (@{$this_genomic_align_block->genomic_align_array}) { 01515 $genomic_align->reverse_complement; 01516 } 01517 } 01518 } 01519 } 01520 ## 01521 ############################################################################################ 01522 01523 ## Nothing has to be compiled if there is one single GenomicAlignBlock! 01524 $all_genomic_align_blocks->[0]->reverse_complement; 01525 return $all_genomic_align_blocks->[0] if (scalar(@$all_genomic_align_blocks) == 1); 01526 01527 ############################################################################################ 01528 ## 01529 ## Fix all sequences 01530 ## 01531 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) { 01532 my $this_genomic_align = $this_genomic_align_block->reference_genomic_align; 01533 my $this_start_pos = $this_genomic_align->dnafrag_start; 01534 my $this_end_pos = $this_genomic_align->dnafrag_end; 01535 my $starting_gap = $this_start_pos - $start_pos; 01536 my $ending_gap = $end_pos - $this_end_pos; 01537 01538 my $this_cigar_line = $this_genomic_align->cigar_line; 01539 my $this_original_sequence = $this_genomic_align->original_sequence; 01540 $this_genomic_align->aligned_sequence(""); 01541 if ($starting_gap) { 01542 $this_cigar_line = $starting_gap."M".$this_cigar_line; 01543 $this_original_sequence = ("N" x $starting_gap).$this_original_sequence; 01544 } 01545 if ($ending_gap) { 01546 $this_cigar_line .= $ending_gap."M"; 01547 $this_original_sequence .= ("N" x $ending_gap); 01548 } 01549 $this_genomic_align->cigar_line($this_cigar_line); 01550 $this_genomic_align->original_sequence($this_original_sequence); 01551 01552 foreach my $this_genomic_align (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) { 01553 $this_genomic_align->aligned_sequence(""); 01554 my $this_cigar_line = $this_genomic_align->cigar_line; 01555 $this_cigar_line = $starting_gap."D".$this_cigar_line if ($starting_gap); 01556 $this_cigar_line .= $ending_gap."D" if ($ending_gap); 01557 $this_genomic_align->cigar_line($this_cigar_line); 01558 $this_genomic_align->aligned_sequence(); # compute aligned_sequence using cigar_line 01559 } 01560 } 01561 ## 01562 ############################################################################################ 01563 01564 ############################################################################################ 01565 ## 01566 ## Distribute gaps 01567 ## 01568 my $aln_pos = 0; 01569 my $gap; 01570 do { 01571 my $gap_pos; 01572 my $genomic_align_block; 01573 $gap = undef; 01574 01575 ## Get the (next) first gap from all the alignments (sets: $gap_pos, $gap and $genomic_align_block_id) 01576 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) { 01577 my $this_gap_pos = index($this_genomic_align_block->reference_genomic_align->aligned_sequence, 01578 "-", $aln_pos); 01579 if ($this_gap_pos > 0 and (!defined($gap_pos) or $this_gap_pos < $gap_pos)) { 01580 $gap_pos = $this_gap_pos; 01581 my $gap_string = substr($this_genomic_align_block->reference_genomic_align->aligned_sequence, 01582 $gap_pos); 01583 ($gap) = $gap_string =~ /^(\-+)/; 01584 $genomic_align_block = $this_genomic_align_block; 01585 } 01586 } 01587 01588 ## If a gap has been found, apply it to the other GAB 01589 if ($gap) { 01590 $aln_pos = $gap_pos + length($gap); 01591 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) { 01592 next if ($genomic_align_block eq $this_genomic_align_block); # Do not add gap to itself!! 01593 if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) { 01594 foreach my $this_node (@{$this_genomic_align_block->get_all_nodes}) { 01595 my $genomic_align_group = $this_node->genomic_align_group; 01596 next if (!$genomic_align_group); 01597 foreach my $this_genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) { 01598 # insert gap in the aligned_sequence 01599 my $aligned_sequence = $this_genomic_align->aligned_sequence; 01600 substr($aligned_sequence, $gap_pos, 0, $gap); 01601 $this_genomic_align->aligned_sequence($aligned_sequence); 01602 } 01603 } 01604 } else { 01605 foreach my $this_genomic_align (@{$this_genomic_align_block->genomic_align_array}) { 01606 # insert gap in the aligned_sequence 01607 my $aligned_sequence = $this_genomic_align->aligned_sequence; 01608 substr($aligned_sequence, $gap_pos, 0, $gap); 01609 $this_genomic_align->aligned_sequence($aligned_sequence); 01610 } 01611 } 01612 } 01613 } 01614 01615 } while ($gap); # exit loop if no gap has been found 01616 01617 ## Fix all cigar_lines in order to match new aligned_sequences 01618 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) { 01619 foreach my $this_genomic_align (@{$this_genomic_align_block->genomic_align_array}) { 01620 $this_genomic_align->cigar_line(""); # undef old cigar_line 01621 $this_genomic_align->cigar_line(); # compute cigar_line from aligned_sequence 01622 } 01623 } 01624 ## 01625 ############################################################################################ 01626 01627 ############################################################################################ 01628 ## 01629 ## Create the reference_genomic_align for this fake genomic_align_block 01630 ## 01631 ## All the blocks have been edited and all the reference genomic_aling 01632 ## should be equivalent. Here, we create a new one with no fixed sequence. 01633 ## This permits to retrieve the real sequence when needed 01634 ## 01635 my $reference_genomic_align; 01636 if (@$all_genomic_align_blocks) { 01637 my $this_genomic_align = $all_genomic_align_blocks->[0]->reference_genomic_align; 01638 $reference_genomic_align = new Bio::EnsEMBL::Compara::GenomicAlign( 01639 -dbID => -1, 01640 -dnafrag => $this_genomic_align->dnafrag, 01641 -dnafrag_start => $start_pos, 01642 -dnafrag_end => $end_pos, 01643 -dnafrag_strand => 1, 01644 -cigar_line => $this_genomic_align->cigar_line, 01645 -method_link_species_set => $this_genomic_align->method_link_species_set, 01646 -visible => 1 01647 ); 01648 } 01649 ## 01650 ############################################################################################ 01651 01652 ## Create the genomic_align_array (the list of genomic_aling for this fake gab 01653 my $genomic_align_array = [$reference_genomic_align]; 01654 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) { 01655 foreach my $this_genomic_align (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) { 01656 $this_genomic_align->genomic_align_block_id(0); # undef old genomic_align_block_id 01657 push(@$genomic_align_array, $this_genomic_align); 01658 } 01659 } 01660 01661 ## Create the fake multiple Bio::EnsEMBL::Compara::GenomicAlignBlock 01662 my $fake_genomic_align_block = new Bio::EnsEMBL::Compara::GenomicAlignBlock( 01663 -length => ($end_pos - $start_pos + 1), 01664 -genomic_align_array => $genomic_align_array, 01665 -reference_genomic_align => $reference_genomic_align, 01666 -level_id => 0 01667 ); 01668 01669 if ($strand == -1) { 01670 $fake_genomic_align_block->reverse_complement; 01671 } 01672 01673 return $fake_genomic_align_block; 01674 } 01675 01676 01677 sub DESTROY { 01678 my $self = shift; 01679 ## Remove circular reference in order to allow Perl to clear the object 01680 $self->{all_genomic_align_blocks} = undef; 01681 } 01682 01683 1;