Bio::EnsEMBL::Hive::Process Class Reference

Class Summary

Synopsis

  Abstract superclass.  Each Process is an individual building block 
  of the system.  Instances of these processes are created in a hive workflow 
  graph of Analysis entries that are linked together with dataflow and 
  AnalysisCtrl rules.
  
  Instances of these Processes are created by the system as work is done.
  The newly created Process will have preset $self->queen, $self->dbc, 
  $self->input_id, $self->analysis and several other variables. 
  From this input and configuration data, each Process can then proceed to 
  do something.  The flow of execution within a Process is:
    fetch_input();
    run();
    write_output();
    DESTROY
  The developer can implement their own versions of fetch_input, run, 
  write_output, and DESTROY to do what they need.  
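
  The call order above can be sketched with a small, self-contained Perl
  example. It does not use the real Bio::EnsEMBL::Hive classes: My::Process,
  My::Process::Square, the execute() driver and the trace() accessor are all
  hypothetical names invented for illustration, and DESTROY is left to Perl's
  normal object cleanup.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the abstract superclass; NOT the real
# Bio::EnsEMBL::Hive::Process, just enough to show the call order.
package My::Process;

sub new {
    my ($class, %args) = @_;
    return bless { input_id => $args{input_id}, trace => [] }, $class;
}

sub input_id { return $_[0]{input_id} }
sub trace    { return $_[0]{trace} }

# Default hooks do nothing; subclasses override the ones they need.
sub fetch_input  { return 1 }
sub run          { return 1 }
sub write_output { return 1 }

# Hypothetical driver mimicking what a worker does with each Process.
sub execute {
    my ($self) = @_;
    $self->fetch_input;
    $self->run;
    $self->write_output;
    return $self;
}

package My::Process::Square;
our @ISA = ('My::Process');

sub fetch_input {
    my ($self) = @_;
    $self->{value} = $self->input_id->{value};    # parse the input
    push @{ $self->trace }, 'fetch_input';
}

sub run {
    my ($self) = @_;
    $self->{result} = $self->{value} ** 2;        # do the actual work
    push @{ $self->trace }, 'run';
}

sub write_output {
    my ($self) = @_;
    push @{ $self->trace }, 'write_output';       # a real Process would store results here
}

package main;

my $process = My::Process::Square->new(input_id => { value => 7 });
$process->execute;
print join(' -> ', @{ $process->trace }), "\n";
print "result: $process->{result}\n";
```

  Keeping the three hooks separate means a subclass only overrides the
  stages it cares about; the driver never needs to change.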
  
  The entire system is based around the concept of a workflow graph which
  can split and loop back on itself.  This is accomplished by dataflow
  rules (or pipes) that connect one Process (or analysis) to others.
  Where a unix commandline program can send output on the STDOUT and STDERR pipes, 
  a hive Process has access to unlimited pipes referenced by numerical 
  branch_codes. This is accomplished within the Process via 
  $self->dataflow_output_id(...);  
  
  The design philosophy is that each Process does its work and creates output, 
  but it doesn't worry about where the input came from, or where its output 
  goes. If the system has dataflow pipes connected, then the output jobs 
  have purpose; if not, the output work is thrown away.  The workflow graph 
  'controls' the behaviour of the system, not the processes.  The processes just 
  need to do their job.  The design of the workflow graph is based on the knowledge 
  of what each Process does so that the graph can be correctly constructed.
  The workflow graph can be constructed a priori or can be constructed and 
  modified by intelligent Processes as the system runs.
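
  The idea of numbered branches, and of output being discarded when no
  dataflow rule is connected, can be illustrated with a rough sketch. The
  code below is a stand-alone mock rather than the library's implementation:
  the %dataflow_rules table, the analysis names, and the argument order of
  this dataflow_output_id() stand-in are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical dataflow rules: branch code => names of downstream analyses.
# The analysis names are invented for this sketch.
my %dataflow_rules = (
    1 => ['summarise'],
    2 => ['align_chunk', 'archive_chunk'],
);

my @job_kiosk;    # jobs "posted" for other workers to pick up

# Stand-in for $self->dataflow_output_id(...); the argument order
# (output ids first, then branch code) is an assumption of this sketch.
sub dataflow_output_id {
    my ($output_ids, $branch_code) = @_;
    my $targets = $dataflow_rules{$branch_code} || [];    # no rule: output is dropped
    my $created = 0;
    for my $target (@$targets) {
        for my $output_id (@$output_ids) {
            push @job_kiosk, { analysis => $target, input_id => $output_id };
            $created++;
        }
    }
    return $created;
}

# The process emits output without knowing who, if anyone, consumes it.
my $fan      = dataflow_output_id([ { chunk => 1 }, { chunk => 2 } ], 2);
my $funnel   = dataflow_output_id([ { done => 1 } ], 1);
my $orphaned = dataflow_output_id([ { note => 'nobody listens' } ], 3);

printf "branch 2 created %d jobs, branch 1 created %d, branch 3 created %d\n",
    $fan, $funnel, $orphaned;
```

  Because the routing table lives outside the process, rewiring the graph
  changes where output goes without touching the process code.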
  
  
  The Hive is based on AI concepts and modeled on the social structure and 
  behaviour of a honey bee hive. So where a worker honey bee's purpose is
  (go find pollen, bring back to hive, drop off pollen, repeat), an ensembl-hive 
  worker's purpose is (find a job, create a Process for that job, run it,
  drop off output job(s), repeat).  While most workflow systems are based 
  on 'smart' central controllers and external control of 'dumb' processes, 
  the Hive is based on 'dumb' workflow graphs and a job kiosk, and 'smart' workers 
  (autonomous agents) who are self-configuring and figure out for themselves what 
  needs to be done, and then do it.  The workers are based around a set of 
  emergent behaviour rules which allow a predictable system behaviour to emerge 
  from what otherwise might appear at first glance to be a chaotic system. There 
  is an inherent asynchronous disconnect between one worker and the next.  
  Work (or jobs) are simply 'posted' on a blackboard or kiosk within the hive 
  database where other workers can find them.  
  The emergent behaviour rules of a worker are:
     1) If a job is posted, someone needs to do it.
     2) Don't grab something that someone else is working on.
     3) Don't grab more than you can handle.
     4) If you grab a job, it needs to be finished correctly
     5) Keep busy doing work
     6) If you fail, do the best you can to report back
  For further reading on the AI principles employed in this design see:
     http://en.wikipedia.org/wiki/Autonomous_Agent
     http://en.wikipedia.org/wiki/Emergence
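
  The six worker rules can be read as a simple claim-and-work loop. The
  sketch below simulates them in memory, assuming a hypothetical @kiosk
  array in place of the hive database; the status names, the batch size and
  the claim_jobs() and work() helpers are illustrative choices, not taken
  from the real Worker code.

```perl
use strict;
use warnings;

# Hypothetical in-memory "kiosk"; in the real system jobs live in the hive database.
my @kiosk = map { +{ id => $_, status => 'READY', worker => undef } } 1 .. 5;

my $batch_size = 2;    # rule 3: don't grab more than you can handle

# Rules 1-3: claim only READY jobs, and only up to the batch size.
sub claim_jobs {
    my ($worker) = @_;
    my @claimed;
    for my $job (@kiosk) {
        last if @claimed >= $batch_size;
        next unless $job->{status} eq 'READY';    # rule 2: skip jobs someone else holds
        @$job{qw(status worker)} = ('CLAIMED', $worker);
        push @claimed, $job;
    }
    return @claimed;
}

# Rules 4-6: finish every claimed job, and report failures instead of vanishing.
sub work {
    my ($worker, $task) = @_;
    while (my @jobs = claim_jobs($worker)) {      # rule 5: keep busy while work remains
        for my $job (@jobs) {
            my $ok    = eval { $task->($job); 1 };
            my $error = $@;
            $job->{status} = $ok ? 'DONE' : 'FAILED';
            $job->{error}  = $error unless $ok;   # rule 6: report back on failure
        }
    }
}

# A task that fails for one particular job, to exercise the failure path.
work('worker_a', sub { my ($job) = @_; die "job 3 failed\n" if $job->{id} == 3; return 1; });

my %count;
$count{ $_->{status} }++ for @kiosk;
print join(', ', map { "$_=$count{$_}" } sort keys %count), "\n";
```

  No central controller assigns work here: the loop ends only because the
  worker itself finds nothing left to claim.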

Definition at line 75 of file Process.pm.

Available Methods

public Bio::EnsEMBL::Analysis analysis ()
public catch ()
public void check_if_exit_cleanly ()
public Bio::EnsEMBL::DBSQL::DBConnection data_dbc ()
public dataflow_output_id ()
public Bio::EnsEMBL::Hive::DBSQL::DBAdaptor db ()
public Bio::EnsEMBL::DBSQL::DBConnection dbc ()
public Int debug ()
public void deprecate ()
public DESTROY ()
public fetch_input ()
public go_figure_dbc ()
public void info ()
public input_id ()
public Bio::EnsEMBL::Hive::AnalysisJob input_job ()
public new ()
public Array output ()
public param ()
public param_defaults ()
public param_substitute ()
public parameters ()
public Bio::EnsEMBL::Hive::Queen queen ()
public run ()
public Arrayref runnable ()
public Array stack_trace ()
public String stack_trace_dump ()
public strict_hash_format ()
public void throw ()
public Depend try ()
public Int verbose ()
public warning ()
public worker ()
public worker_temp_directory ()
public write_output ()

Method Documentation

public Bio::EnsEMBL::Analysis Bio::EnsEMBL::Hive::Process::analysis ( )
    Title   :  analysis
    Usage   :  $self->analysis;
    Function:  Returns the Analysis object associated with this
               instance of the Process.
    Returns :  Bio::EnsEMBL::Analysis object
 
public void Bio::EnsEMBL::Hive::Process::check_if_exit_cleanly ( )
    Title   :   check_if_exit_cleanly
    Usage   :   $self->check_if_exit_cleanly()
    Function:   Check if we want to exit or kill it cleanly at the
                runnable level
    Returns :   None
    Args    :   None
 
public Bio::EnsEMBL::DBSQL::DBConnection Bio::EnsEMBL::Hive::Process::data_dbc ( )
    Title   :   data_dbc
    Usage   :   my $data_dbc = $self->data_dbc;
    Function:   returns a Bio::EnsEMBL::DBSQL::DBConnection object (the "current" one by default, but can be set up otherwise)
    Returns :   Bio::EnsEMBL::DBSQL::DBConnection
 
public Bio::EnsEMBL::Hive::Process::dataflow_output_id ( )

Undocumented method

public Bio::EnsEMBL::Hive::DBSQL::DBAdaptor Bio::EnsEMBL::Hive::Process::db ( )
    Title   :   db
    Usage   :   my $hiveDBA = $self->db;
    Function:   returns DBAdaptor to Hive database
    Returns :   Bio::EnsEMBL::Hive::DBSQL::DBAdaptor
 
public Bio::EnsEMBL::DBSQL::DBConnection Bio::EnsEMBL::Hive::Process::dbc ( )
    Title   :   dbc
    Usage   :   my $hiveDBConnection = $self->dbc;
    Function:   returns DBConnection to Hive database
    Returns :   Bio::EnsEMBL::DBSQL::DBConnection
 
public Int Bio::EnsEMBL::Hive::Process::debug ( )
    Title   :  debug
    Function:  Gets/sets the debug level flag, which is set through Worker/runWorker.pl.
               Subclasses should treat it as a read-only variable.
    Returns :  integer
 
public Bio::EnsEMBL::Hive::Process::DESTROY ( )
    Title   :  DESTROY
    Function:  subclass can implement functions related to cleanup and release.
               Typical activities include freeing data structures or 
               closing files.
 
public Bio::EnsEMBL::Hive::Process::fetch_input ( )
    Title   :  fetch_input
    Function:  subclass can implement functions related to data fetching.
               Typical activities would be to parse $self->input_id and read
               configuration information from $self->analysis.  Subclasses
               may also want to fetch data from databases or from files 
               within this function.
 

Reimplemented in Bio::EnsEMBL::Hive::RunnableDB::Dummy, Bio::EnsEMBL::Hive::RunnableDB::FailureTest, Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether, Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply, Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start, Bio::EnsEMBL::Hive::RunnableDB::MySQLTransfer, Bio::EnsEMBL::Hive::RunnableDB::NotifyByEmail, Bio::EnsEMBL::Hive::RunnableDB::SqlCmd, and Bio::EnsEMBL::Hive::RunnableDB::SystemCmd.

public Bio::EnsEMBL::Hive::Process::go_figure_dbc ( )

Undocumented method

public Bio::EnsEMBL::Hive::Process::input_id ( )

Undocumented method

public Bio::EnsEMBL::Hive::AnalysisJob Bio::EnsEMBL::Hive::Process::input_job ( )
    Title   :  input_job
    Function:  Returns the AnalysisJob to be run by this process.
               Subclasses should treat this as a read-only object.
    Returns :  Bio::EnsEMBL::Hive::AnalysisJob object
 
public Bio::EnsEMBL::Hive::Process::new ( )

Undocumented method

public Array Bio::EnsEMBL::Hive::Process::output ( )
    Title   :   output
    Usage   :   $self->output()
    Function:   
    Returns :   Array of Bio::EnsEMBL::FeaturePair
    Args    :   None
 
public Bio::EnsEMBL::Hive::Process::param ( )

Undocumented method

public Bio::EnsEMBL::Hive::Process::param_defaults ( )
    Title   :  param_defaults
    Function:  subclass can define defaults for all params used by the RunnableDB/Process
 

Reimplemented in Bio::EnsEMBL::Hive::RunnableDB::FailureTest.
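
One plausible way to picture how subclass defaults interact with job-specific values is sketched below. This is a self-contained mock, not the real class: My::Process, My::Process::Sleeper, the parameter names, and the assumption that param_defaults returns a hash reference merged underneath the job's own parameters are all hypothetical choices for the example.

```perl
use strict;
use warnings;

package My::Process;    # hypothetical base class, not the real Hive one

sub new {
    my ($class, %job_params) = @_;
    my $self = bless {}, $class;
    # Later keys win, so job-specific values override the class defaults.
    $self->{params} = { %{ $class->param_defaults }, %job_params };
    return $self;
}

# The base class has no defaults; subclasses override this.
sub param_defaults { return {} }

sub param {
    my ($self, $name) = @_;
    return $self->{params}{$name};
}

package My::Process::Sleeper;
our @ISA = ('My::Process');

# Hypothetical defaults for this subclass.
sub param_defaults {
    return { take_time => 1, message => 'hello' };
}

package main;

my $job = My::Process::Sleeper->new(take_time => 5);
printf "take_time=%d message=%s\n", $job->param('take_time'), $job->param('message');
```

Declaring defaults in one place lets the rest of the subclass read parameters without checking whether each one was supplied.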

public Bio::EnsEMBL::Hive::Process::param_substitute ( )

Undocumented method

public Bio::EnsEMBL::Hive::Process::parameters ( )

Undocumented method

public Bio::EnsEMBL::Hive::Queen Bio::EnsEMBL::Hive::Process::queen ( )
    Title   :   queen
    Usage   :   my $queen = $self->queen;
    Function:   returns the 'Queen' this Process was created by
    Returns :   Bio::EnsEMBL::Hive::Queen
 
public Bio::EnsEMBL::Hive::Process::run ( )
    Title   :  run
    Function:  subclass can implement functions related to process execution.
               Typical activities include running external programs or running
               algorithms by calling perl methods.  Process may also choose to
               parse results into memory if an external program was used.
 

Reimplemented in Bio::EnsEMBL::Hive::RunnableDB::Dummy, Bio::EnsEMBL::Hive::RunnableDB::FailureTest, Bio::EnsEMBL::Hive::RunnableDB::JobFactory, Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether, Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply, Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start, Bio::EnsEMBL::Hive::RunnableDB::MySQLTransfer, Bio::EnsEMBL::Hive::RunnableDB::NotifyByEmail, Bio::EnsEMBL::Hive::RunnableDB::SqlCmd, and Bio::EnsEMBL::Hive::RunnableDB::SystemCmd.

public Arrayref Bio::EnsEMBL::Hive::Process::runnable ( )
    Title   :   runnable
    Usage   :   $self->runnable($arg)
    Function:   Sets a runnable for this RunnableDB
    Returns :   arrayref of Bio::EnsEMBL::Analysis::Runnable
    Args    :   Bio::EnsEMBL::Analysis::Runnable
 
public Bio::EnsEMBL::Hive::Process::strict_hash_format ( )
    Title   :  strict_hash_format
    Function:  if a subclass wants more flexibility in parsing job.input_id and analysis.parameters,
               it should redefine this method to return 0
 

Reimplemented in Bio::EnsEMBL::Hive::RunnableDB::Dummy, Bio::EnsEMBL::Hive::RunnableDB::SqlCmd, and Bio::EnsEMBL::Hive::RunnableDB::SystemCmd.

public Bio::EnsEMBL::Hive::Process::warning ( )

Undocumented method


Reimplemented from Bio::EnsEMBL::Utils::Exception.

public Bio::EnsEMBL::Hive::Process::worker ( )

Undocumented method

public Bio::EnsEMBL::Hive::Process::worker_temp_directory ( )
    Title   :  worker_temp_directory
    Function:  Returns a path to a directory on the local /tmp disk 
               which the subclass can use as temporary file space.
               This directory is made the first time the function is called.
               It persists for as long as the worker is alive.  This allows
               multiple jobs run by the worker to potentially share temp data.
               For example the worker (which is a single Analysis) might need
               to dump a datafile which is needed by all jobs run through 
               this analysis.  The process can first check the worker_temp_directory
               for the file and dump it if it is missing.  This way the first job
               run by the worker will do the dump, but subsequent jobs can reuse the 
               file.
    Usage   :  $tmp_dir = $self->worker_temp_directory;
    Returns :  <string> path to a local (/tmp) directory
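
The "dump once, reuse afterwards" pattern described above might look roughly like the following. The sketch is stand-alone: it substitutes a File::Temp directory for the real worker_temp_directory, and the shared_datafile() helper and the file name are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Stand-in for $self->worker_temp_directory: one directory per worker,
# created on first use and removed when the program exits.
my $worker_tmp;
sub worker_temp_directory {
    $worker_tmp //= tempdir('worker_XXXXXX', TMPDIR => 1, CLEANUP => 1);
    return $worker_tmp;
}

my $dumps = 0;    # counts how often the expensive dump actually happens

# Hypothetical helper: dump the shared file only if an earlier job has not.
sub shared_datafile {
    my $path = File::Spec->catfile(worker_temp_directory(), 'shared_data.txt');
    unless (-e $path) {
        open my $fh, '>', $path or die "Cannot write $path: $!";
        print {$fh} "expensive dump\n";
        close $fh or die "Cannot close $path: $!";
        $dumps++;
    }
    return $path;
}

# Three "jobs" run by the same worker ask for the file in turn.
my @paths = map { shared_datafile() } 1 .. 3;
print "jobs run: ", scalar(@paths), ", dumps performed: $dumps\n";
```

Only the first job pays the cost of the dump; later jobs find the file already present in the worker's directory.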
 
public Bio::EnsEMBL::Hive::Process::write_output ( )
    Title   :  write_output
    Function:  subclass can implement functions related to storing results.
               Typical activities include writing results into database tables
               or into files on a shared filesystem.
 

Reimplemented in Bio::EnsEMBL::Hive::RunnableDB::Dummy, Bio::EnsEMBL::Hive::RunnableDB::FailureTest, Bio::EnsEMBL::Hive::RunnableDB::JobFactory, Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether, Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply, Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start, Bio::EnsEMBL::Hive::RunnableDB::MySQLTransfer, Bio::EnsEMBL::Hive::RunnableDB::NotifyByEmail, Bio::EnsEMBL::Hive::RunnableDB::SqlCmd, and Bio::EnsEMBL::Hive::RunnableDB::SystemCmd.


The documentation for this class was generated from the following file: