ParaLoop - The documentation

The copyright notice

paraloop is governed by the CeCILL license under French law

The ParaLoop Quick reference card

How are switches and parameters read?

Switches and parameters are read in the order listed below. Once a switch or parameter has been set at some step, it is NEVER SET again: the imposed values are therefore read at the beginning of the process, and the default values at the end.

  1. The switch on the command line
  2. $PARALOOP/../etc/paraloop.root.cfg
  3. f1.cfg (supposing the switch --cfg f1.cfg,f2.cfg was specified)
  4. f2.cfg
  5. $HOME/.paralooprc
  6. $PARALOOP/../etc/paraloop.cfg
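
The "first value set wins" rule can be illustrated with a minimal Python sketch. It is not ParaLoop's own code: the function name `resolve_parameters` and all the values are hypothetical, and each source is modelled as a plain dictionary rather than a parsed file.

```python
from typing import Dict, List


def resolve_parameters(sources: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge the sources in reading order; the first source to set a name wins."""
    resolved: Dict[str, str] = {}
    for source in sources:
        for name, value in source.items():
            # A parameter already set at an earlier step is never set again.
            resolved.setdefault(name, value)
    return resolved


# Hypothetical values, listed in the reading order given above.
sources = [
    {"PARALOOP_ncpus": "8"},                               # 1. command line
    {"PARALOOP_no_local_mode": "1"},                       # 2. paraloop.root.cfg
    {"PARALOOP_ncpus": "4", "PARALOOP_queue": "short"},    # 3. f1.cfg
    {"PARALOOP_queue": "long"},                            # 4. f2.cfg
    {},                                                    # 5. $HOME/.paralooprc
    {"PARALOOP_ncpus": "2", "PARALOOP_log_level": "1"},    # 6. paraloop.cfg
]

settings = resolve_parameters(sources)
print(settings["PARALOOP_ncpus"])   # 8: the command line was read first
print(settings["PARALOOP_queue"])   # short: f1.cfg is read before f2.cfg
```

In this sketch the value from paraloop.cfg is used only for PARALOOP_log_level, because no earlier source sets it.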

Switches and parameters useful for the end user

Substitution characters

In all the parameters describing a file or a directory, you can insert some characters that will be substituted at runtime. The list of allowed characters is described here:

| Character | Substituted value |
|-----------|-------------------|
| %h | The hour part of the time (11 for 11:30:05) |
| %m | The minute part of the time (30 for 11:30:05) |
| %s | The seconds part of the time (05 for 11:30:05) |
| %Y | The year part of the date (05 for Sept 6th 2005) |
| %M | The month part of the date (09 for Sept 6th 2005) |
| %Y | The day part of the date (06 for Sept 6th 2005) |
| %p | The number of cpus (ncpus parameter) |
| %l | The number of cpus on the local machine, for a cluster (local_ncpus parameter) |
| %v | The number of cpus in master/slave mode (slave_ncpus parameter) |
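
As an illustration of how such a substitution might behave, here is a minimal Python sketch. It is an assumption-laden model, not ParaLoop's implementation: the function `expand` is hypothetical, it covers only %h, %m, %s, %Y (taken as the two-digit year), %M and %p, and it leaves any other sequence untouched.

```python
import re
from datetime import datetime

# Only a subset of the substitution characters is handled in this sketch.
_PATTERN = re.compile(r"%[hmsYMp]")


def expand(template: str, now: datetime, ncpus: int) -> str:
    """Replace the supported substitution characters in a file or directory name."""
    values = {
        "%h": f"{now.hour:02d}",
        "%m": f"{now.minute:02d}",
        "%s": f"{now.second:02d}",
        "%Y": f"{now.year % 100:02d}",   # two digits, as in "05 for 2005"
        "%M": f"{now.month:02d}",
        "%p": str(ncpus),
    }
    return _PATTERN.sub(lambda match: values[match.group(0)], template)


when = datetime(2005, 9, 6, 11, 30, 5)
print(expand("out/%Y%M/run_%h%m%s_%pcpus.txt", when, 4))
# out/0509/run_113005_4cpus.txt
```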

Files and directories

| Parameter | Switch | Default | Meaning |
|-----------|--------|---------|---------|
| | --cfg=f1.cfg,f2.cfg | | List of configuration files; the first one specified is read first |
| PARALOOP_max_file_size | | 1 GB | The maximum output file size. If the output exceeds this size, another file is created |
| PARALOOP_error_directory | | PARALOOP_error | The error directory |
| PARALOOP_lock_directory | | PARALOOP_lock | The lock directory |
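
The effect of PARALOOP_max_file_size can be pictured with the following Python sketch. It only models the rollover decision: the function `split_by_size` is hypothetical, and the assumption that the size is checked before each record is written is not taken from the ParaLoop documentation.

```python
from typing import List


def split_by_size(records: List[bytes], max_size: int) -> List[List[bytes]]:
    """Group records into successive output files of at most max_size bytes.

    Assumption: a new file is started when adding the next record would
    exceed max_size; a single oversized record still gets its own file.
    """
    files: List[List[bytes]] = [[]]
    current_size = 0
    for record in records:
        if files[-1] and current_size + len(record) > max_size:
            files.append([])      # start another output file
            current_size = 0
        files[-1].append(record)
        current_size += len(record)
    return files


parts = split_by_size([b"aaaa", b"bbbb", b"cc", b"dddd"], max_size=8)
print(len(parts))   # 2
```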

Messages and log files

| Parameter | Switch | Default | Meaning |
|-----------|--------|---------|---------|
| | --verbose | No | Display more messages on the console |
| | --quiet | No | Display nearly no messages |
| PARALOOP_log_level | | | 0 = log nearly nothing; 1 = log normally; 2 = log more |

Input, output

| Parameter | Switch | Default | Meaning |
|-----------|--------|---------|---------|
| PARALOOP_input | --input | | The name of the input file. May be a path. May include substitution characters |
| PARALOOP_output | --output | | The name of the output file. May be a path. May include substitution characters |
| PARALOOP_start | --start | 0 | The start record number (0 means first record) |
| PARALOOP_end | --end | End of file | The end record number |
| PARALOOP_interleaved | --interleaved | no | Distribute the records among the processes in round-robin order |
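
The difference between the interleaved (round-robin) distribution and a non-interleaved one can be sketched in Python as follows. The function `distribute` is hypothetical, and the assumption that the non-interleaved mode hands each process one contiguous block of records is an illustration, not a statement about ParaLoop's internals.

```python
from typing import List


def distribute(records: List[int], ncpus: int, interleaved: bool) -> List[List[int]]:
    """Split record numbers among ncpus processes."""
    if interleaved:
        # Round-robin: record i goes to process i modulo ncpus.
        return [records[i::ncpus] for i in range(ncpus)]
    # Assumed alternative: contiguous blocks of nearly equal size.
    size, extra = divmod(len(records), ncpus)
    blocks, position = [], 0
    for i in range(ncpus):
        length = size + (1 if i < extra else 0)
        blocks.append(records[position:position + length])
        position += length
    return blocks


records = list(range(10))            # records 0 to 9
print(distribute(records, 3, True))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(distribute(records, 3, False)) # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Round-robin distribution tends to even out the load when expensive records are clustered in one part of the input file.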

Plugins

| Parameter | Switch | Default | Meaning |
|-----------|--------|---------|---------|
| | --plugins | | Display the list of available plugins |
| PARALOOP_program | --program | | The plugin to use |
| PARALOOP_db | --db | | Used by some plugins (Blast) |
| PARALOOP_wait | --wait | 0 | Do not return, wait for every child to finish |

Processors and queues

| Parameter | Switch | Default | Meaning |
|-----------|--------|---------|---------|
| PARALOOP_ncpus | --ncpus | Set by the administrator | The number of cpus to use (the number of child processes to run) |
| | --local | no | Run on the local machine, without sending the jobs to the cluster nodes |
| PARALOOP_fair_time_limit | | Set by the administrator | Only implemented with queues. After this time has elapsed, the job is submitted again, then interrupted, giving your colleagues a chance to work |
| PARALOOP_account | --account | | Only implemented with PBS. The account, passed to the qsub utility |
| PARALOOP_queue | --queue | Set by the administrator | The execution queue |
| PARALOOP_qsub_params | | Set by the administrator | Additional parameters passed to qsub |

Load balancing

Sometimes the work assigned to one processor takes much more time to complete than the work assigned to the others. Load balancing mode is useful in that case: the faster processors "steal" work from the slower ones. This mode is controlled by the following parameters:

| Parameter | Default | Meaning |
|-----------|---------|---------|
| PARALOOP_load_balancing_enable | 0 | Enable the load balancing mode |
| PARALOOP_load_balancing_threshold | 1 | Faster jobs are allowed to "steal" work from slower jobs when the slower jobs have more than threshold records left to process |
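
The role of the threshold can be illustrated with a small Python sketch. It is a simplified model, not ParaLoop's algorithm: the function `steal_work` is hypothetical, and both the choice of the most loaded job as the victim and the decision to take half of its remaining records are assumptions.

```python
from typing import List


def steal_work(remaining: List[int], idle_job: int, threshold: int) -> int:
    """Move records from the most loaded job to idle_job; return how many moved.

    remaining[j] is the number of records job j still has to process.
    """
    victim = max(range(len(remaining)), key=lambda job: remaining[job])
    # Stealing is allowed only if the victim has more than threshold records left.
    if victim == idle_job or remaining[victim] <= threshold:
        return 0
    stolen = remaining[victim] // 2   # assumption: take half of the backlog
    remaining[victim] -= stolen
    remaining[idle_job] += stolen
    return stolen


loads = [0, 40, 3]                    # job 0 has finished its own records
print(steal_work(loads, idle_job=0, threshold=1))   # 20
print(loads)                                        # [20, 20, 3]
```

A higher threshold avoids moving work around when the slower jobs are almost done anyway.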

Interrupting, checking, restarting

| Command | Action |
|---------|--------|
| paraloop.pl --check <lock_directory> | Display the progress of the jobs |
| paraloop.pl --interrupt <lock_directory> | Interrupt the jobs |
| paraloop.pl --restart <lock_directory> | Restart the jobs interrupted by the previous command |
| paraloop.pl --waituntil <lock_directory> | Do nothing: just wait until the job has terminated |

Parameters of the Shell plugin

Please see the Shell documentation for details about this plugin.

| Parameter | Default | Meaning |
|-----------|---------|---------|
| PARALOOP_Shell_interpreter | /bin/sh | Path to the default shell interpreter |

Parameters of the Bioperl plugin

Please see the Bioperl documentation for details about this plugin.

| Parameter | Default | Meaning |
|-----------|---------|---------|
| PARALOOP_Bioperl_path | | Path to the external script, run at each iteration |
| PARALOOP_Bioperl_params | '' | Parameters passed to this script |
| PARALOOP_Bioperl_input_format | fasta | Format of the input file, read by the external script |

Parameters of the Blast plugin

Please see the Blast documentation for details about this plugin.

| Parameter | Switch | Default | Meaning |
|-----------|--------|---------|---------|
| PARALOOP_Blast_origin | | ncbi | ncbi for NCBI Blast, wu for WU Blast |
| PARALOOP_Blast_path | | blastall if Blast_origin is ncbi, blastp if Blast_origin is wu | The path to the executable |
| PARALOOP_Blast_params | | -p blastp if Blast_origin is ncbi, '' if Blast_origin is wu | The parameters passed to the executable |
| PARALOOP_Blast_chunk | | 1 | The sequences are grouped in chunks of N sequences, where N is given by this parameter |
| PARALOOP_db | --db | | The database |
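
The grouping controlled by PARALOOP_Blast_chunk can be pictured with this Python sketch. The function `chunked` is hypothetical and works on plain sequence identifiers; how the plugin actually reads and groups the sequences is not described here.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(sequences: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive groups of chunk_size sequences; the last may be shorter."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(sequences)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


names = ["seq1", "seq2", "seq3", "seq4", "seq5"]
print(list(chunked(names, 2)))
# [['seq1', 'seq2'], ['seq3', 'seq4'], ['seq5']]
```

With the default value of 1, each chunk holds a single sequence.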

Parameters useful for the administrator

These parameters may be set in two files, with two different meanings:

.../etc/paraloop.root.cfg
    Parameters set here cannot be overridden by the users.
.../etc/paraloop.cfg
    Parameters set here are default values; the users can override them.

General parameters

| Parameter | Default | Meaning |
|-----------|---------|---------|
| PARALOOP_Scheduler | | The scheduler to use: System for a multiprocessor machine; PBS for a machine equipped with the PBS queuing system; Rsystem for a cluster without any queuing system |
| PARALOOP_no_local_mode | 0 | If specified, the users will not be able to use the --local switch, thus forcing them to use the queuing system |
| PARALOOP_fair_time_limit | 0 | Set this parameter to keep the users from monopolizing the processors |
| PARALOOP_max_file_size | 1000000000 | If the output file grows beyond this size, it is closed and a new file is opened |
| PARALOOP_PBS_ncpus | | The default number of cpus when using PBS |
| PARALOOP_System_ncpus | | The default number of cpus when using System (or the --local switch) |
| PARALOOP_Rsystem_ncpus | | The default number of cpus when using Rsystem |
| PARALOOP_local_ncpus | | The default number of cpus when using local mode (switch --local) |
| PARALOOP_slave_ncpus | | The number of cpus each master job controls |

Parameters for the PBS Scheduler

| Parameter | Default | Meaning |
|-----------|---------|---------|
| PARALOOP_account | | The account name, passed to qsub |
| PARALOOP_qsub_params | '' | Additional parameters passed to qsub |
| PARALOOP_queue | | The execution queue |

Parameters for the Rsystem scheduler

| Parameter | Default | Meaning |
|-----------|---------|---------|
| PARALOOP_Rsystem_nodes | | The list of nodes constituting the cluster. Example: node1,node2,node3 |
| PARALOOP_Rsystem_rsh | rsh | The program to use for sending / executing something on the nodes: may be ssh |
| PARALOOP_Rsystem_tmp | /tmp | The name of a temporary directory. This directory must be local to the node, it cannot be shared |
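
To show how the node list and the remote shell program fit together, here is a Python sketch that only builds command lines and runs nothing. The function `remote_commands` is hypothetical, and assigning jobs to nodes in round-robin order is an assumption; the way the Rsystem scheduler really maps jobs to nodes is not described in this card.

```python
from typing import List


def remote_commands(nodes_param: str, rsh_program: str,
                    command: str, njobs: int) -> List[List[str]]:
    """Build one '<rsh_program> <node> <command>' argument list per job."""
    nodes = [node.strip() for node in nodes_param.split(",") if node.strip()]
    if not nodes:
        raise ValueError("the node list is empty")
    # Assumption: jobs are spread over the nodes in round-robin order.
    return [[rsh_program, nodes[job % len(nodes)], command]
            for job in range(njobs)]


for argv in remote_commands("node1,node2,node3", "ssh", "hostname", 4):
    print(" ".join(argv))
```

Both rsh and ssh accept the form `program host command`, which is why switching PARALOOP_Rsystem_rsh to ssh needs no other change in this sketch.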

The master/slave mode

In this mode, paraloop works in a slightly different way. It is controlled by the following parameters:

| Parameter | Default | Meaning |
|-----------|---------|---------|
| PARALOOP_mode | AUTONOMOUS | AUTONOMOUS or MASTER/SLAVE |
| PARALOOP_slave_ncpus | | The number of jobs each master controls |

The PARALOOP documentation

The main documentation

| Document | Description |
|----------|-------------|
| User documentation | The user documentation, including a tutorial for writing plugins |

The plugins

| Document | Description |
|----------|-------------|
| Plugin | The abstract class at the top of the plugin hierarchy |
| BpInput | The abstract class used for reading files with bioperl |
| LnInput | The abstract class used for reading text files |
| Bioperl | A general plugin to execute a treatment on bioperl files |
| Shell | A general plugin to execute some lines of scripts, one line per processor |
| Blast | A specialized plugin to execute a blast (ncbi or wu) on every sequence found in the input file |
| Dummy | This dummy plugin can be used as a template for writing your own plugins |

The schedulers

| Document | Description |
|----------|-------------|
| Scheduler | The abstract class at the top of the Scheduler hierarchy |
| PBS | A scheduler useful when you have PBS-Pro (or possibly other systems) installed |
| System | This scheduler is used with multiprocessor SMP machines, or when you use the --local switch |
| Rsystem | This scheduler is used with clusters which do NOT have any batch system installed |

The other objects or modules

| Document | Description |
|----------|-------------|
| _Initializable | Every object should derive from this class |
| ParamParser | Parses a parameter file |
| Logger | Logs in a structured way |
| Runner | Runs an external program |