Python task

The PythonTask is the most basic and most flexible task in Lightflow. It allows you to execute almost arbitrary Python code in your task. The only requirement is that the Python code can be serialised and deserialised safely.

class lightflow.tasks.PythonTask(name, callback=None, *, queue='task', callback_init=None, callback_finally=None, force_run=False, propagate_skip=True)[source]

The Python task executes a user-defined Python method.

Parameters:
  • name (str) – The name of the task.
  • callback (callable) –

    A reference to the Python method that should be called by the task as soon as it is run. It has to have the following definition:

    (data, store, signal, context) -> None, Action
    

    with the parameters:

    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • queue (str) – Name of the queue the task should be scheduled to. Defaults to the general task queue.
  • callback_init (callable) –

    An optional callable that is called shortly before the task is run. The definition is:

    (data, store, signal, context) -> None
    

    with the parameters:

    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • callback_finally (callable) –

    An optional callable that is always called at the end of a task, regardless of whether it completed successfully, was stopped, was aborted or raised an exception. The definition is:

    (status, data, store, signal, context) -> None
    

    with the parameters:

    • status (TaskStatus): The current status of the task. It can be one of the following:
      • TaskStatus.Success – task was successful
      • TaskStatus.Stopped – task was stopped
      • TaskStatus.Aborted – task was aborted
      • TaskStatus.Error – task raised an exception
    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • force_run (bool) – Run the task even if it is flagged to be skipped.
  • propagate_skip (bool) – Propagate the skip flag to the next task.
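As a rough illustration of the callback signatures listed above, the sketch below defines a main callback and a callback_finally handler. To stay runnable without a Lightflow installation, it calls both functions directly with stand-ins: a plain dict for data, None for store, signal and context, and a plain string for the status. The names increment, report and finished are hypothetical, and dict-style access on the data object is an assumption.

```python
finished = []  # statuses recorded by the callback_finally handler


def increment(data, store, signal, context):
    """Main callback with the (data, store, signal, context) signature."""
    # Assumption: the data object supports dict-style item access.
    if 'value' not in data:
        data['value'] = 0
    data['value'] += 1
    # Nothing is returned here, i.e. the None case of "-> None, Action".


def report(status, data, store, signal, context):
    """callback_finally handler: receives the status as first argument."""
    finished.append(status)


# Stand-ins for the objects the task would supply at run time (hypothetical).
data = {}
increment(data, store=None, signal=None, context=None)
increment(data, store=None, signal=None, context=None)
report('Success', data, store=None, signal=None, context=None)
print(data, finished)

# With Lightflow installed, the same functions would be handed to the task:
#   from lightflow.tasks import PythonTask
#   task = PythonTask(name='increment_task', callback=increment,
#                     callback_finally=report)
```

Defining the callbacks as plain module-level functions keeps them easy to test in isolation, as done here, before wiring them into a task.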

Bash task

The BashTask provides an easy-to-use task for executing bash commands. It allows you to capture and process the standard output and error output of the bash command either in real time, line by line, or as a file object once the process has completed.

class lightflow.tasks.BashTask(name, command, cwd=None, env=None, user=None, group=None, stdin=None, refresh_time=0.1, capture_stdout=False, capture_stderr=False, callback_process=None, callback_end=None, callback_stdout=None, callback_stderr=None, *, queue='task', callback_init=None, callback_finally=None, force_run=False, propagate_skip=True)[source]

The Bash task executes a user-defined bash command or bash file.

All task parameters except the name, callbacks, queue, force_run and propagate_skip can be given either as a value of their native type or as a callable returning a value of that type.
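The "native value or callable" convention can be sketched generically as follows. This is only an illustration of the pattern, not Lightflow's implementation: the resolve helper is hypothetical, and which arguments Lightflow passes to such a callable is not stated here, so the sketch forwards whatever it is given.

```python
def resolve(value, *args):
    """Return value as-is, or the result of calling it if it is callable."""
    # Hypothetical helper; the arguments passed to the callable are an assumption.
    return value(*args) if callable(value) else value


static_command = 'ls -l'               # native type: used directly
dynamic_command = lambda: 'ls -l /tmp'  # callable: evaluated when resolved

print(resolve(static_command))
print(resolve(dynamic_command))
print(resolve(None))  # None is not callable, so it passes through unchanged
```

Deferring evaluation this way lets a parameter such as the command or working directory be computed at run time instead of when the task is defined.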

Parameters:
  • name (str) – The name of the task.
  • command (function, str) – The command or bash file that should be executed.
  • cwd (function, str, None) – The working directory for the command.
  • env (function, dict, None) – A dictionary of environment variables.
  • user (function, int, None) – The ID of the user under which the command should be executed.
  • group (function, int, None) – The ID of the group under which the command should be executed.
  • stdin (function, str, None) – An input string that should be passed on to the process.
  • refresh_time (function, float) – The time in seconds the internal output handling waits before checking for new output from the process.
  • capture_stdout (function, bool) – Set to True to capture all standard output in a temporary file.
  • capture_stderr (function, bool) – Set to True to capture all error output in a temporary file.
  • callback_process (callable) –

    A callable that is called after the process has started. The definition is:

    (pid, data, store, signal, context) -> None
    

    with the parameters:

    • pid (int): The ID (PID) of the process.
    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • callback_end (callable) –

    A callable that is called after the process has completed. The definition is:

    (returncode, stdout_file, stderr_file,
     data, store, signal, context) -> None
    

    with the parameters:

    • returncode (int): The return code of the process.
    • stdout_file: A file object with the standard output if the flag capture_stdout was set to True, otherwise None.
    • stderr_file: A file object with the error output if the flag capture_stderr was set to True, otherwise None.
    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • callback_stdout (callable) –

    A callable that is called for every line of output the process sends to stdout. The definition is:

    (line, data, store, signal, context) -> None
    
    with the parameters:
    • line (str): Single line of the process output as a string.
    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • callback_stderr (callable) –

    A callable that is called for every line of output the process sends to stderr. The definition is:

    (line, data, store, signal, context) -> None
    
    with the parameters:
    • line (str): Single line of the process output as a string.
    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • queue (str) – Name of the queue the task should be scheduled to. Defaults to the general task queue.
  • callback_init (callable) –

    An optional callable that is called shortly before the task is run. The definition is:

    (data, store, signal, context) -> None
    

    with the parameters:

    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • callback_finally (callable) –

    An optional callable that is always called at the end of a task, regardless of whether it completed successfully, was stopped, was aborted or raised an exception. The definition is:

    (status, data, store, signal, context) -> None
    

    with the parameters:

    • status (TaskStatus): The current status of the task. It can be one of the following:
      • TaskStatus.Success – task was successful
      • TaskStatus.Stopped – task was stopped
      • TaskStatus.Aborted – task was aborted
      • TaskStatus.Error – task raised an exception
    • data (MultiTaskData): The data object that has been passed from the predecessor task.
    • store (DataStoreDocument): The persistent data store object that allows the task to store data for access across the current workflow run.
    • signal (TaskSignal): The signal object for tasks. It wraps the construction and sending of signals into easy-to-use methods.
    • context (TaskContext): The context in which the task runs.
  • force_run (bool) – Run the task even if it is flagged to be skipped.
  • propagate_skip (bool) – Propagate the skip flag to the next task.
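The two ways of processing output described above, line by line and as a file object after completion, might look like the following sketch. It runs without Lightflow by feeding the callbacks stand-in values: a fixed string as the process output, an io.StringIO in place of the captured file, and None for data, store, signal and context. The names on_stdout, on_end, errors and summary are hypothetical, and the sketch assumes the file object is positioned at its start when callback_end is called.

```python
import io

errors = []    # lines flagged by the stdout callback
summary = {}   # filled in by the end callback


def on_stdout(line, data, store, signal, context):
    """callback_stdout: called once per line the process writes to stdout."""
    if 'ERROR' in line:
        errors.append(line.rstrip('\n'))


def on_end(returncode, stdout_file, stderr_file, data, store, signal, context):
    """callback_end: called once after the process has completed."""
    summary['returncode'] = returncode
    # stdout_file is None unless capture_stdout was set to True.
    if stdout_file is not None:
        summary['line_count'] = sum(1 for _ in stdout_file)


# Stand-ins for what the task would pass in at run time (hypothetical).
output = 'starting\nERROR: disk full\ndone\n'
for line in output.splitlines(keepends=True):
    on_stdout(line, data=None, store=None, signal=None, context=None)
on_end(0, io.StringIO(output), None,
       data=None, store=None, signal=None, context=None)
print(errors, summary)

# With Lightflow installed, the callbacks would be attached like this:
#   from lightflow.tasks import BashTask
#   task = BashTask(name='disk_check', command='df -h', capture_stdout=True,
#                   callback_stdout=on_stdout, callback_end=on_end)
```

The line callback suits long-running commands whose output should be acted on immediately, while the end callback is the simpler choice when only the final result matters.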