Step 1: A simple workflow¶
In this section of the tutorial we will write our first workflow. It will consist of two tasks that are executed in order. Each task will print a message so you can track the execution of the tasks. At the end of this section you will have learned how to create tasks, arrange their execution order and run a workflow.
Workflow file¶
In Lightflow, workflows are defined using Python. This means you don’t have to learn another language and you can use your favorite Python libraries and modules. Typically you would have a single Python file describing the entire workflow, but for complex workflows you can, of course, split the workflow definition into multiple files. For this tutorial, we will only have a single workflow file.
Change into the tutorial
directory and create an empty file called tutorial01.py
. This file will contain the workflow for this step of the tutorial.
Your directory structure should look like this:
/lightflow_tutorial
lightflow.cfg
/tutorial
tutorial01.py
Create two tasks¶
Let’s get started with our workflow. First, we will create the two tasks for our small workflow. Open the workflow file you just created with your editor of choice. At the top of the file import the PythonTask class:
from lightflow.tasks import PythonTask
Lightflow is shipped with two task classes: the PythonTask
and the BashTask
. The PythonTask
allows you to execute Python code in your task, while
the BashTask
provides an easy to use task for executing bash commands. In this tutorial we will use the PythonTask
for all our tasks as it is the most
flexible type of task. You can pretty much do whatever you like during the execution of a PythonTask
.
Next, create the two tasks for our workflow. We are going to be boring here and call the first task first_task
and the second task second_task
:
first_task = PythonTask(name='first_task',
callback=print_first)
second_task = PythonTask(name='second_task',
callback=print_second)
The first argument name
defines a name for the task so you can track the task more easily. We are using the name of the object here, but you can name the
task whatever you think is appropriate. The second argument callback
is a callable that is being run when the task is executed. This is the ‘body’ of the task
and you are free to execute your own Python code here. In the spirit of boring names for our tutorial, we have named the callables: print_first
and
print_second
. Of course, we haven’t defined the callables yet, so let’s do this next.
Implement the callables¶
We will use functions as the callables for our PythonTask
objects. The functions take a specific form and look like this:
def print_first(data, store, signal, context):
print('This is the first task')
Add this code above your task instantiations. A callable for a PythonTask
has four arguments. We will cover all four arguments in more detail
in the following tutorial steps. So for now, you can safely ignore them. All we do in the body of the function is to print a simple string.
The callable for the second task is pretty much the same, we only change the name and the string that is printed:
def print_second(data, store, signal, context):
print('This is the second task')
At this point we have the task objects that should be run and the code that should be executed for each task. We haven’t defined the order in which we want the tasks to be run yet. This will happen in the next step.
Arrange the tasks in a sequence¶
In Lightflow tasks are arranged in a Directed Acyclic Graph, or ‘DAG’ for short. While this might sound complicated, what it means is that all you do is to
define the dependencies between the tasks, thereby building a network (also called graph) of tasks. The ‘directed’ captures the fact that the dependencies impose
a direction on the graph. In our case, we want the first_task
to be run before the second_task
. Lightflow does not allow for loops in the task graph,
represented by the word ‘acyclic’. For example, you are not allowed to set up a graph in which you start with first_task
then run second_task
followed
by running first_task
again.
In Lightflow the Dag
class takes care of running the tasks in the correct order. Import the Dag
class at the top of your workflow file with:
from lightflow.models import Dag
Next, below your task object instantiations at the bottom of your workflow, create an object of the Dag
class:
d = Dag('main_dag')
You have to provide a single argument, which is the name you would like to give to the Dag.
The Dag
class provides the function define()
for setting up the task graph. This is where the magic happens. Lightflow uses a Python dictionary
in order to specify the arrangement of the tasks. The key:value relationship of a dictionary is mapped to a parent:child relationship for tasks,
thereby defining the dependencies between tasks. For our simple, two task workflow the graph definition looks like this:
d.define({
first_task: second_task
})
That’s it! You have defined our first workflow and are now ready to run it.
The complete workflow¶
Here is the complete workflow for this tutorial including a few comments:
from lightflow.models import Dag
from lightflow.tasks import PythonTask
# the callback functions for the task
def print_first(data, store, signal, context):
print('This is the first task')
def print_second(data, store, signal, context):
print('This is the second task')
# create the two task objects
first_task = PythonTask(name='first_task',
callback=print_first)
second_task = PythonTask(name='second_task',
callback=print_second)
# create the main DAG
d = Dag('main_dag')
# set up the graph of the DAG, in which the first_task has
# to be executed first, followed by the second_task.
d.define({
first_task: second_task
})
Document the workflow¶
This step is optional, but highly recommended as it will help you remembering what the workflow does. We will add a title and a short description to the workflow. At the top of your workflow file add the following docstring:
""" Tutorial 1: a sequence of two tasks
This workflow uses two tasks in order to demonstrate
the basics of a workflow definition in Lightflow.
"""
Lightflow uses the first line of the docstring when listing all available workflows. Give it a go by changing to the directory where the configuration file is located and enter:
$ lightflow workflow list
tutorial01 Tutorial 1: a sequence of two tasks
Lightflow will list your workflow together with the short description you gave it.
Start a worker¶
Lightflow uses a worker based scheme. This means a workflow adds jobs onto a central queue from which a number of workers consume jobs and execute them. In order for Lightflow to run our workflow, it needs at least one running worker. Start a worker with:
$ lightflow worker start
This will start a worker, which then waits for the first job to be added to the queue. You can start as many workers as you like, but for now one worker is enough.
Run the workflow¶
With at least one worker running, we are ready to run our first workflow. You might need a second terminal in order to run the workflow as the first one is occupied running our worker. In your second terminal enter:
$ lightflow workflow start tutorial01
This will send our workflow to the queue. The worker will pick up the workflow and run it. The default logging level is very verbose so you will see the worker print out a lot of information as it executes the workflow.
You will see how the first_task
is being executed first and prints the string “This is the first task”, then followed by the second_task
and the
string “This is the second task”.
Congratulations! You completed the first tutorial successfully.