StarPU Handbook - StarPU Extensions
23. How To Define A New Scheduling Policy

23.1 Introduction

StarPU provides two ways of defining a scheduling policy, a basic monolithic way, and a modular way.

The basic monolithic way is directly connected with the core of StarPU, which means that the policy then has to handle all performance details, such as data prefetching, task performance model calibration, worker locking, etc. The file examples/scheduler/dummy_sched.c is a trivial example which handles none of this, and thus e.g. does not achieve any data prefetching or smart scheduling.

The modular way allows implementing just one component and reusing existing components to cope with all these details. The file examples/scheduler/dummy_modular_sched.c is a trivial example very similar to dummy_sched.c, but implemented as a component, which allows assembling it with other components: it notably gets data prefetching support for free, and task performance model calibration is properly performed, which makes it easy to extend the scheduler to take task durations into account, etc.

23.2 Helper functions for defining a scheduling policy (Basic or modular)

Make sure to have a look at the Scheduling Policy section, which provides a complete list of the functions available for writing advanced schedulers.

This includes getting an estimation of a task's computation time with starpu_task_expected_length(), of the speedup factor relative to CPU speed with starpu_worker_get_relative_speedup(), of the expected data transfer time in microseconds with starpu_task_expected_data_transfer_time(), starpu_task_expected_data_transfer_time_for(), or starpu_data_expected_transfer_time(), of the expected conversion time in microseconds with starpu_task_expected_conversion_time(), and of the required energy with starpu_task_expected_energy() or starpu_task_worker_expected_energy(). Per-worker variants are also available with starpu_task_worker_expected_length(), etc., and averages over workers with starpu_task_expected_length_average() and starpu_task_expected_energy_average(). Other useful functions include starpu_transfer_bandwidth(), starpu_transfer_latency(), and starpu_transfer_predict(). The successors of a task can be obtained with starpu_task_get_task_succs(), and the presence of a data handle on a memory node can be tested directly with starpu_data_is_on_node().

Prefetches can be triggered by calling either starpu_prefetch_task_input_for(), starpu_idle_prefetch_task_input_for(), starpu_prefetch_task_input_for_prio(), or starpu_idle_prefetch_task_input_for_prio(). Prefetching data on a specified node can use either starpu_prefetch_task_input_on_node(), starpu_prefetch_task_input_on_node_prio(), starpu_idle_prefetch_task_input_on_node(), or starpu_idle_prefetch_task_input_on_node_prio(). The _prio versions allow specifying a priority for the transfer (instead of taking the task priority by default). These prefetches are only processed when there are no fetch data requests (i.e. requests for data a task is actually waiting for) to process. The _idle versions queue the transfers on the idle prefetch queue, which is only processed when there are no non-idle prefetches to process. starpu_get_prefetch_flag() is a convenient helper for checking the value of the STARPU_PREFETCH environment variable. When a scheduler does such prefetching, it should set the prefetches field of the starpu_sched_policy to 1, to prevent the core from triggering its own prefetching.
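
As an illustration, these estimations can be combined to predict when a task would finish on a given worker. Below is a minimal sketch (not the actual dm/dmda implementation): it assumes implementation number 0 and ignores the work already queued on the worker.

#include <math.h>
#include <starpu.h>

/* Estimate, in microseconds, the time for task to run on workerid,
 * combining the expected data transfer and computation times. */
static double estimated_task_time_on(struct starpu_task *task, unsigned workerid)
{
    struct starpu_perfmodel_arch *arch = starpu_worker_get_perf_archtype(workerid, task->sched_ctx);
    double compute = starpu_task_expected_length(task, arch, 0);
    double transfer = starpu_task_expected_data_transfer_time_for(task, workerid);

    if (isnan(compute))
        /* The performance model is not calibrated yet for this architecture */
        return NAN;
    return transfer + compute;
}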

For applications that need to prefetch data or perform other setup before a task is executed, starpu_task_notify_ready_soon_register() can be used to register a callback function to be called when a task is about to become ready for execution. starpu_worker_set_going_to_sleep_callback() and starpu_worker_set_waking_up_callback() allow registering external resource manager callback functions that will be notified about workers going to sleep or waking up, when StarPU is compiled with support for blocking drivers and worker callbacks.

Schedulers should call starpu_task_set_implementation() or starpu_task_get_implementation() to specify or retrieve which implementation of the codelet is to be used when executing a given task.

One can determine whether a worker type is capable of executing a specific task by calling starpu_worker_type_can_execute_task(). The function starpu_sched_find_all_worker_combinations() must be used to identify all viable worker combinations that can execute a parallel task. starpu_combined_worker_get_count() and starpu_worker_is_combined_worker() can be used to determine the number of combined workers and whether a particular worker is a combined worker, respectively. starpu_combined_worker_get_id() and starpu_combined_worker_assign_workerid() allow users to get the identifier of the current combined worker, or to register a new combined worker and get its identifier. starpu_combined_worker_get_description() returns the description of a combined worker. Additionally, starpu_worker_is_blocked_in_parallel() can be used to determine whether a worker is currently blocked in a parallel task, whereas starpu_worker_is_slave_somewhere() can be called to determine whether a worker is presently functioning as a slave for another worker. StarPU also provides two functions for initializing and preparing the execution of parallel tasks: starpu_parallel_task_barrier_init() and starpu_parallel_task_barrier_init_n().

Usual functions can be used on tasks; for instance, the following computes the total data size of a task, as well as the size of the data it writes to.

size = 0;
write = 0;
if (task->cl)
    for (i = 0; i < STARPU_TASK_GET_NBUFFERS(task); i++)
    {
        starpu_data_handle_t data = STARPU_TASK_GET_HANDLE(task, i);
        size_t datasize = starpu_data_get_size(data);
        size += datasize;
        if (STARPU_TASK_GET_MODE(task, i) & STARPU_W)
            write += datasize;
    }

Task queues can be implemented with the starpu_task_list functions. starpu_task_list_init() initializes an empty list structure. Once the list is initialized, new tasks can be added with starpu_task_list_push_front() and starpu_task_list_push_back(), which add a task to the front or the back of the list respectively. starpu_task_list_front() and starpu_task_list_back() return the first or last task of the list without removing it. starpu_task_list_begin() and starpu_task_list_end() provide iteration over the list: the former returns an iterator on the first task, the latter the value marking the end of the list. starpu_task_list_next() returns the next task in the list; note that it is not erase-safe. starpu_task_list_empty() checks whether the list is empty. To remove tasks from the queue, starpu_task_list_erase() removes a specific task from the list, while starpu_task_list_pop_front() and starpu_task_list_pop_back() remove the first or last task. starpu_task_list_ismember() checks whether a given task is contained in the list, and starpu_task_list_move() moves a list from one head to another.
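
For instance, a basic scheduler can keep all its ready tasks in one such list. Below is a minimal sketch; the dummy_sched_data structure and create_fifo function names are made up for illustration.

#include <stdlib.h>
#include <starpu.h>

/* Hypothetical per-policy state: a single FIFO of ready tasks */
struct dummy_sched_data
{
    struct starpu_task_list fifo;
};

/* Typically called from the starpu_sched_policy::init_sched method */
static struct dummy_sched_data *create_fifo(void)
{
    struct dummy_sched_data *data = malloc(sizeof(*data));
    starpu_task_list_init(&data->fifo);
    return data;
}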

Access to the hwloc topology is available with starpu_worker_get_hwloc_obj().

23.3 Defining A New Basic Scheduling Policy

A full example showing how to define a new scheduling policy is available in the StarPU sources in examples/scheduler/dummy_sched.c.

The scheduler has to provide methods:

static struct starpu_sched_policy dummy_sched_policy =
{
.init_sched = init_dummy_sched,
.deinit_sched = deinit_dummy_sched,
.add_workers = dummy_sched_add_workers,
.remove_workers = dummy_sched_remove_workers,
.push_task = push_task_dummy,
.pop_task = pop_task_dummy,
.policy_name = "dummy",
.policy_description = "dummy scheduling strategy"
};

The idea is that when a task becomes ready for execution, the starpu_sched_policy::push_task method is called to give the ready task to the scheduler; the scheduler should then call starpu_push_task_end() to notify StarPU that the task has been pushed. When a worker is idle, the starpu_sched_policy::pop_task method is called to get a task from the scheduler. It is up to the scheduler to implement what happens in between. A simple eager scheduler can for instance make starpu_sched_policy::push_task push the task to a global list, and make starpu_sched_policy::pop_task pop from this list. A scheduler can also use starpu_push_local_task() to directly push tasks to a per-worker queue, in which case it does not even need to implement starpu_sched_policy::pop_task. If there are no ready tasks within the scheduler, starpu_sched_policy::pop_task can just return NULL, and the worker will sleep.
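
Continuing the sketch above, an eager push/pop pair could look as follows, assuming the dummy_sched_data FIFO was registered as the policy data (see starpu_sched_ctx_set_policy_data() below); the locking and worker wake-up that a real scheduler needs are left out.

static int push_task_dummy(struct starpu_task *task)
{
    struct dummy_sched_data *data = starpu_sched_ctx_get_policy_data(task->sched_ctx);
    starpu_task_list_push_back(&data->fifo, task);
    starpu_push_task_end(task);
    return 0;
}

static struct starpu_task *pop_task_dummy(unsigned sched_ctx_id)
{
    struct dummy_sched_data *data = starpu_sched_ctx_get_policy_data(sched_ctx_id);
    if (starpu_task_list_empty(&data->fifo))
        return NULL; /* no ready task: the worker will sleep */
    return starpu_task_list_pop_front(&data->fifo);
}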

starpu_sched_policy::add_workers and starpu_sched_policy::remove_workers are used to add or remove workers to or from a scheduling policy, so that the number of workers in a policy can be dynamically adjusted. After adding or removing workers from a scheduling policy, the worker task lists should be updated to ensure that the workers are assigned tasks appropriately. By calling starpu_sched_ctx_worker_shares_tasks_lists(), you can specify whether a worker may pop tasks from the task list of other workers or if there is a central list with tasks for all the workers.

The starpu_sched_policy section provides the exact rules that govern the methods of the policy.

One can enumerate the workers with this iterator:

struct starpu_worker_collection *workers = starpu_sched_ctx_get_worker_collection(sched_ctx_id);
struct starpu_sched_ctx_iterator it;

workers->init_iterator(workers, &it);
while (workers->has_next(workers, &it))
{
    unsigned worker = workers->get_next(workers, &it);
    ...
}

To provide synchronization between workers, a per-worker lock exists to protect the data structures of a given worker. It is acquired around scheduler methods, so that the scheduler does not need any additional mutex to protect its per-worker data.

In case the scheduler wants to access another worker's data, it should use starpu_worker_lock() and starpu_worker_unlock(), or starpu_worker_trylock(), which will not block if the lock is not immediately available. starpu_worker_lock_self() and starpu_worker_unlock_self() acquire and release the lock of the worker associated with the current thread.

Calling starpu_worker_lock() from a worker A will however make worker A wait for worker B to complete its scheduling method. That may be a problem if that method takes a long time, because it is e.g. computing a heuristic or waiting for another mutex, or it may even cause deadlocks if worker B is calling starpu_worker_lock() on A at the same time. In such a case, worker B must call starpu_worker_relax_on() and starpu_worker_relax_off() around the section which potentially blocks (and does not actually need protection). While a worker is in relaxed mode, e.g. between a pair of starpu_worker_relax_on() and starpu_worker_relax_off() calls, its state can be altered by other threads: for instance, worker A can push tasks for worker B. In consequence, worker B must re-assess its state after calling starpu_worker_relax_off(), such as taking possible new tasks pushed to its queue into account. starpu_worker_get_relax_state() can be called to query the relaxation state of a worker.
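
For instance, a potentially blocking acquisition can be wrapped as follows (a sketch; some_global_mutex is a hypothetical mutex of the scheduler):

starpu_worker_relax_on();
/* This may block for a while; other threads may alter our state meanwhile */
STARPU_PTHREAD_MUTEX_LOCK(&some_global_mutex);
starpu_worker_relax_off();
/* Re-assess the state: new tasks may have been pushed to our queue */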

When the starpu_sched_policy::push_task method has pushed a task for another worker, one has to call starpu_wake_worker_relax(), starpu_wake_worker_relax_light(), starpu_wake_worker_no_relax() or starpu_wake_worker_locked() so that the worker wakes up and picks the task up. If the task was pushed on a shared queue, one may want to only wake one idle worker; an example doing this is available in src/sched_policies/eager_central_policy.c. When the scheduling policy makes a scheduling decision for a task, it should call starpu_sched_task_break().

Schedulers can set the minimum or maximum task priority level supported by the scheduling policy by calling starpu_sched_set_min_priority() or starpu_sched_set_max_priority(), and applications can then call starpu_sched_get_min_priority() or starpu_sched_get_max_priority() to retrieve these values. The file src/sched_policies/heteroprio.c shows how to use these functions.
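
For instance, a policy supporting priorities from -5 to 5 can declare them from its starpu_sched_policy::init_sched method:

starpu_sched_set_min_priority(-5);
starpu_sched_set_max_priority(5);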

When scheduling a task, it is important to check whether the specified worker can execute the codelet before assigning the task to that worker. This is done using the starpu_worker_can_execute_task() function, or starpu_combined_worker_can_execute_task() which is compatible with combined workers, or starpu_worker_can_execute_task_impl() which also returns the list of implementation numbers that can be used by the worker to execute the task, or starpu_worker_can_execute_task_first_impl() which also returns the first implementation number that can be used.
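
For instance, one could select by hand the first implementation that a worker supports, as in the following sketch (a real scheduler would rather call starpu_worker_can_execute_task_first_impl() directly):

unsigned impl;
for (impl = 0; impl < STARPU_MAXIMPLEMENTATIONS; impl++)
    if (starpu_worker_can_execute_task(workerid, task, impl))
    {
        starpu_task_set_implementation(task, impl);
        break;
    }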

A pointer to one data structure specific to the scheduler can be set with starpu_sched_ctx_set_policy_data() and fetched with starpu_sched_ctx_get_policy_data(). Per-worker data structures can then be stored in it by allocating a STARPU_NMAXWORKERS-sized array of structures indexed by workers.

A variety of examples of advanced schedulers can be read in src/sched_policies, for instance random_policy.c, eager_central_policy.c, or work_stealing_policy.c. Code protected by if (_starpu_get_nsched_ctxs() > 1) can be ignored; this is for scheduling contexts, which are an experimental feature.

23.4 Defining A New Modular Scheduling Policy

StarPU's Modularized Schedulers are made of individual Scheduling Components assembled into a Scheduling Tree. Each Scheduling Component has a unique purpose, such as prioritizing tasks or mapping tasks over resources. A typical Scheduling Tree is shown below.

                                 |
             starpu_push_task    |
                                 |
                                 v
                           Fifo_Component
                                |  ^
                        Push    |  |    Can_Push
                                v  |
                          Eager_Component
                                |  ^
                                |  |
                                v  |
              --------><-------------------><---------
              |  ^                                |  ^
      Push    |  |    Can_Push            Push    |  |    Can_Push
              v  |                                v  |
         Fifo_Component                       Fifo_Component
              |  ^                                |  ^
      Pull    |  |    Can_Pull            Pull    |  |    Can_Pull
              v  |                                v  |
        Worker_Component                     Worker_Component
                  |                             |
starpu_pop_task   |                             |
                  v                             v

When a task is pushed by StarPU into a Modularized Scheduler, the task moves from one Scheduling Component to another, following the hierarchy of the Scheduling Tree, and is stored in one of the Scheduling Components of the strategy. When a worker wants to pop a task from the Modularized Scheduler, the corresponding Worker Component of the Scheduling Tree tries to pull a task from its parents, following the hierarchy, and gives it to the worker if it succeeds in getting one.

23.4.1 Interface

Each Scheduling Component must follow the following pre-defined Interface to be able to interact with other Scheduling Components.

  • push_task (child_component, Task)
    The calling Scheduling Component transfers a task to its Child Component. When the Push function returns, the task no longer belongs to the calling Component. The Modularized Schedulers' model relies on this function to perform prefetching. See starpu_sched_component::push_task for more details

  • pull_task (parent_component, caller_component) -> Task
    The calling Scheduling Component requests a task from its Parent Component. When the Pull function ends, the returned task belongs to the calling Component. See starpu_sched_component::pull_task for more details

  • can_push (caller_component, parent_component)
    The calling Scheduling Component notifies its Parent Component that it is ready to accept new tasks. See starpu_sched_component::can_push for more details

  • can_pull (caller_component, child_component)
    The calling Scheduling Component notifies its Child Component that it is ready to give new tasks. See starpu_sched_component::can_pull for more details

The components also provide the following useful methods:

  • starpu_sched_component::estimated_load provides an estimated load of the component.

  • starpu_sched_component::estimated_end provides an estimated date of availability of workers.

23.4.2 Building a Modularized Scheduler

23.4.2.1 Pre-implemented Components

StarPU is currently shipped with the following four types of Scheduling Components:

  • Storage Components: Fifo, Prio
    Components which store tasks. They can also prioritize them if they have a defined priority. It is possible to define a threshold for these Components following two criteria: the number of tasks stored in the Component, or the sum of the expected lengths of all tasks stored in the Component. When a push operation tries to queue a task beyond the threshold, the push fails. When some task leaves the queue (and thus possibly more tasks can fit), this Component calls the can_push of its ancestors.

  • Resource-Mapping Components: Mct, Heft, Eager, Random, Work-Stealing
    The "core" of the Scheduling Strategy, these Components are the ones which make scheduling choices between their child components.

  • Worker Components: Worker
    Each Worker Component models a concrete worker and copes with the technical details of interacting with the StarPU core. Modular schedulers thus usually have them at the bottom of their component tree.

  • Special-Purpose Components: Perfmodel_Select, Best_Implementation
    Components dedicated to special purposes. The Perfmodel_Select Component decides which Resource-Mapping Component should be used to schedule a task: a component for tasks with a calibrated performance model; a component for not-yet-calibrated tasks, which distributes them so as to get measurements done as quickly as possible; and a component for tasks without performance models.
    The Best_Implementation Component chooses which implementation of a task should be used on the chosen resource.

23.4.2.2 Progression And Validation Rules

Some rules must be followed to ensure the correctness of a Modularized Scheduler:

  • At least one Storage Component without threshold is needed in a Modularized Scheduler, to store incoming tasks from StarPU. It can for instance be a global component at the top of the tree, or one component per worker at the bottom of the tree, or intermediate assemblies. The important point is that the starpu_sched_component::push_task call at the top cannot fail, so there has to be a storage component without threshold between the top of the tree and the first storage component with threshold, or the workers themselves.

  • At least one Resource-Mapping Component is needed in a Modularized Scheduler. Resource-Mapping Components are the only ones which can make scheduling choices, and so the only ones which can have several children.

23.4.2.3 Locking in modularized schedulers

Most often, components do not need to take locks. This allows e.g. the push operation to be called in parallel when tasks get released in parallel from different workers which have completed different ancestor tasks.

When a component has internal information which needs to be kept coherent, the component can define its own lock and take it as it sees fit, e.g. to protect a task queue. This may however limit the scalability of the scheduler. Conversely, since push and pull operations will be called concurrently from different workers, the component might prefer to use a central mutex to serialize all scheduling decisions, to avoid pathological cases (e.g. all push calls deciding to put their task on the same target).
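
For instance, a component can protect a local queue with its own mutex, as in this sketch (the my_component_data structure and function name are hypothetical):

struct my_component_data
{
    starpu_pthread_mutex_t mutex;
    struct starpu_task_list queue;
};

static int my_component_push_task(struct starpu_sched_component *component, struct starpu_task *task)
{
    struct my_component_data *data = component->data;
    STARPU_PTHREAD_MUTEX_LOCK(&data->mutex);
    starpu_task_list_push_back(&data->queue, task);
    STARPU_PTHREAD_MUTEX_UNLOCK(&data->mutex);
    /* A real component would then notify its children with can_pull */
    return 0;
}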

23.4.2.4 Implementing a Modularized Scheduler

The following code shows how to implement a Tree-Eager-Prefetching Scheduler.

static void initialize_eager_prefetching_center_policy(unsigned sched_ctx_id)
{
    /* The eager component will decide for each task which worker will run it,
     * and we want fifos both above and below the component */
    starpu_sched_component_initialize_simple_scheduler(
        starpu_sched_component_eager_create, NULL,
        STARPU_SCHED_SIMPLE_DECIDE_WORKERS |
        STARPU_SCHED_SIMPLE_FIFO_ABOVE |
        STARPU_SCHED_SIMPLE_FIFOS_BELOW,
        sched_ctx_id);
}

/* Initializing the starpu_sched_policy struct associated to the Modularized
 * Scheduler: only init_sched needs to be defined by hand here, the other
 * methods are the generic scheduling-tree helpers */
struct starpu_sched_policy _starpu_sched_tree_eager_prefetching_policy =
{
    .init_sched = initialize_eager_prefetching_center_policy,
    .deinit_sched = starpu_sched_tree_deinitialize,
    .add_workers = starpu_sched_tree_add_workers,
    .remove_workers = starpu_sched_tree_remove_workers,
    .push_task = starpu_sched_tree_push_task,
    .pop_task = starpu_sched_tree_pop_task,
    .pre_exec_hook = starpu_sched_component_worker_pre_exec_hook,
    .post_exec_hook = starpu_sched_component_worker_post_exec_hook,
    .policy_name = "tree-eager-prefetching",
    .policy_description = "eager with prefetching tree policy"
};

starpu_sched_component_initialize_simple_scheduler() is a helper function which makes it trivial to assemble a modular scheduler around a scheduling decision component, as seen above (here, a simple eager decision component). Most often, a modular scheduler can be implemented that way.

A modular scheduler can also be constructed hierarchically with starpu_sched_component_composed_recipe_create().

starpu_sched_tree_get() can be called to retrieve the scheduling tree of a given scheduling context.

That modular scheduler can also be built by hand in the following way:

#define _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT 2
#define _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT 1000000000.0

static void initialize_eager_prefetching_center_policy(unsigned sched_ctx_id)
{
    unsigned ntasks_threshold = _STARPU_SCHED_NTASKS_THRESHOLD_DEFAULT;
    double exp_len_threshold = _STARPU_SCHED_EXP_LEN_THRESHOLD_DEFAULT;
    [...]
    starpu_sched_ctx_create_worker_collection(sched_ctx_id, STARPU_WORKER_LIST);

    /* Create the Scheduling Tree */
    struct starpu_sched_tree *t = starpu_sched_tree_create(sched_ctx_id);

    /* The Root Component is a Flow-control Fifo Component */
    t->root = starpu_sched_component_fifo_create(t, NULL);

    /* The Resource-mapping Component of the strategy is an Eager Component */
    struct starpu_sched_component *eager_component = starpu_sched_component_eager_create(t, NULL);

    /* Create links between Components: the Eager Component is the child
     * of the Root Component */
    starpu_sched_component_connect(t->root, eager_component);

    /* A task threshold is set for the Flow-control Components which will
     * be connected to Worker Components. By doing so, this Modularized
     * Scheduler will be able to perform some prefetching on the resources */
    struct starpu_sched_component_fifo_data fifo_data =
    {
        .ntasks_threshold = ntasks_threshold,
        .exp_len_threshold = exp_len_threshold,
    };

    unsigned i;
    for (i = 0; i < starpu_worker_get_count() + starpu_combined_worker_get_count(); i++)
    {
        /* Each Worker Component has a Flow-control Fifo Component as
         * father */
        struct starpu_sched_component *worker_component = starpu_sched_component_worker_new(sched_ctx_id, i);
        struct starpu_sched_component *fifo_component = starpu_sched_component_fifo_create(t, &fifo_data);
        starpu_sched_component_connect(fifo_component, worker_component);

        /* Each Flow-control Fifo Component associated to a Worker
         * Component is linked to the Eager Component as one of its
         * children */
        starpu_sched_component_connect(eager_component, fifo_component);
    }

    starpu_sched_tree_update_workers(t);
    starpu_sched_ctx_set_policy_data(sched_ctx_id, (void*)t);
}

/* Properly destroy the Scheduling Tree and all its Components */
static void deinitialize_eager_prefetching_center_policy(unsigned sched_ctx_id)
{
    struct starpu_sched_tree *tree = (struct starpu_sched_tree *)starpu_sched_ctx_get_policy_data(sched_ctx_id);
    starpu_sched_tree_destroy(tree);
    starpu_sched_ctx_delete_worker_collection(sched_ctx_id);
}

/* Initializing the starpu_sched_policy struct associated to the Modularized
 * Scheduler: only the init_sched and deinit_sched methods are specific, the
 * other methods are the generic scheduling-tree helpers */
struct starpu_sched_policy _starpu_sched_tree_eager_prefetching_policy =
{
    .init_sched = initialize_eager_prefetching_center_policy,
    .deinit_sched = deinitialize_eager_prefetching_center_policy,
    .add_workers = starpu_sched_tree_add_workers,
    .remove_workers = starpu_sched_tree_remove_workers,
    .push_task = starpu_sched_tree_push_task,
    .pop_task = starpu_sched_tree_pop_task,
    .pre_exec_hook = starpu_sched_component_worker_pre_exec_hook,
    .post_exec_hook = starpu_sched_component_worker_post_exec_hook,
    .policy_name = "tree-eager-prefetching",
    .policy_description = "eager with prefetching tree policy"
};

Instead of calling starpu_sched_tree_update_workers(), one can call starpu_sched_tree_update_workers_in_ctx() to update the set of workers that are available to execute tasks in a given scheduling tree within a specific StarPU context.

Other modular scheduler examples can be seen in src/sched_policies/modular_*.c.

For instance, modular-heft-prio needs performance models, decides memory nodes, uses prioritized fifos above and below, and decides the best implementation.

If unsure about the result of the modular scheduler construction, you can run a simple application with FxT enabled (see GeneratingTracesWithFxT) and open the generated file trace.html in a web browser.

23.4.3 Management of parallel tasks

At the moment, parallel tasks can be managed in modularized schedulers through combined workers: instead of connecting a scheduling component to a worker component, one can connect it to a combined worker component (i.e. a worker component created with a combined worker id). That component will handle creating task aliases for parallel execution and pushing them to the different worker components.

23.4.4 Writing a Scheduling Component

23.4.4.1 Generic Scheduling Component

Each Scheduling Component is instantiated from a Generic Scheduling Component, which implements a generic version of the Interface. The generic implementations of the Pull, Can_Pull and Can_Push functions are recursive calls to the component's parents (respectively children). However, as a Generic Scheduling Component does not know how many children it will have when it is instantiated, it does not implement the Push function.

23.4.4.2 Instantiation: Redefining the Interface

A Scheduling Component must implement all the functions of the Interface. It is thus necessary to implement a Push function to instantiate a Scheduling Component. The implemented Push function is the "fingerprint" of a Scheduling Component. Depending on which functionalities or properties programmers want to give to the Scheduling Component they are implementing, it is possible to reimplement all the functions of the Interface. For example, a Flow-control Component reimplements the Pull and Can_Push functions of the Interface, allowing it to catch the generic recursive calls of these functions. The Pull function of a Flow-control Component can, for example, pop a task from the Component's local storage queue and give it to the calling Component which asked for it.
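
For instance, a trivial Flow-control Component can be sketched as follows; the my_* names are hypothetical, the method prototypes are those of starpu_sched_component::push_task and starpu_sched_component::pull_task, and the locking and Can_Push/Can_Pull notifications that a real component needs are left out.

#include <stdlib.h>
#include <starpu.h>
#include <starpu_sched_component.h>

/* The Push function is the "fingerprint" of the component: here it stores
 * the task in a local queue kept in component->data */
static int my_push_task(struct starpu_sched_component *component, struct starpu_task *task)
{
    struct starpu_task_list *queue = component->data;
    starpu_task_list_push_back(queue, task);
    return 0;
}

/* The Pull function pops a task from the local queue for the calling component */
static struct starpu_task *my_pull_task(struct starpu_sched_component *component, struct starpu_sched_component *to)
{
    struct starpu_task_list *queue = component->data;
    (void)to;
    if (starpu_task_list_empty(queue))
        return NULL;
    return starpu_task_list_pop_front(queue);
}

/* Instantiate from the Generic Scheduling Component and redefine the methods */
static struct starpu_sched_component *my_component_create(struct starpu_sched_tree *t)
{
    struct starpu_sched_component *component = starpu_sched_component_create(t, "my_fifo");
    struct starpu_task_list *queue = malloc(sizeof(*queue));
    starpu_task_list_init(queue);
    component->data = queue;
    component->push_task = my_push_task;
    component->pull_task = my_pull_task;
    return component;
}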

23.4.4.3 Detailed Progression and Validation Rules

  • A Reservoir is a Scheduling Component which redefines a Push and a Pull function in order to store tasks. Reservoirs delimit Scheduling Areas in the Scheduling Tree.

  • A Pump is the engine of the Scheduler: it pushes/pulls tasks from one Scheduling Component to another. The native Pumps of a Scheduling Tree are located at the root of the Tree (incoming Push calls from StarPU) and at the leaves of the Tree (Pop calls coming from StarPU workers). The pre-implemented Scheduling Components currently shipped with Pumps are the Flow-control Components and the Resource-Mapping Component Heft, within their defined Can_Push functions.

  • A correct Scheduling Tree requires a Pump per Scheduling Area and per Execution Flow.

The Tree-Eager-Prefetching Scheduler shown in Section Implementing a Modularized Scheduler follows the previous rules:

                                  starpu_push_task
                                       Pump
                                         |
 Area 1                                  |
                                         |
                                         v
            -----------------------Fifo_Component-----------------------------
                                       Pump
                                        |  ^
                                Push    |  |    Can_Push
                                        v  |
 Area 2                           Eager_Component
                                        |  ^
                                        |  |
                                        v  |
                      --------><-------------------><---------
                      |  ^                                |  ^
              Push    |  |    Can_Push            Push    |  |    Can_Push
                      v  |                                v  |
            -----Fifo_Component-----------------------Fifo_Component----------
                      |  ^                                |  ^
              Pull    |  |    Can_Pull            Pull    |  |    Can_Pull
 Area 3               v  |                                v  |
                     Pump                               Pump
                Worker_Component                     Worker_Component

23.5 Using a New Scheduling Policy

There are two ways to use a new scheduling policy.

  • If the code is directly available from your application, you can set the field starpu_conf::sched_policy to a pointer to your newly defined scheduling policy.

    starpu_conf_init(&conf);
    conf.sched_policy = &dummy_sched_policy;
    ret = starpu_init(&conf);

  • You can also load the new policy dynamically using the environment variable STARPU_SCHED_LIB. An example is given in examples/scheduler/libdummy_sched.c and examples/scheduler/libdummy_sched.sh.

    The variable STARPU_SCHED_LIB needs to give the location of a .so file which needs to define a function struct starpu_sched_policy *starpu_get_sched_lib_policy(const char *name), for instance:

    struct starpu_sched_policy *starpu_get_sched_lib_policy(const char *name)
    {
        if (!strcmp(name, "dummy"))
            return &dummy_sched_policy;
        return NULL;
    }

    To use it, you need to set both environment variables STARPU_SCHED_LIB and STARPU_SCHED:

    STARPU_SCHED_LIB=libdummy_sched.so STARPU_SCHED=dummy yourapplication

    If the library defines a function struct starpu_sched_policy **starpu_get_sched_lib_policies(), the policies defined by the library can be displayed using the help functionality.

    STARPU_SCHED_LIB=libdummy_sched.so STARPU_SCHED=help yourapplication
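
    Such a function can for instance return the policies implemented by the library, as in this sketch (the NULL termination of the array is an assumption about the expected convention):

    struct starpu_sched_policy **starpu_get_sched_lib_policies(void)
    {
        static struct starpu_sched_policy *policies[] =
        {
            &dummy_sched_policy,
            NULL
        };
        return policies;
    }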

23.6 Graph-based Scheduling

For performance reasons, most of the schedulers shipped with StarPU use simple list-scheduling heuristics, assuming that the application has already set priorities. This is why they do their scheduling between when tasks become available for execution and when a worker becomes idle, without looking at the task graph.

Other heuristics can however look at the task graph. Recording the task graph is expensive, so it is not available by default; the scheduling heuristic has to set _starpu_graph_record to 1 from its initialization function to make it available. Then the _starpu_graph* functions can be used.

src/sched_policies/graph_test_policy.c is an example of a simple greedy policy which automatically computes priorities by bottom-up rank.

The idea is that while the application submits tasks, they are only pushed to a bag of tasks. When the application has finished submitting tasks, it calls starpu_do_schedule() (or starpu_task_wait_for_all(), which calls starpu_do_schedule()), and the starpu_sched_policy::do_schedule method of the scheduler is called. This method calls _starpu_graph_compute_depths() to compute the bottom-up ranks, and then uses these ranks to set priorities over tasks.

The policy then has two priority queues, one for CPUs and one for GPUs, and uses a simple heuristic based on the duration of the task on CPUs and GPUs to decide between the two queues. CPU workers can then pop from the CPU priority queue, and GPU workers from the GPU priority queue.

23.7 Debugging Scheduling

All the OnlinePerformanceTools and OfflinePerformanceTools can be used to get information about how well the execution proceeded, and thus the overall quality of the execution.

Precise debugging can also be performed by using the STARPU_TASK_BREAK_ON_PUSH, STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_POP, and STARPU_TASK_BREAK_ON_EXEC environment variables. By setting the job_id of a task in these environment variables, StarPU will raise SIGTRAP when the task is being pushed, scheduled, popped, or executed by the scheduler. This means that when one notices that a task is being scheduled in a seemingly odd way, one can just re-execute the application in a debugger, with some of those variables set, and the execution will stop exactly at the scheduling points of this task, thus allowing one to inspect the scheduler state, etc.