9.2.8. API Functions

Main API functions

Public API and CLI components.

junifer.api.collect(storage)

Collect and store data.

Parameters:
storage : dict

Storage to use. Must have a key “kind” specifying the kind of storage to use. All other keys are passed to the storage init function.
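The “kind” convention above can be illustrated with a minimal sketch. split_spec is a hypothetical helper written for this example and is not part of junifer. The storage kind “SQLiteFeatureStorage” and its “uri” parameter are assumptions used only for illustration. With junifer installed, the storage dict itself is what would be passed to junifer.api.collect.

```python
from typing import Any


def split_spec(spec: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Separate the "kind" key from the init keyword arguments (hypothetical helper)."""
    if "kind" not in spec:
        raise ValueError("spec must have a 'kind' key")
    # Every key except "kind" is forwarded to the init function.
    init_kwargs = {key: value for key, value in spec.items() if key != "kind"}
    return spec["kind"], init_kwargs


# Assumed storage spec: the kind name and "uri" are illustrative only.
storage = {"kind": "SQLiteFeatureStorage", "uri": "outputs/features.sqlite"}
kind, init_kwargs = split_spec(storage)
print(kind)         # SQLiteFeatureStorage
print(init_kwargs)  # {'uri': 'outputs/features.sqlite'}
```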

junifer.api.queue(config, kind, jobname='junifer_job', overwrite=False, elements=None, **kwargs)

Queue a job to be executed later.

Parameters:
config : dict

The configuration to be used for queueing the job.

kind : {“HTCondor”, “GNUParallelLocal”}

The kind of job queue system to use.

jobname : str, optional

The name of the job (default “junifer_job”).

overwrite : bool, optional

Whether to overwrite the job directory if it already exists (default False).

elements : str or tuple or list of str or tuple, optional

Element(s) to process. Will be used to index the DataGrabber (default None).

**kwargs : dict

The keyword arguments to pass to the job queue system.

Raises:
ValueError

If kind is invalid, or if the job directory exists and overwrite is False.
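The two documented failure modes of queue() can be mimicked with a minimal sketch. check_queue_args is hypothetical and is not junifer's implementation. This section does not say where junifer places the job directory, so the sketch takes the directory path as an argument.

```python
import tempfile
from pathlib import Path

# Valid queue kinds, as listed in the queue() documentation.
VALID_KINDS = {"HTCondor", "GNUParallelLocal"}


def check_queue_args(kind: str, jobdir: Path, overwrite: bool = False) -> None:
    """Raise ValueError under the conditions documented for queue() (hypothetical sketch)."""
    if kind not in VALID_KINDS:
        raise ValueError(f"Invalid queue kind: {kind!r}")
    if jobdir.exists() and not overwrite:
        raise ValueError(f"Job directory {jobdir} exists; pass overwrite=True")


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp)  # a directory that already exists
    check_queue_args("HTCondor", existing, overwrite=True)  # accepted
    for bad_kind, overwrite in [("SLURM", True), ("HTCondor", False)]:
        try:
            check_queue_args(bad_kind, existing, overwrite=overwrite)
        except ValueError as err:
            print(err)
```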

junifer.api.run(workdir, datagrabber, markers, storage, preprocessors=None, elements=None)

Run the pipeline on the selected element(s).

Parameters:
workdir : str or pathlib.Path

Directory where the pipeline will be executed.

datagrabber : dict

DataGrabber to use. Must have a key “kind” specifying the kind of DataGrabber to use. All other keys are passed to the DataGrabber init function.

markers : list of dict

List of markers to extract. Each marker is a dict with at least two keys: “name” and “kind”. The “name” key is used to name the output marker, and the “kind” key specifies the kind of marker to extract. All other keys are passed as parameters to the marker calculation.

storage : dict

Storage to use. Must have a key “kind” specifying the kind of storage to use. All other keys are passed to the storage init function.

preprocessors : list of dict, optional

List of preprocessors to use. Each preprocessor is a dict with at least a key “kind” specifying the preprocessor to use. All other keys are passed to the preprocessor init function (default None).

elements : str or tuple or list of str or tuple, optional

Element(s) to process. Will be used to index the DataGrabber (default None).
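Here is a minimal sketch of the dictionaries run() expects, following the shapes described above. Every “kind” value and extra parameter in it (“DataladAOMICID1000”, “fMRIPrepConfoundRemover”, “FunctionalConnectivityParcels”, “parcellation”, “SQLiteFeatureStorage”, “uri”) is an assumption and should be checked against the installed junifer version. check_marker_specs is a hypothetical helper that only verifies the documented minimum keys. The actual call is left as a comment because it needs junifer and real data.

```python
from pathlib import Path

# All "kind" values and extra parameters below are illustrative assumptions.
workdir = Path("/tmp/junifer_workdir")
datagrabber = {"kind": "DataladAOMICID1000"}
preprocessors = [{"kind": "fMRIPrepConfoundRemover"}]
markers = [
    {
        "name": "schaefer100_fc",  # names the output marker
        "kind": "FunctionalConnectivityParcels",  # kind of marker to extract
        "parcellation": "Schaefer100x17",  # extra keys are marker parameters
    },
]
storage = {"kind": "SQLiteFeatureStorage", "uri": str(workdir / "features.sqlite")}


def check_marker_specs(markers: list[dict]) -> None:
    """Check that each marker dict has the documented "name" and "kind" keys."""
    for marker in markers:
        missing = [key for key in ("name", "kind") if key not in marker]
        if missing:
            raise ValueError(f"Marker {marker!r} is missing keys: {missing}")


check_marker_specs(markers)

# With junifer installed, the pipeline call would look like:
#   from junifer.api import run
#   run(workdir=workdir, datagrabber=datagrabber, markers=markers,
#       storage=storage, preprocessors=preprocessors)
```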

Decorators

Decorators for the API.

junifer.api.decorators.register(step, name, klass)

Register a function to be used in a pipeline step.

Parameters:
step : str

Name of the step.

name : str

Name of the function.

klass : class

Class to be registered.

Raises:
ValueError

If the step is invalid.
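Conceptually, register() keeps a lookup from (step, name) to class. That lookup is what lets a “kind” string in a configuration resolve to a class. The sketch below is a hypothetical in-memory version, and the step names in VALID_STEPS are assumptions rather than values taken from junifer's source.

```python
# Hypothetical step names; junifer's accepted steps may differ.
VALID_STEPS = ("datagrabber", "datareader", "preprocessing", "marker", "storage")
REGISTRY: dict[str, dict[str, type]] = {step: {} for step in VALID_STEPS}


def register(step: str, name: str, klass: type) -> None:
    """Record klass under name for the given pipeline step."""
    if step not in VALID_STEPS:
        raise ValueError(f"Invalid step: {step!r}")
    REGISTRY[step][name] = klass


def lookup(step: str, name: str) -> type:
    """Resolve a registered class, e.g. from the "kind" key of a config dict."""
    return REGISTRY[step][name]


class ToyMarker:
    """Stand-in class used only for this example."""


register("marker", "ToyMarker", ToyMarker)
print(lookup("marker", "ToyMarker").__name__)  # ToyMarker
```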

junifer.api.decorators.register_datagrabber(klass)

Register DataGrabber.

Registers the DataGrabber so it can be used by name.

Parameters:
klass : class

The class of the DataGrabber to register.

Returns:
klass : class

The unmodified input class.

Notes

It should only be used as a decorator.

junifer.api.decorators.register_datareader(klass)

Register DataReader.

Registers the DataReader so it can be used by name.

Parameters:
klass : class

The class of the DataReader to register.

Returns:
klass : class

The unmodified input class.

Notes

It should only be used as a decorator.

junifer.api.decorators.register_marker(klass)

Marker registration decorator.

Registers the marker so it can be used by name.

Parameters:
klass : class

The class of the marker to register.

Returns:
klass : class

The unmodified input class.

junifer.api.decorators.register_preprocessor(klass)

Preprocessor registration decorator.

Registers the preprocessor so it can be used by name.

Parameters:
klass : class

The class of the preprocessor to register.

Returns:
klass : class

The unmodified input class.

junifer.api.decorators.register_storage(klass)

Storage registration decorator.

Registers the storage so it can be used by name.

Parameters:
klass : class

The class of the storage to register.

Returns:
klass : class

The unmodified input class.
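The register_* decorators above share one pattern: record the class, then return it unchanged. The sketch below assumes each class is registered under its own __name__ and uses a local stand-in registry. It illustrates the pattern only and is not junifer's code.

```python
from typing import Callable, TypeVar

T = TypeVar("T", bound=type)

# Hypothetical stand-in registry keyed by pipeline step.
REGISTRY: dict[str, dict[str, type]] = {"marker": {}, "storage": {}}


def _make_register_decorator(step: str) -> Callable[[T], T]:
    """Build a class decorator that registers the class under its own name."""

    def decorator(klass: T) -> T:
        REGISTRY[step][klass.__name__] = klass
        return klass  # returned unmodified, so the decorated name still refers to it

    return decorator


register_marker = _make_register_decorator("marker")
register_storage = _make_register_decorator("storage")


@register_marker
class ToyMarker:
    """Stand-in marker used only for this example."""


print(sorted(REGISTRY["marker"]))  # ['ToyMarker']
```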

Queue Context

Context adapters for queueing.

class junifer.api.queue_context.GnuParallelLocalAdapter(job_name, job_dir, yaml_config_path, elements, pre_run=None, pre_collect=None, env=None, verbose='info', submit=False)

Class for generating commands for GNU Parallel (local).

Parameters:
job_name : str

The job name.

job_dir : pathlib.Path

The path to the job directory.

yaml_config_path : pathlib.Path

The path to the YAML config file.

elements : list of str or tuple

Element(s) to process. Will be used to index the DataGrabber.

pre_run : str or None, optional

Extra shell commands to source before the run (default None).

pre_collect : str or None, optional

Extra bash commands to source before the collect (default None).

env : dict, optional

The Python environment configuration. If None, will run without a virtual environment of any kind (default None).

verbose : str, optional

The level of verbosity (default “info”).

submit : bool, optional

Whether to submit the jobs (default False).

Raises:
ValueError

If env.kind is invalid or if env.shell is invalid.

See also

QueueContextAdapter

The base class for QueueContext.

HTCondorAdapter

The concrete class for queueing via HTCondor.

Initialize the class.

collect()

Return collect commands.

elements()

Return elements to run.

pre_collect()

Return pre-collect commands.

pre_run()

Return pre-run commands.

prepare()

Prepare assets for submission.

run()

Return run commands.
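GnuParallelLocalAdapter turns the element list into commands for GNU Parallel. The code below is a rough sketch of that idea, not the adapter's actual output. It formats the elements one per line, as they would appear in an arguments file, and builds a parallel invocation that runs one job per line. Several details are assumptions: the junifer run CLI form with --element and --verbose, the comma-joining of tuple elements, and the elements file location. --jobs and --arg-file are standard GNU Parallel options.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def element_to_arg(element: str | tuple[str, ...]) -> str:
    """Encode one element as a single string (comma-joining tuples is an assumption)."""
    return element if isinstance(element, str) else ",".join(element)


def build_parallel_command(
    yaml_config_path: Path,
    elements_file: Path,
    verbose: str = "info",
    jobs: int = 4,
) -> str:
    """Build a GNU Parallel command that runs one (assumed) junifer CLI call per element."""
    # GNU Parallel substitutes each line of the arg file for "{}".
    run_cmd = (
        f"junifer run {shlex.quote(str(yaml_config_path))} "
        f"--verbose {shlex.quote(verbose)} --element {{}}"
    )
    return f"parallel --jobs {jobs} --arg-file {shlex.quote(str(elements_file))} {run_cmd}"


elements = ["sub-01", ("sub-02", "ses-1")]
elements_text = "\n".join(element_to_arg(e) for e in elements)
print(elements_text)
print(build_parallel_command(Path("config.yaml"), Path("junifer_job/elements")))
```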

class junifer.api.queue_context.HTCondorAdapter(job_name, job_dir, yaml_config_path, elements, pre_run=None, pre_collect=None, env=None, verbose='info', cpus=1, mem='8G', disk='1G', extra_preamble=None, collect='yes', submit=False)

Class for generating queueing scripts for HTCondor.

Parameters:
job_name : str

The job name to be used by HTCondor.

job_dir : pathlib.Path

The path to the job directory.

yaml_config_path : pathlib.Path

The path to the YAML config file.

elements : list of str or tuple

Element(s) to process. Will be used to index the DataGrabber.

pre_run : str or None, optional

Extra bash commands to source before the run (default None).

pre_collect : str or None, optional

Extra bash commands to source before the collect (default None).

env : dict, optional

The Python environment configuration. If None, will run without a virtual environment of any kind (default None).

verbose : str, optional

The level of verbosity (default “info”).

cpus : int, optional

The number of CPU cores to use (default 1).

mem : str, optional

The size of memory (RAM) to use (default “8G”).

disk : str, optional

The size of disk (HDD or SSD) to use (default “1G”).

extra_preamble : str or None, optional

Extra commands to pass to HTCondor (default None).

collect : {“yes”, “on_success_only”, “no”}, optional

Whether to submit a “collect” task for junifer (default “yes”). Valid options are:

  • “yes”: Submit the “collect” task and run it even if some of the jobs fail.

  • “on_success_only”: Submit the “collect” task and run it only if all jobs succeed.

  • “no”: Do not submit the “collect” task.

submit : bool, optional

Whether to submit the jobs. In any case, .dag files will be created for submission (default False).

Raises:
ValueError

If collect is invalid or if env is invalid.

See also

QueueContextAdapter

The base class for QueueContext.

GnuParallelLocalAdapter

The concrete class for queueing via GNU Parallel (local).

Initialize the class.

collect()

Return collect commands.

dag()

Return HTCondor DAG commands.

pre_collect()

Return pre-collect commands.

pre_run()

Return pre-run commands.

prepare()

Prepare assets for submission.

run()

Return run commands.
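The collect option of HTCondorAdapter maps naturally onto HTCondor DAGMan semantics. A FINAL node runs after all other nodes whether or not they succeeded. An ordinary child node declared with PARENT ... CHILD runs only if its parents succeed. The sketch below assumes that mapping. The node names, the submit-file names, and the element_index macro are hypothetical, and the real dag() output may differ.

```python
from __future__ import annotations


def build_dag(job_name: str, n_elements: int, collect: str = "yes") -> str:
    """Sketch a DAGMan file with one run node per element plus an optional collect node."""
    if collect not in {"yes", "on_success_only", "no"}:
        raise ValueError(f"Invalid collect option: {collect!r}")
    run_nodes = [f"run{i}" for i in range(n_elements)]
    lines = [f"JOB {node} {job_name}_run.submit" for node in run_nodes]
    # Pass each node its element index through a DAG macro (hypothetical scheme).
    lines += [f'VARS {node} element_index="{i}"' for i, node in enumerate(run_nodes)]
    if collect == "yes":
        # A FINAL node runs after all other nodes, even if some of them failed.
        lines.append(f"FINAL collect {job_name}_collect.submit")
    elif collect == "on_success_only":
        # A regular child node runs only if all of its parents succeeded.
        lines.append(f"JOB collect {job_name}_collect.submit")
        lines.append(f"PARENT {' '.join(run_nodes)} CHILD collect")
    return "\n".join(lines) + "\n"


print(build_dag("junifer_job", 2, collect="on_success_only"))
```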

class junifer.api.queue_context.QueueContextAdapter

Abstract base class for queue context adapter.

For every queueing interface that needs to be supported, a concrete implementation of this abstract class must be provided.

abstract collect()

Return collect commands.

abstract pre_collect()

Return pre-collect commands.

abstract pre_run()

Return pre-run commands.

abstract prepare()

Prepare assets for submission.

abstract run()

Return run commands.
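A concrete adapter must implement all five abstract methods. The sketch below uses a local stand-in for QueueContextAdapter so that it runs without junifer, and it assumes the command methods return shell text as strings. EchoAdapter is hypothetical and only echoes what it would run.

```python
from abc import ABC, abstractmethod


class QueueContextAdapter(ABC):
    """Local stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def pre_run(self) -> str:
        """Return pre-run commands."""

    @abstractmethod
    def run(self) -> str:
        """Return run commands."""

    @abstractmethod
    def pre_collect(self) -> str:
        """Return pre-collect commands."""

    @abstractmethod
    def collect(self) -> str:
        """Return collect commands."""

    @abstractmethod
    def prepare(self) -> None:
        """Prepare assets for submission."""


class EchoAdapter(QueueContextAdapter):
    """Hypothetical adapter that only builds shell text and submits nothing."""

    def __init__(self, job_name: str, elements: list[str]) -> None:
        self.job_name = job_name
        self._elements = elements
        self.script = ""

    def pre_run(self) -> str:
        return "set -e"

    def run(self) -> str:
        return "\n".join(f"echo run {self.job_name} {e}" for e in self._elements)

    def pre_collect(self) -> str:
        return ""

    def collect(self) -> str:
        return f"echo collect {self.job_name}"

    def prepare(self) -> None:
        # Assemble the non-empty pieces in pipeline order; a real adapter would also write files.
        parts = [self.pre_run(), self.run(), self.pre_collect(), self.collect()]
        self.script = "\n".join(part for part in parts if part)


adapter = EchoAdapter("junifer_job", ["sub-01", "sub-02"])
adapter.prepare()
print(adapter.script)
```

Because every abstract method must be overridden, instantiating the base class directly raises TypeError.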