9.2.4. API Functions

Main Functions

Provide API functions.

junifer.api.functions.collect(storage)

Collect and store data.

Parameters:
storage : dict

Storage to use. Must have a key kind with the kind of storage to use. All other keys are passed to the storage constructor.
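The "kind plus constructor arguments" convention described above can be illustrated with a minimal, standard-library sketch. The registry, the InMemoryStorage class, and the build_from_config helper are hypothetical names invented for this illustration; they are not part of junifer.

```python
from typing import Any, Callable

# Hypothetical registry mapping a "kind" name to a constructor.
REGISTRY: dict[str, Callable[..., Any]] = {}


class InMemoryStorage:
    """Hypothetical storage class used only for this illustration."""

    def __init__(self, uri: str, single_output: bool = True) -> None:
        self.uri = uri
        self.single_output = single_output


REGISTRY["InMemoryStorage"] = InMemoryStorage


def build_from_config(config: dict[str, Any]) -> Any:
    """Split ``kind`` from the remaining keys and call the constructor."""
    params = dict(config)  # copy so the caller's dict is not mutated
    if "kind" not in params:
        raise ValueError("config must have a 'kind' key")
    kind = params.pop("kind")
    if kind not in REGISTRY:
        raise ValueError(f"unknown kind: {kind!r}")
    # Every key other than ``kind`` becomes a constructor keyword argument.
    return REGISTRY[kind](**params)


storage = build_from_config({"kind": "InMemoryStorage", "uri": "out.db"})
```

The same pattern applies to the datagrabber, markers, and preprocessors arguments documented in this section, which all select a class through their kind key.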

junifer.api.functions.list_elements(datagrabber, elements=None)

List elements of the datagrabber filtered using elements.

Parameters:
datagrabber : dict

DataGrabber to index. Must have a key kind with the kind of DataGrabber to use. All other keys are passed to the DataGrabber constructor.

elements : list or None, optional

Element(s) to filter using. Will be used to index the DataGrabber (default None).
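As a rough illustration of what filtering by elements means, the sketch below keeps only the requested elements and returns everything when elements is None. The filter_elements helper and the exact-match rule are assumptions made for this example; the matching rules junifer actually applies may differ.

```python
from typing import Optional, Sequence, Union

# An element is either a single string or a tuple of strings.
Element = Union[str, tuple[str, ...]]


def filter_elements(
    available: Sequence[Element],
    selectors: Optional[Sequence[Element]] = None,
) -> list[Element]:
    """Return the available elements that exactly match a selector."""
    if selectors is None:
        # No filter given: list everything.
        return list(available)
    wanted = set(selectors)
    return [element for element in available if element in wanted]


available = [("sub-01", "ses-01"), ("sub-01", "ses-02"), ("sub-02", "ses-01")]
selected = filter_elements(available, [("sub-01", "ses-02")])
everything = filter_elements(available)
```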

junifer.api.functions.queue(config, kind, jobname='junifer_job', overwrite=False, elements=None, **kwargs)

Queue a job to be executed later.

Parameters:
config : dict

The configuration to be used for queueing the job.

kind : {“HTCondor”, “GNUParallelLocal”}

The kind of job queue system to use.

jobname : str, optional

The name of the job (default “junifer_job”).

overwrite : bool, optional

Whether to overwrite the job directory if it already exists (default False).

elements : list or None, optional

Element(s) to process. Will be used to index the DataGrabber (default None).

**kwargs : dict

The keyword arguments to pass to the job queue system.

Raises:
ValueError

If kind is invalid or if the jobdir exists and overwrite = False.
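The two ValueError conditions can be sketched as a standalone check. The check_queue_args helper is hypothetical and only mirrors the documented behaviour; it is not junifer's implementation, and the error messages are invented.

```python
import tempfile
from pathlib import Path

# The two kinds listed in the parameter description above.
VALID_KINDS = {"HTCondor", "GNUParallelLocal"}


def check_queue_args(kind: str, jobdir: Path, overwrite: bool = False) -> None:
    """Raise ValueError under the two documented conditions."""
    if kind not in VALID_KINDS:
        raise ValueError(f"Invalid value for kind: {kind!r}")
    if jobdir.exists() and not overwrite:
        raise ValueError(f"Job directory {jobdir} exists and overwrite=False")


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp)
    # An existing directory is accepted when overwrite is requested.
    check_queue_args("HTCondor", existing, overwrite=True)
    try:
        check_queue_args("HTCondor", existing)
    except ValueError as err:
        message = str(err)
```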

junifer.api.functions.reset(config)

Reset the storage and jobs directory.

Parameters:
config : dict

The configuration to be used for resetting.

junifer.api.functions.run(workdir, datagrabber, markers, storage, preprocessors=None, elements=None)

Run the pipeline on the selected element(s).

Parameters:
workdir : str or pathlib.Path or dict

Directory where the pipeline will be executed.

datagrabber : dict

DataGrabber to use. Must have a key kind with the kind of DataGrabber to use. All other keys are passed to the DataGrabber constructor.

markers : list of dict

List of markers to extract. Each marker is a dict with at least two keys: name and kind. The name key is used to name the output marker. The kind key is used to specify the kind of marker to extract. The rest of the keys are used to pass parameters to the marker calculation.

storage : dict

Storage to use. Must have a key kind with the kind of storage to use. All other keys are passed to the storage constructor.

preprocessors : list of dict or None, optional

List of preprocessors to use. Each preprocessor is a dict with at least a key kind specifying the preprocessor to use. All other keys are passed to the preprocessor constructor (default None).

elements : list or None, optional

Element(s) to process. Will be used to index the DataGrabber (default None).

Raises:
ValueError

If workdir.cleanup=False when len(elements) > 1.

RuntimeError

If invalid element selectors are found.
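The shape of a markers entry can be shown with a small sketch that separates the two mandatory keys from the remaining parameters. The split_marker helper, the marker kind HypotheticalMarker, and its method parameter are all made up for this illustration and do not name real junifer markers.

```python
from typing import Any


def split_marker(marker: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    """Return (name, kind, remaining parameters) for one marker dict."""
    missing = {"name", "kind"} - marker.keys()
    if missing:
        raise ValueError(f"marker is missing keys: {sorted(missing)}")
    # Everything except name and kind parameterises the marker calculation.
    params = {k: v for k, v in marker.items() if k not in ("name", "kind")}
    return marker["name"], marker["kind"], params


markers = [
    {"name": "mean_signal", "kind": "HypotheticalMarker", "method": "mean"},
]
name, kind, params = split_marker(markers[0])
```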

Decorators

Provide API decorators.

junifer.api.decorators.register_data_dump_asset(types, exts)

Asset registration decorator.

Registers the data dump asset for types with exts.

Parameters:
types : list of class

The classes to dump.

exts : list of str

The extensions to load.

Returns:
class

The unmodified input class.

junifer.api.decorators.register_data_registry(name)

Registry registration decorator.

Registers the data registry as name.

Parameters:
name : str

The name of the data registry.

Returns:
class

The unmodified input class.

junifer.api.decorators.register_datagrabber(klass)

Register DataGrabber.

Registers the DataGrabber so it can be used by name.

Parameters:
klass : class

The class of the DataGrabber to register.

Returns:
class

The unmodified input class.

Notes

It should only be used as a decorator.

junifer.api.decorators.register_datareader(klass)

Register DataReader.

Registers the DataReader so it can be used by name.

Parameters:
klass : class

The class of the DataReader to register.

Returns:
class

The unmodified input class.

Notes

It should only be used as a decorator.

junifer.api.decorators.register_marker(klass)

Marker registration decorator.

Registers the marker so it can be used by name.

Parameters:
klass : class

The class of the marker to register.

Returns:
class

The unmodified input class.

junifer.api.decorators.register_preprocessor(klass)

Preprocessor registration decorator.

Registers the preprocessor so it can be used by name.

Parameters:
klass : class

The class of the preprocessor to register.

Returns:
class

The unmodified input class.

junifer.api.decorators.register_storage(klass)

Storage registration decorator.

Registers the storage so it can be used by name.

Parameters:
klass : class

The class of the storage to register.

Returns:
class

The unmodified input class.

Queue Context

Context adapters for queueing.

enum junifer.api.queue_context.EnvKind(value)

Accepted Python environment kind.

Member Type:

str

Valid values are as follows:

Venv = <EnvKind.Venv: 'venv'>
Conda = <EnvKind.Conda: 'conda'>
Local = <EnvKind.Local: 'local'>

enum junifer.api.queue_context.EnvShell(value)

Accepted environment shell.

Member Type:

str

Valid values are as follows:

Bash = <EnvShell.Bash: 'bash'>
Zsh = <EnvShell.Zsh: 'zsh'>

pydantic model junifer.api.queue_context.GnuParallelLocalAdapter

Class for generating commands for GNU Parallel (local).

Parameters:
job_name : str

The job name.

job_dir : pathlib.Path

The path to the job directory.

yaml_config_path : pathlib.Path

The path to the YAML config file.

elements : Elements

Element(s) to process. Will be used to index the DataGrabber.

pre_run_cmds : str or None, optional

Extra shell commands to source before the run (default None).

pre_collect_cmds : str or None, optional

Extra shell commands to source before the collect (default None).

env : QueueContextEnv or None, optional

The environment configuration. If None, will run without a virtual environment of any kind (default None).

verbose : str, optional

The level of verbosity (default “info”).

verbose_datalad : str or None, optional

The level of verbosity for datalad. If None, will be the same as verbose (default None).

submit : bool, optional

Whether to submit the jobs (default False).

See also

QueueContextAdapter

The base class for QueueContext.

HTCondorAdapter

The concrete class for queueing via HTCondor.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

JSON schema:
{
   "title": "GnuParallelLocalAdapter",
   "description": "Class for generating commands for GNU Parallel (local).\n\nParameters\n----------\njob_name : str\n    The job name.\njob_dir : pathlib.Path\n    The path to the job directory.\nyaml_config_path : pathlib.Path\n    The path to the YAML config file.\nelements : ``Elements``\n    Element(s) to process. Will be used to index the DataGrabber.\npre_run_cmds : str or None, optional\n    Extra shell commands to source before the run (default None).\npre_collect_cmds : str or None, optional\n    Extra shell commands to source before the collect (default None).\nenv : :class:`.QueueContextEnv` or None, optional\n    The environment configuration. If None, will run without a\n    virtual environment of any kind (default None).\nverbose : str, optional\n    The level of verbosity (default \"info\").\nverbose_datalad : str or None, optional\n    The level of verbosity for datalad. If None, will be the same\n    as ``verbose`` (default None).\nsubmit : bool, optional\n    Whether to submit the jobs (default False).\n\nSee Also\n--------\nQueueContextAdapter :\n    The base class for QueueContext.\nHTCondorAdapter :\n    The concrete class for queueing via HTCondor.",
   "type": "object",
   "properties": {
      "job_name": {
         "title": "Job Name",
         "type": "string"
      },
      "job_dir": {
         "format": "path",
         "title": "Job Dir",
         "type": "string"
      },
      "yaml_config_path": {
         "format": "path",
         "title": "Yaml Config Path",
         "type": "string"
      },
      "elements": {
         "items": {
            "anyOf": [
               {
                  "type": "string"
               },
               {
                  "items": {
                     "type": "string"
                  },
                  "type": "array"
               }
            ]
         },
         "title": "Elements",
         "type": "array"
      },
      "pre_run_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Run Cmds"
      },
      "pre_collect_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Collect Cmds"
      },
      "env": {
         "anyOf": [
            {
               "$ref": "#/$defs/QueueContextEnv"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "verbose": {
         "default": "info",
         "title": "Verbose",
         "type": "string"
      },
      "verbose_datalad": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Verbose Datalad"
      },
      "submit": {
         "default": false,
         "title": "Submit",
         "type": "boolean"
      }
   },
   "$defs": {
      "EnvKind": {
         "description": "Accepted Python environment kind.",
         "enum": [
            "venv",
            "conda",
            "local"
         ],
         "title": "EnvKind",
         "type": "string"
      },
      "EnvShell": {
         "description": "Accepted environment shell.",
         "enum": [
            "bash",
            "zsh"
         ],
         "title": "EnvShell",
         "type": "string"
      },
      "QueueContextEnv": {
         "additionalProperties": true,
         "description": "Accepted environment configuration for queue context.",
         "properties": {
            "kind": {
               "$ref": "#/$defs/EnvKind"
            },
            "name": {
               "title": "Name",
               "type": "string"
            },
            "shell": {
               "$ref": "#/$defs/EnvShell"
            }
         },
         "required": [
            "kind",
            "shell"
         ],
         "title": "QueueContextEnv",
         "type": "object"
      }
   },
   "additionalProperties": true,
   "required": [
      "job_name",
      "job_dir",
      "yaml_config_path",
      "elements"
   ]
}

Config:
  • use_enum_values: bool = True

  • extra: str = allow

Fields:
  • elements (collections.abc.Sequence[str | tuple[str, ...]])

  • env (junifer.api.queue_context.queue_context_adapter.QueueContextEnv | None)

  • job_dir (pathlib.Path)

  • job_name (str)

  • pre_collect_cmds (str | None)

  • pre_run_cmds (str | None)

  • submit (bool)

  • verbose (str)

  • verbose_datalad (str | None)

  • yaml_config_path (pathlib.Path)

field elements: Sequence[str | tuple[str, ...]] [Required]
field env: QueueContextEnv | None = None
field job_dir: Path [Required]
field job_name: str [Required]
field pre_collect_cmds: str | None = None
field pre_run_cmds: str | None = None
field submit: bool = False
field verbose: str = 'info'
field verbose_datalad: str | None = None
field yaml_config_path: Path [Required]

collect()

Return collect commands.

elements_to_run()

Return elements to run.

model_post_init(context)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

pre_collect()

Return pre-collect commands.

pre_run()

Return pre-run commands.

prepare()

Prepare assets for submission.

run()

Return run commands.
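The required fields and defaults listed in the JSON schema above can be restated as a small standard-library sketch. The resolve_fields helper is hypothetical and does not reproduce pydantic's validation. It only shows which arguments must be supplied, which defaults fill in the rest, and the documented fallback of verbose_datalad to verbose. Whether the adapter stores that fallback on the field itself is an assumption here.

```python
from pathlib import Path
from typing import Any

# Required fields and defaults copied from the JSON schema.
REQUIRED = ("job_name", "job_dir", "yaml_config_path", "elements")
DEFAULTS: dict[str, Any] = {
    "pre_run_cmds": None,
    "pre_collect_cmds": None,
    "env": None,
    "verbose": "info",
    "verbose_datalad": None,
    "submit": False,
}


def resolve_fields(**kwargs: Any) -> dict[str, Any]:
    """Check required fields, then apply the documented defaults."""
    missing = [name for name in REQUIRED if name not in kwargs]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    fields = {**DEFAULTS, **kwargs}
    # Documented fallback: datalad verbosity follows ``verbose`` when unset.
    if fields["verbose_datalad"] is None:
        fields["verbose_datalad"] = fields["verbose"]
    return fields


fields = resolve_fields(
    job_name="junifer_job",
    job_dir=Path("jobs") / "junifer_job",
    yaml_config_path=Path("config.yaml"),
    elements=["sub-01", ("sub-02", "ses-01")],
    verbose="debug",
)
```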

pydantic model junifer.api.queue_context.HTCondorAdapter

Class for generating queueing scripts for HTCondor.

Parameters:
job_name : str

The job name to be used by HTCondor.

job_dir : pathlib.Path

The path to the job directory.

yaml_config_path : pathlib.Path

The path to the YAML config file.

elements : Elements

Element(s) to process. Will be used to index the DataGrabber.

pre_run_cmds : str or None, optional

Extra shell commands to source before the run (default None).

pre_collect_cmds : str or None, optional

Extra shell commands to source before the collect (default None).

env : QueueContextEnv or None, optional

The environment configuration. If None, will run without a virtual environment of any kind (default None).

verbose : str, optional

The level of verbosity (default “info”).

verbose_datalad : str or None, optional

The level of verbosity for datalad. If None, will be the same as verbose (default None).

cpus : int, optional

The number of CPU cores to use (default 1).

mem : str, optional

The size of memory (RAM) to use (default “8G”).

disk : str, optional

The size of disk (HDD or SSD) to use (default “1G”).

extra_preamble : str or None, optional

Extra commands to pass to HTCondor (default None).

collect_task : HTCondorCollect, optional

Whether to submit “collect” task for junifer (default “yes”).

submit : bool, optional

Whether to submit the jobs. In any case, .dag files will be created for submission (default False).

See also

QueueContextAdapter

The base class for QueueContext.

GnuParallelLocalAdapter

The concrete class for queueing via GNU Parallel (local).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

JSON schema:
{
   "title": "HTCondorAdapter",
   "description": "Class for generating queueing scripts for HTCondor.\n\nParameters\n----------\njob_name : str\n    The job name to be used by HTCondor.\njob_dir : pathlib.Path\n    The path to the job directory.\nyaml_config_path : pathlib.Path\n    The path to the YAML config file.\nelements : ``Elements``\n    Element(s) to process. Will be used to index the DataGrabber.\npre_run_cmds : str or None, optional\n    Extra shell commands to source before the run (default None).\npre_collect_cmds : str or None, optional\n    Extra shell commands to source before the collect (default None).\nenv : :class:`.QueueContextEnv` or None, optional\n    The environment configuration. If None, will run without a\n    virtual environment of any kind (default None).\nverbose : str, optional\n    The level of verbosity (default \"info\").\nverbose_datalad : str or None, optional\n    The level of verbosity for datalad. If None, will be the same\n    as ``verbose`` (default None).\ncpus : int, optional\n    The number of CPU cores to use (default 1).\nmem : str, optional\n    The size of memory (RAM) to use (default \"8G\").\ndisk : str, optional\n    The size of disk (HDD or SSD) to use (default \"1G\").\nextra_preamble : str or None, optional\n    Extra commands to pass to HTCondor (default None).\ncollect_task : :class:`.HTCondorCollect`, optional\n    Whether to submit \"collect\" task for junifer (default \"yes\").\nsubmit : bool, optional\n    Whether to submit the jobs. In any case, .dag files will be created\n    for submission (default False).\n\nSee Also\n--------\nQueueContextAdapter :\n    The base class for QueueContext.\nGnuParallelLocalAdapter :\n    The concrete class for queueing via GNU Parallel (local).",
   "type": "object",
   "properties": {
      "job_name": {
         "title": "Job Name",
         "type": "string"
      },
      "job_dir": {
         "format": "path",
         "title": "Job Dir",
         "type": "string"
      },
      "yaml_config_path": {
         "format": "path",
         "title": "Yaml Config Path",
         "type": "string"
      },
      "elements": {
         "items": {
            "anyOf": [
               {
                  "type": "string"
               },
               {
                  "items": {
                     "type": "string"
                  },
                  "type": "array"
               }
            ]
         },
         "title": "Elements",
         "type": "array"
      },
      "pre_run_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Run Cmds"
      },
      "pre_collect_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Collect Cmds"
      },
      "env": {
         "anyOf": [
            {
               "$ref": "#/$defs/QueueContextEnv"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "verbose": {
         "default": "info",
         "title": "Verbose",
         "type": "string"
      },
      "verbose_datalad": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Verbose Datalad"
      },
      "cpus": {
         "default": 1,
         "title": "Cpus",
         "type": "integer"
      },
      "mem": {
         "default": "8G",
         "title": "Mem",
         "type": "string"
      },
      "disk": {
         "default": "1G",
         "title": "Disk",
         "type": "string"
      },
      "extra_preamble": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Extra Preamble"
      },
      "collect_task": {
         "$ref": "#/$defs/HTCondorCollect",
         "default": "yes"
      },
      "submit": {
         "default": false,
         "title": "Submit",
         "type": "boolean"
      }
   },
   "$defs": {
      "EnvKind": {
         "description": "Accepted Python environment kind.",
         "enum": [
            "venv",
            "conda",
            "local"
         ],
         "title": "EnvKind",
         "type": "string"
      },
      "EnvShell": {
         "description": "Accepted environment shell.",
         "enum": [
            "bash",
            "zsh"
         ],
         "title": "EnvShell",
         "type": "string"
      },
      "HTCondorCollect": {
         "description": "Accepted HTCondor collect commands.\n\n* ``\"yes\"``: Submit \"collect\" task and run even if some of the jobs\n    fail.\n* ``\"on_success_only\"``: Submit \"collect\" task and run only if all jobs\n    succeed.\n* ``\"no\"``: Do not submit \"collect\" task.",
         "enum": [
            "yes",
            "no",
            "on_success_only"
         ],
         "title": "HTCondorCollect",
         "type": "string"
      },
      "QueueContextEnv": {
         "additionalProperties": true,
         "description": "Accepted environment configuration for queue context.",
         "properties": {
            "kind": {
               "$ref": "#/$defs/EnvKind"
            },
            "name": {
               "title": "Name",
               "type": "string"
            },
            "shell": {
               "$ref": "#/$defs/EnvShell"
            }
         },
         "required": [
            "kind",
            "shell"
         ],
         "title": "QueueContextEnv",
         "type": "object"
      }
   },
   "additionalProperties": true,
   "required": [
      "job_name",
      "job_dir",
      "yaml_config_path",
      "elements"
   ]
}

Config:
  • use_enum_values: bool = True

  • extra: str = allow

Fields:
  • collect_task (junifer.api.queue_context.htcondor_adapter.HTCondorCollect)

  • cpus (int)

  • disk (str)

  • elements (collections.abc.Sequence[str | tuple[str, ...]])

  • env (junifer.api.queue_context.queue_context_adapter.QueueContextEnv | None)

  • extra_preamble (str | None)

  • job_dir (pathlib.Path)

  • job_name (str)

  • mem (str)

  • pre_collect_cmds (str | None)

  • pre_run_cmds (str | None)

  • submit (bool)

  • verbose (str)

  • verbose_datalad (str | None)

  • yaml_config_path (pathlib.Path)

field collect_task: HTCondorCollect = HTCondorCollect.Yes
field cpus: int = 1
field disk: str = '1G'
field elements: Sequence[str | tuple[str, ...]] [Required]
field env: QueueContextEnv | None = None
field extra_preamble: str | None = None
field job_dir: Path [Required]
field job_name: str [Required]
field mem: str = '8G'
field pre_collect_cmds: str | None = None
field pre_run_cmds: str | None = None
field submit: bool = False
field verbose: str = 'info'
field verbose_datalad: str | None = None
field yaml_config_path: Path [Required]

collect()

Return collect commands.

dag()

Return HTCondor DAG commands.

model_post_init(context)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

pre_collect()

Return pre-collect commands.

pre_run()

Return pre-run commands.

prepare()

Prepare assets for submission.

run()

Return run commands.

enum junifer.api.queue_context.HTCondorCollect(value)

Accepted HTCondor collect commands.

  • "yes": Submit “collect” task and run even if some of the jobs fail.

  • "on_success_only": Submit “collect” task and run only if all jobs succeed.

  • "no": Do not submit “collect” task.
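The three options can be read as a simple decision rule. The sketch below restates them with a standard-library string enum. CollectPolicy and should_run_collect are hypothetical names that mirror the documented values; they do not show how junifer or HTCondor implement the behaviour.

```python
from enum import Enum


class CollectPolicy(str, Enum):
    """Stand-in enum with the same three values as documented above."""

    Yes = "yes"
    No = "no"
    OnSuccessOnly = "on_success_only"


def should_run_collect(policy: str, all_jobs_succeeded: bool) -> bool:
    """Decide whether the collect task runs once the jobs have finished."""
    policy = CollectPolicy(policy)  # raises ValueError for unknown values
    if policy is CollectPolicy.No:
        return False
    if policy is CollectPolicy.OnSuccessOnly:
        return all_jobs_succeeded
    # "yes": collect runs even if some jobs failed.
    return True
```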

Member Type:

str

Valid values are as follows:

Yes = <HTCondorCollect.Yes: 'yes'>
No = <HTCondorCollect.No: 'no'>
OnSuccessOnly = <HTCondorCollect.OnSuccessOnly: 'on_success_only'>

pydantic model junifer.api.queue_context.QueueContextAdapter

Abstract base class for queue context adapter.

For every queue context, one needs to provide a concrete implementation of this abstract class.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

JSON schema:
{
   "title": "QueueContextAdapter",
   "description": "Abstract base class for queue context adapter.\n\nFor every queue context, one needs to provide a concrete\nimplementation of this abstract class.",
   "type": "object",
   "properties": {},
   "additionalProperties": true
}

Config:
  • use_enum_values: bool = True

  • extra: str = allow

abstract collect()

Return collect commands.

abstract pre_collect()

Return pre-collect commands.

abstract pre_run()

Return pre-run commands.

abstract prepare()

Prepare assets for submission.

abstract run()

Return run commands.
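To show what a concrete implementation has to provide, the sketch below lists the same five abstract methods on a plain abc.ABC stand-in and implements them in a toy subclass. AdapterInterface and EchoAdapter are hypothetical. The real base class is a pydantic model, and the str return types are an assumption based on the "Return ... commands" descriptions.

```python
from abc import ABC, abstractmethod


class AdapterInterface(ABC):
    """Standard-library stand-in listing the five documented methods."""

    @abstractmethod
    def pre_run(self) -> str: ...

    @abstractmethod
    def run(self) -> str: ...

    @abstractmethod
    def pre_collect(self) -> str: ...

    @abstractmethod
    def collect(self) -> str: ...

    @abstractmethod
    def prepare(self) -> None: ...


class EchoAdapter(AdapterInterface):
    """Toy implementation whose commands only echo the step name."""

    def __init__(self) -> None:
        self.prepared = False

    def pre_run(self) -> str:
        return "echo 'pre-run'"

    def run(self) -> str:
        return "echo 'run'"

    def pre_collect(self) -> str:
        return "echo 'pre-collect'"

    def collect(self) -> str:
        return "echo 'collect'"

    def prepare(self) -> None:
        # A real adapter would write scripts or submit files here.
        self.prepared = True
```

Leaving out any of the five methods in a subclass makes instantiation fail with TypeError, which is how the abstract base enforces a complete implementation.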

class junifer.api.queue_context.QueueContextEnv

Accepted environment configuration for queue context.
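According to the adapter schemas in this section, an environment configuration requires kind and shell, accepts an optional name, and restricts kind and shell to the EnvKind and EnvShell values. A minimal sketch of such a check follows. The check_env helper and the environment name are hypothetical, and the sketch does not reproduce the library's own validation.

```python
from typing import Any

# Allowed values as listed for EnvKind and EnvShell.
ENV_KINDS = {"venv", "conda", "local"}
ENV_SHELLS = {"bash", "zsh"}


def check_env(env: dict[str, Any]) -> dict[str, Any]:
    """Validate the required keys and allowed values of an env mapping."""
    for key in ("kind", "shell"):
        if key not in env:
            raise ValueError(f"env is missing required key: {key!r}")
    if env["kind"] not in ENV_KINDS:
        raise ValueError(f"invalid env kind: {env['kind']!r}")
    if env["shell"] not in ENV_SHELLS:
        raise ValueError(f"invalid env shell: {env['shell']!r}")
    return env


# "my-env" is a made-up environment name.
env = check_env({"kind": "conda", "name": "my-env", "shell": "bash"})
```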