9.2.4. API Functions

Main Functions

Provide API functions.

junifer.api.functions.collect(storage)

Collect and store data.

Parameters:

storage (dict): Storage to use. Must have a "kind" key naming the kind of storage to use. All other keys are passed to the storage constructor.
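The "kind plus constructor keywords" convention used for storage (and for datagrabber, markers and preprocessors further down) can be illustrated with a small stdlib-only sketch. The REGISTRY mapping, the build_from_spec helper and the DummyStorage class are hypothetical stand-ins, not junifer internals; the "uri" keyword is likewise an assumption made for illustration.

```python
from typing import Any


class DummyStorage:
    """Hypothetical storage class standing in for a real junifer storage."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


# Hypothetical name -> class registry (junifer maintains its own).
REGISTRY: dict[str, type] = {"DummyStorage": DummyStorage}


def build_from_spec(spec: dict[str, Any], registry: dict[str, type]) -> Any:
    """Instantiate registry[spec["kind"]] using the remaining keys as kwargs."""
    if "kind" not in spec:
        raise ValueError("spec must contain a 'kind' key")
    params = dict(spec)  # copy so the caller's dict is not mutated
    kind = params.pop("kind")
    if kind not in registry:
        raise ValueError(f"unknown kind: {kind!r}")
    return registry[kind](**params)


storage_spec = {"kind": "DummyStorage", "uri": "/tmp/features.sqlite"}
storage = build_from_spec(storage_spec, REGISTRY)
```

With junifer installed, a dict of this shape is what collect(storage=...) receives directly; which storage kinds and constructor keys are valid depends on the installed version.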

junifer.api.functions.list_elements(datagrabber, elements=None)

List the elements of the DataGrabber, filtered using elements.

Parameters:

datagrabber (dict): DataGrabber to index. Must have a "kind" key naming the kind of DataGrabber to use. All other keys are passed to the DataGrabber constructor.
elements (list or None, optional): Element(s) to filter by. Will be used to index the DataGrabber (default None).

junifer.api.functions.queue(config, kind, jobname='junifer_job', overwrite=False, elements=None, **kwargs)

Queue a job to be executed later.

Parameters:

config (dict): The configuration to be used for queueing the job.
kind ({“HTCondor”, “GNUParallelLocal”}): The kind of job queue system to use.
jobname (str, optional): The name of the job (default “junifer_job”).
overwrite (bool, optional): Whether to overwrite the job directory if it already exists (default False).
elements (list or None, optional): Element(s) to process. Will be used to index the DataGrabber (default None).
**kwargs (dict): The keyword arguments to pass to the job queue system.

Raises:

ValueError: If kind is invalid, or if the job directory exists and overwrite is False.
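The two documented ValueError conditions of queue can be mirrored in a short pre-flight check. check_queue_args is a hypothetical helper, not part of junifer; since this page does not say how the job directory is derived from jobname, the sketch takes the directory path as an explicit argument.

```python
from pathlib import Path

# Queue kinds accepted by queue(), as listed above.
VALID_QUEUE_KINDS = {"HTCondor", "GNUParallelLocal"}


def check_queue_args(kind: str, jobdir: Path, overwrite: bool = False) -> None:
    """Raise ValueError in the situations documented for queue()."""
    if kind not in VALID_QUEUE_KINDS:
        raise ValueError(f"Invalid queue kind: {kind!r}")
    if jobdir.exists() and not overwrite:
        raise ValueError(f"Job directory {jobdir} exists and overwrite is False")
```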

junifer.api.functions.reset(config)

Reset the storage and jobs directory.

Parameters:

config (dict): The configuration to be used for resetting.

junifer.api.functions.run(workdir, datagrabber, markers, storage, preprocessors=None, elements=None)

Run the pipeline on the selected element(s).

Parameters:

workdir (str or pathlib.Path or dict): Directory where the pipeline will be executed.
datagrabber (dict): DataGrabber to use. Must have a "kind" key naming the kind of DataGrabber to use. All other keys are passed to the DataGrabber constructor.
markers (list of dict): List of markers to extract. Each marker is a dict with at least two keys: name and kind. The name key is used to name the output marker. The kind key specifies the kind of marker to extract. All remaining keys are passed as parameters to the marker calculation.
storage (dict): Storage to use. Must have a "kind" key naming the kind of storage to use. All other keys are passed to the storage constructor.
preprocessors (list of dict or None, optional): List of preprocessors to use. Each preprocessor is a dict with at least a "kind" key specifying the preprocessor to use. All other keys are passed to the preprocessor constructor (default None).
elements (list or None, optional): Element(s) to process. Will be used to index the DataGrabber (default None).

Raises:

ValueError: If workdir.cleanup is False and len(elements) > 1.
RuntimeError: If invalid element selectors are found.
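The constraints on markers and workdir described above can be checked before calling run. The helper below is a hypothetical sketch, not part of junifer. It assumes that a dict workdir carries the cleanup flag under a "cleanup" key (the docs refer to it as workdir.cleanup) and, as a further assumption, treats a missing flag as True; the marker kind and parameter names are illustrative only.

```python
from typing import Any, Optional


def check_run_inputs(
    workdir: Any,
    markers: list[dict[str, Any]],
    elements: Optional[list] = None,
) -> None:
    """Hypothetical pre-flight check mirroring the documented rules of run()."""
    for marker in markers:
        missing = {"name", "kind"} - set(marker)
        if missing:
            raise ValueError(f"marker {marker!r} is missing keys: {sorted(missing)}")
    # Assumption: a dict workdir stores the flag under "cleanup", default True.
    cleanup = workdir.get("cleanup", True) if isinstance(workdir, dict) else True
    if not cleanup and elements is not None and len(elements) > 1:
        raise ValueError("cleanup=False is not allowed with more than one element")


markers = [
    {
        "name": "Schaefer100x17_mean",  # names the output marker
        "kind": "ParcelAggregation",  # illustrative marker kind
        "parcellation": "Schaefer100x17",  # remaining keys go to the marker
        "method": "mean",
    }
]
check_run_inputs({"path": "/tmp/junifer", "cleanup": True}, markers, ["sub-01", "sub-02"])
```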

Decorators

Provide API decorators.

junifer.api.decorators.register_data_dump_asset(types, exts)

Asset registration decorator.

Registers the data dump asset for types with exts.

Parameters:

types (list of class): The classes to dump.
exts (list of str): The extensions to load.

Returns:

class: The unmodified input class.
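register_data_dump_asset is a decorator factory: it takes types and exts and returns a decorator that records the decorated class and hands it back unchanged. The stdlib-only sketch below reproduces that shape with a hypothetical DUMP_ASSETS mapping; it is not junifer's implementation, and the asset class and ".json" extension are invented for illustration.

```python
from typing import Callable

# Hypothetical registry: data type -> (asset class, extensions).
DUMP_ASSETS: dict[type, tuple[type, list[str]]] = {}


def register_dump_asset(types: list[type], exts: list[str]) -> Callable[[type], type]:
    """Return a class decorator that registers the class for `types` and `exts`."""

    def decorator(klass: type) -> type:
        for t in types:
            DUMP_ASSETS[t] = (klass, list(exts))
        return klass  # the input class is returned unmodified

    return decorator


@register_dump_asset([dict], [".json"])
class JSONDictAsset:
    """Hypothetical asset that would dump dicts to JSON files."""
```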

junifer.api.decorators.register_data_registry(name)

Registry registration decorator.

Registers the data registry as name.

Parameters:

name (str): The name of the data registry.

Returns:

class: The unmodified input class.

junifer.api.decorators.register_datagrabber(klass)

Registers the DataGrabber so it can be used by name.

Parameters:

klass (class): The class of the DataGrabber to register.

Returns:

class: The unmodified input class.

Notes

It should only be used as a decorator.
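Registering a class "so it can be used by name" generally means storing it in a name-keyed mapping that later resolves the "kind" strings in configuration dicts. The following is a generic sketch of that pattern with hypothetical registry and decorator names; with junifer itself you would import register_datagrabber from junifer.api.decorators and apply it as @register_datagrabber instead.

```python
# Hypothetical name -> class registry for DataGrabbers.
DATAGRABBER_REGISTRY: dict[str, type] = {}


def register_datagrabber_sketch(klass: type) -> type:
    """Register klass under its class name and return it unchanged."""
    DATAGRABBER_REGISTRY[klass.__name__] = klass
    return klass


@register_datagrabber_sketch
class MyDataGrabber:
    """Hypothetical DataGrabber used only for this example."""


# A config such as {"kind": "MyDataGrabber"} can now be resolved by name.
resolved = DATAGRABBER_REGISTRY["MyDataGrabber"]
```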

junifer.api.decorators.register_datareader(klass)

Registers the DataReader so it can be used by name.

Parameters:

klass (class): The class of the DataReader to register.

Returns:

class: The unmodified input class.

Notes

It should only be used as a decorator.

junifer.api.decorators.register_marker(klass)

Marker registration decorator.

Registers the marker so it can be used by name.

Parameters:

klass (class): The class of the marker to register.

Returns:

class: The unmodified input class.

junifer.api.decorators.register_preprocessor(klass)

Preprocessor registration decorator.

Registers the preprocessor so it can be used by name.

Parameters:

klass (class): The class of the preprocessor to register.

Returns:

class: The unmodified input class.

junifer.api.decorators.register_storage(klass)

Storage registration decorator.

Registers the storage so it can be used by name.

Parameters:

klass (class): The class of the storage to register.

Returns:

class: The unmodified input class.

Queue Context

Context adapters for queueing.

enum junifer.api.queue_context.EnvKind(value)

Accepted Python environment kind.

Member Type: str

Valid values are as follows:

Venv = <EnvKind.Venv: 'venv'>

Conda = <EnvKind.Conda: 'conda'>

Local = <EnvKind.Local: 'local'>

enum junifer.api.queue_context.EnvShell(value)

Accepted environment shell.

Member Type: str

Valid values are as follows:

Bash = <EnvShell.Bash: 'bash'>

Zsh = <EnvShell.Zsh: 'zsh'>
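Because both enums have str as their member type, members compare equal to their plain string values, so configuration files can simply use "conda" or "bash". The local mirror below uses only the stdlib enum module and is not imported from junifer; it just illustrates this behaviour.

```python
from enum import Enum


class EnvKind(str, Enum):
    """Local mirror of junifer's EnvKind, for illustration only."""

    Venv = "venv"
    Conda = "conda"
    Local = "local"


class EnvShell(str, Enum):
    """Local mirror of junifer's EnvShell, for illustration only."""

    Bash = "bash"
    Zsh = "zsh"


kind = EnvKind("conda")  # look up a member by its value
```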

pydantic model junifer.api.queue_context.GnuParallelLocalAdapter

Class for generating commands for GNU Parallel (local).

Parameters:

job_name (str): The job name.
job_dir (pathlib.Path): The path to the job directory.
yaml_config_path (pathlib.Path): The path to the YAML config file.
elements (Elements): Element(s) to process. Will be used to index the DataGrabber.
pre_run_cmds (str or None, optional): Extra shell commands to source before the run step (default None).
pre_collect_cmds (str or None, optional): Extra shell commands to source before the collect step (default None).
env (QueueContextEnv or None, optional): The environment configuration. If None, jobs run without a virtual environment of any kind (default None).
verbose (str, optional): The level of verbosity (default “info”).
verbose_datalad (str or None, optional): The level of verbosity for datalad. If None, the same level as verbose is used (default None).
submit (bool, optional): Whether to submit the jobs (default False).

See also

QueueContextAdapter: The base class for QueueContext.
HTCondorAdapter: The concrete class for queueing via HTCondor.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

JSON schema:

{
   "title": "GnuParallelLocalAdapter",
   "description": "Class for generating commands for GNU Parallel (local).\n\nParameters\n----------\njob_name : str\n    The job name.\njob_dir : pathlib.Path\n    The path to the job directory.\nyaml_config_path : pathlib.Path\n    The path to the YAML config file.\nelements : ``Elements``\n    Element(s) to process. Will be used to index the DataGrabber.\npre_run_cmds : str or None, optional\n    Extra shell commands to source before the run (default None).\npre_collect_cmds : str or None, optional\n    Extra shell commands to source before the collect (default None).\nenv : :class:`.QueueContextEnv` or None, optional\n    The environment configuration. If None, will run without a\n    virtual environment of any kind (default None).\nverbose : str, optional\n    The level of verbosity (default \"info\").\nverbose_datalad : str or None, optional\n    The level of verbosity for datalad. If None, will be the same\n    as ``verbose`` (default None).\nsubmit : bool, optional\n    Whether to submit the jobs (default False).\n\nSee Also\n--------\nQueueContextAdapter :\n    The base class for QueueContext.\nHTCondorAdapter :\n    The concrete class for queueing via HTCondor.",
   "type": "object",
   "properties": {
      "job_name": {
         "title": "Job Name",
         "type": "string"
      },
      "job_dir": {
         "format": "path",
         "title": "Job Dir",
         "type": "string"
      },
      "yaml_config_path": {
         "format": "path",
         "title": "Yaml Config Path",
         "type": "string"
      },
      "elements": {
         "items": {
            "anyOf": [
               {
                  "type": "string"
               },
               {
                  "items": {
                     "type": "string"
                  },
                  "type": "array"
               }
            ]
         },
         "title": "Elements",
         "type": "array"
      },
      "pre_run_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Run Cmds"
      },
      "pre_collect_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Collect Cmds"
      },
      "env": {
         "anyOf": [
            {
               "$ref": "#/$defs/QueueContextEnv"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "verbose": {
         "default": "info",
         "title": "Verbose",
         "type": "string"
      },
      "verbose_datalad": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Verbose Datalad"
      },
      "submit": {
         "default": false,
         "title": "Submit",
         "type": "boolean"
      }
   },
   "$defs": {
      "EnvKind": {
         "description": "Accepted Python environment kind.",
         "enum": [
            "venv",
            "conda",
            "local"
         ],
         "title": "EnvKind",
         "type": "string"
      },
      "EnvShell": {
         "description": "Accepted environment shell.",
         "enum": [
            "bash",
            "zsh"
         ],
         "title": "EnvShell",
         "type": "string"
      },
      "QueueContextEnv": {
         "additionalProperties": true,
         "description": "Accepted environment configuration for queue context.",
         "properties": {
            "kind": {
               "$ref": "#/$defs/EnvKind"
            },
            "name": {
               "title": "Name",
               "type": "string"
            },
            "shell": {
               "$ref": "#/$defs/EnvShell"
            }
         },
         "required": [
            "kind",
            "shell"
         ],
         "title": "QueueContextEnv",
         "type": "object"
      }
   },
   "additionalProperties": true,
   "required": [
      "job_name",
      "job_dir",
      "yaml_config_path",
      "elements"
   ]
}

Config:

use_enum_values: bool = True
extra: str = allow

Fields:

elements (collections.abc.Sequence[str | tuple[str, ...]])
env (junifer.api.queue_context.queue_context_adapter.QueueContextEnv | None)
job_dir (pathlib.Path)
job_name (str)
pre_collect_cmds (str | None)
pre_run_cmds (str | None)
submit (bool)
verbose (str)
verbose_datalad (str | None)
yaml_config_path (pathlib.Path)

field elements: Sequence[str | tuple[str, ...]] [Required]

field env: QueueContextEnv | None = None

field job_dir: Path [Required]

field job_name: str [Required]

field pre_collect_cmds: str | None = None

field pre_run_cmds: str | None = None

field submit: bool = False

field verbose: str = 'info'

field verbose_datalad: str | None = None

field yaml_config_path: Path [Required]
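The elements field accepts a sequence whose items are either plain strings or tuples of strings (serialized as nested arrays in the JSON schema above). The hypothetical helper below checks those two shapes and normalizes every item to a tuple. The element identifiers are invented, and what the components of a tuple mean (for example subject and session) depends on the DataGrabber in use.

```python
from collections.abc import Sequence


def normalize_elements(elements: Sequence) -> list[tuple[str, ...]]:
    """Validate the documented element shapes and return them as tuples."""
    normalized = []
    for element in elements:
        if isinstance(element, str):
            normalized.append((element,))  # a single-component element
        elif isinstance(element, tuple) and all(isinstance(p, str) for p in element):
            normalized.append(element)  # a multi-component element
        else:
            raise TypeError(f"Invalid element: {element!r}")
    return normalized


# Illustrative identifiers: a bare subject and a (subject, session) pair.
elements = normalize_elements(["sub-01", ("sub-02", "ses-1")])
```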

collect()

Return collect commands.

elements_to_run()

Return elements to run.

model_post_init(context)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

pre_collect()

Return pre-collect commands.

pre_run()

Return pre-run commands.

prepare()

Prepare assets for submission.

run()

Return run commands.

pydantic model junifer.api.queue_context.HTCondorAdapter

Class for generating queueing scripts for HTCondor.

Parameters:

job_name (str): The job name to be used by HTCondor.
job_dir (pathlib.Path): The path to the job directory.
yaml_config_path (pathlib.Path): The path to the YAML config file.
elements (Elements): Element(s) to process. Will be used to index the DataGrabber.
pre_run_cmds (str or None, optional): Extra shell commands to source before the run step (default None).
pre_collect_cmds (str or None, optional): Extra shell commands to source before the collect step (default None).
env (QueueContextEnv or None, optional): The environment configuration. If None, jobs run without a virtual environment of any kind (default None).
verbose (str, optional): The level of verbosity (default “info”).
verbose_datalad (str or None, optional): The level of verbosity for datalad. If None, the same level as verbose is used (default None).
cpus (int, optional): The number of CPU cores to use (default 1).
mem (str, optional): The size of memory (RAM) to use (default “8G”).
disk (str, optional): The size of disk (HDD or SSD) to use (default “1G”).
extra_preamble (str or None, optional): Extra commands to pass to HTCondor (default None).
collect_task (HTCondorCollect, optional): Whether to submit the “collect” task for junifer (default “yes”).
submit (bool, optional): Whether to submit the jobs. In either case, .dag files are created for submission (default False).
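The cpus, mem and disk parameters describe per-job resource requests. A natural target is HTCondor's request_cpus, request_memory and request_disk submit commands. The sketch below assumes that mapping and appends extra_preamble verbatim; resource_preamble is a hypothetical helper rather than the adapter's actual template, so inspect the generated submit files to confirm how junifer writes them.

```python
from typing import Optional


def resource_preamble(
    cpus: int = 1,
    mem: str = "8G",
    disk: str = "1G",
    extra_preamble: Optional[str] = None,
) -> str:
    """Build HTCondor submit-file resource lines (assumed mapping)."""
    lines = [
        f"request_cpus = {cpus}",
        f"request_memory = {mem}",
        f"request_disk = {disk}",
    ]
    if extra_preamble:
        lines.append(extra_preamble)  # passed through unchanged
    return "\n".join(lines)


print(resource_preamble(cpus=4, mem="16G"))
```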

See also

QueueContextAdapter: The base class for QueueContext.
GnuParallelLocalAdapter: The concrete class for queueing via GNU Parallel (local).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

JSON schema:

{
   "title": "HTCondorAdapter",
   "description": "Class for generating queueing scripts for HTCondor.\n\nParameters\n----------\njob_name : str\n    The job name to be used by HTCondor.\njob_dir : pathlib.Path\n    The path to the job directory.\nyaml_config_path : pathlib.Path\n    The path to the YAML config file.\nelements : ``Elements``\n    Element(s) to process. Will be used to index the DataGrabber.\npre_run_cmds : str or None, optional\n    Extra shell commands to source before the run (default None).\npre_collect_cmds : str or None, optional\n    Extra shell commands to source before the collect (default None).\nenv : :class:`.QueueContextEnv` or None, optional\n    The environment configuration. If None, will run without a\n    virtual environment of any kind (default None).\nverbose : str, optional\n    The level of verbosity (default \"info\").\nverbose_datalad : str or None, optional\n    The level of verbosity for datalad. If None, will be the same\n    as ``verbose`` (default None).\ncpus : int, optional\n    The number of CPU cores to use (default 1).\nmem : str, optional\n    The size of memory (RAM) to use (default \"8G\").\ndisk : str, optional\n    The size of disk (HDD or SSD) to use (default \"1G\").\nextra_preamble : str or None, optional\n    Extra commands to pass to HTCondor (default None).\ncollect_task : :class:`.HTCondorCollect`, optional\n    Whether to submit \"collect\" task for junifer (default \"yes\").\nsubmit : bool, optional\n    Whether to submit the jobs. In any case, .dag files will be created\n    for submission (default False).\n\nSee Also\n--------\nQueueContextAdapter :\n    The base class for QueueContext.\nGnuParallelLocalAdapter :\n    The concrete class for queueing via GNU Parallel (local).",
   "type": "object",
   "properties": {
      "job_name": {
         "title": "Job Name",
         "type": "string"
      },
      "job_dir": {
         "format": "path",
         "title": "Job Dir",
         "type": "string"
      },
      "yaml_config_path": {
         "format": "path",
         "title": "Yaml Config Path",
         "type": "string"
      },
      "elements": {
         "items": {
            "anyOf": [
               {
                  "type": "string"
               },
               {
                  "items": {
                     "type": "string"
                  },
                  "type": "array"
               }
            ]
         },
         "title": "Elements",
         "type": "array"
      },
      "pre_run_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Run Cmds"
      },
      "pre_collect_cmds": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pre Collect Cmds"
      },
      "env": {
         "anyOf": [
            {
               "$ref": "#/$defs/QueueContextEnv"
            },
            {
               "type": "null"
            }
         ],
         "default": null
      },
      "verbose": {
         "default": "info",
         "title": "Verbose",
         "type": "string"
      },
      "verbose_datalad": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Verbose Datalad"
      },
      "cpus": {
         "default": 1,
         "title": "Cpus",
         "type": "integer"
      },
      "mem": {
         "default": "8G",
         "title": "Mem",
         "type": "string"
      },
      "disk": {
         "default": "1G",
         "title": "Disk",
         "type": "string"
      },
      "extra_preamble": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Extra Preamble"
      },
      "collect_task": {
         "$ref": "#/$defs/HTCondorCollect",
         "default": "yes"
      },
      "submit": {
         "default": false,
         "title": "Submit",
         "type": "boolean"
      }
   },
   "$defs": {
      "EnvKind": {
         "description": "Accepted Python environment kind.",
         "enum": [
            "venv",
            "conda",
            "local"
         ],
         "title": "EnvKind",
         "type": "string"
      },
      "EnvShell": {
         "description": "Accepted environment shell.",
         "enum": [
            "bash",
            "zsh"
         ],
         "title": "EnvShell",
         "type": "string"
      },
      "HTCondorCollect": {
         "description": "Accepted HTCondor collect commands.\n\n* ``\"yes\"``: Submit \"collect\" task and run even if some of the jobs\n    fail.\n* ``\"on_success_only\"``: Submit \"collect\" task and run only if all jobs\n    succeed.\n* ``\"no\"``: Do not submit \"collect\" task.",
         "enum": [
            "yes",
            "no",
            "on_success_only"
         ],
         "title": "HTCondorCollect",
         "type": "string"
      },
      "QueueContextEnv": {
         "additionalProperties": true,
         "description": "Accepted environment configuration for queue context.",
         "properties": {
            "kind": {
               "$ref": "#/$defs/EnvKind"
            },
            "name": {
               "title": "Name",
               "type": "string"
            },
            "shell": {
               "$ref": "#/$defs/EnvShell"
            }
         },
         "required": [
            "kind",
            "shell"
         ],
         "title": "QueueContextEnv",
         "type": "object"
      }
   },
   "additionalProperties": true,
   "required": [
      "job_name",
      "job_dir",
      "yaml_config_path",
      "elements"
   ]
}

Config:

use_enum_values: bool = True
extra: str = allow

Fields:

collect_task (junifer.api.queue_context.htcondor_adapter.HTCondorCollect)
cpus (int)
disk (str)
elements (collections.abc.Sequence[str | tuple[str, ...]])
env (junifer.api.queue_context.queue_context_adapter.QueueContextEnv | None)
extra_preamble (str | None)
job_dir (pathlib.Path)
job_name (str)
mem (str)
pre_collect_cmds (str | None)
pre_run_cmds (str | None)
submit (bool)
verbose (str)
verbose_datalad (str | None)
yaml_config_path (pathlib.Path)

field collect_task: HTCondorCollect = HTCondorCollect.Yes

field cpus: int = 1

field disk: str = '1G'

field elements: Sequence[str | tuple[str, ...]] [Required]

field env: QueueContextEnv | None = None

field extra_preamble: str | None = None

field job_dir: Path [Required]

field job_name: str [Required]

field mem: str = '8G'

field pre_collect_cmds: str | None = None

field pre_run_cmds: str | None = None

field submit: bool = False

field verbose: str = 'info'

field verbose_datalad: str | None = None

field yaml_config_path: Path [Required]

collect()

Return collect commands.

dag()

Return HTCondor DAG commands.

model_post_init(context)

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

pre_collect()

Return pre-collect commands.

pre_run()

Return pre-run commands.

prepare()

Prepare assets for submission.

run()

Return run commands.

enum junifer.api.queue_context.HTCondorCollect(value)

Accepted HTCondor collect commands.

"yes": Submit the “collect” task and run it even if some of the jobs fail.
"on_success_only": Submit the “collect” task and run it only if all jobs succeed.
"no": Do not submit the “collect” task.

Member Type: str

Valid values are as follows:

Yes = <HTCondorCollect.Yes: 'yes'>

No = <HTCondorCollect.No: 'no'>

OnSuccessOnly = <HTCondorCollect.OnSuccessOnly: 'on_success_only'>
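The three modes reduce to a simple decision rule. The sketch uses a local mirror of the enum and a hypothetical collect_runs helper. In junifer itself this behaviour is expected to be encoded in the generated HTCondor DAG (see dag()) rather than in a Python function like this one.

```python
from enum import Enum


class HTCondorCollect(str, Enum):
    """Local mirror of junifer's HTCondorCollect, for illustration only."""

    Yes = "yes"
    No = "no"
    OnSuccessOnly = "on_success_only"


def collect_runs(mode: str, job_succeeded: list[bool]) -> bool:
    """Decide whether the collect task runs, following the documented semantics."""
    choice = HTCondorCollect(mode)
    if choice is HTCondorCollect.No:
        return False  # the collect task is not submitted at all
    if choice is HTCondorCollect.OnSuccessOnly:
        return all(job_succeeded)  # run only if every job succeeded
    return True  # "yes": run even if some jobs failed
```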

pydantic model junifer.api.queue_context.QueueContextAdapter

Abstract base class for queue context adapter.

For every queue context, one needs to provide a concrete implementation of this abstract class.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

JSON schema:

{
   "title": "QueueContextAdapter",
   "description": "Abstract base class for queue context adapter.\n\nFor every queue context, one needs to provide a concrete\nimplementation of this abstract class.",
   "type": "object",
   "properties": {},
   "additionalProperties": true
}

Config:

use_enum_values: bool = True
extra: str = allow

abstract collect()

Return collect commands.

abstract pre_collect()

Return pre-collect commands.

abstract pre_run()

Return pre-run commands.

abstract prepare()

Prepare assets for submission.

abstract run()

Return run commands.
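A concrete adapter has to implement the five abstract methods listed above. The stdlib-only sketch below mirrors that interface with abc.ABC instead of the actual pydantic base model, and assumes, for illustration only, that the command methods return shell snippets as strings and that prepare returns None. EchoAdapter is hypothetical.

```python
from abc import ABC, abstractmethod


class QueueContextAdapterSketch(ABC):
    """Stdlib mirror of the abstract interface above (not the pydantic model)."""

    @abstractmethod
    def pre_run(self) -> str: ...

    @abstractmethod
    def run(self) -> str: ...

    @abstractmethod
    def pre_collect(self) -> str: ...

    @abstractmethod
    def collect(self) -> str: ...

    @abstractmethod
    def prepare(self) -> None: ...


class EchoAdapter(QueueContextAdapterSketch):
    """Hypothetical adapter that only produces shell snippets."""

    def __init__(self, job_name: str) -> None:
        self.job_name = job_name
        self.prepared = False

    def pre_run(self) -> str:
        return "# nothing to source before run"

    def run(self) -> str:
        return f"echo running {self.job_name}"

    def pre_collect(self) -> str:
        return "# nothing to source before collect"

    def collect(self) -> str:
        return f"echo collecting {self.job_name}"

    def prepare(self) -> None:
        # A real adapter would write its scripts into the job directory here.
        self.prepared = True
```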

class junifer.api.queue_context.QueueContextEnv

Accepted environment configuration for queue context.
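According to the JSON schemas above, QueueContextEnv requires kind (one of "venv", "conda", "local") and shell (one of "bash", "zsh"), accepts an optional name, and allows extra keys. The dependency-free check below encodes those rules for an env mapping; check_env is a hypothetical helper and the environment name is invented.

```python
from typing import Any

ENV_KINDS = {"venv", "conda", "local"}
ENV_SHELLS = {"bash", "zsh"}


def check_env(env: dict[str, Any]) -> dict[str, Any]:
    """Validate an env mapping against the documented QueueContextEnv schema."""
    for key in ("kind", "shell"):
        if key not in env:
            raise ValueError(f"env is missing required key {key!r}")
    if env["kind"] not in ENV_KINDS:
        raise ValueError(f"invalid env kind: {env['kind']!r}")
    if env["shell"] not in ENV_SHELLS:
        raise ValueError(f"invalid env shell: {env['shell']!r}")
    if "name" in env and not isinstance(env["name"], str):
        raise ValueError("env name must be a string")
    return env  # extra keys are allowed and passed through unchanged


env = check_env({"kind": "conda", "name": "junifer-env", "shell": "bash"})
```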