9.2.4. API Functions
Main Functions
Provide API functions.
- junifer.api.functions.collect(storage)
Collect and store data.
- Parameters:
- storage : dict
Storage to use. Must have a key "kind" with the kind of storage to use. All other keys are passed to the storage constructor.
- junifer.api.functions.list_elements(datagrabber, elements=None)
List elements of the datagrabber filtered using elements.
- junifer.api.functions.queue(config, kind, jobname='junifer_job', overwrite=False, elements=None, **kwargs)
Queue a job to be executed later.
- Parameters:
- config : dict
The configuration to be used for queueing the job.
- kind : {“HTCondor”, “GNUParallelLocal”}
The kind of job queue system to use.
- jobname : str, optional
The name of the job (default “junifer_job”).
- overwrite : bool, optional
Whether to overwrite if job directory already exists (default False).
- elements : list or None, optional
Element(s) to process. Will be used to index the DataGrabber (default None).
- **kwargs : dict
The keyword arguments to pass to the job queue system.
- Raises:
- ValueError
If kind is invalid or if the jobdir exists and overwrite = False.
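The two documented failure conditions can be restated as a small standalone check. This is a sketch only, not junifer's implementation: `check_queue_request` and its `jobdir` argument are hypothetical names introduced here, and the set of valid kinds is taken from the parameter description above.

```python
import tempfile
from pathlib import Path

# Queue kinds listed in the parameter description above.
VALID_KINDS = {"HTCondor", "GNUParallelLocal"}


def check_queue_request(kind: str, jobdir: Path, overwrite: bool = False) -> None:
    """Raise ValueError under the two conditions documented for queue().

    Hypothetical helper for illustration; not part of junifer.
    """
    if kind not in VALID_KINDS:
        raise ValueError(f"Invalid queue kind: {kind!r}")
    if jobdir.exists() and not overwrite:
        raise ValueError(f"Job directory {jobdir} exists and overwrite is False")


message = ""
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp)
    # An existing job directory is accepted only when overwrite=True.
    check_queue_request("HTCondor", existing, overwrite=True)
    try:
        check_queue_request("HTCondor", existing)
    except ValueError as err:
        message = str(err)
```

Checking the kind before touching the filesystem keeps the cheaper validation first; whether junifer orders the checks this way is not stated in the reference above.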
- junifer.api.functions.reset(config)
Reset the storage and jobs directory.
- Parameters:
- config : dict
The configuration to be used for resetting.
- junifer.api.functions.run(workdir, datagrabber, markers, storage, preprocessors=None, elements=None)
Run the pipeline on the selected element.
- Parameters:
- workdir : str or pathlib.Path or dict
Directory where the pipeline will be executed.
- datagrabber : dict
DataGrabber to use. Must have a key "kind" with the kind of DataGrabber to use. All other keys are passed to the DataGrabber constructor.
- markers : list of dict
List of markers to extract. Each marker is a dict with at least two keys: "name" and "kind". The "name" key is used to name the output marker. The "kind" key is used to specify the kind of marker to extract. The rest of the keys are used to pass parameters to the marker calculation.
- storage : dict
Storage to use. Must have a key "kind" with the kind of storage to use. All other keys are passed to the storage constructor.
- preprocessors : list of dict or None, optional
List of preprocessors to use. Each preprocessor is a dict with at least a key "kind" specifying the preprocessor to use. All other keys are passed to the preprocessor constructor (default None).
- elements : list or None, optional
Element(s) to process. Will be used to index the DataGrabber (default None).
- Raises:
- ValueError
If workdir.cleanup=False when len(elements) > 1.
- RuntimeError
If invalid element selectors are found.
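The argument shapes described above can be sketched as plain dictionaries. Every "kind" value, every extra key, and the paths below are placeholders chosen for illustration, not real junifer component names; `missing_keys` is a hypothetical helper that only checks the keys the documentation marks as required.

```python
from typing import Any, Mapping, Sequence

# Every "kind" value and every extra key here is a placeholder, not a real junifer name.
datagrabber = {"kind": "MyDataGrabber", "datadir": "/path/to/data"}
markers = [
    {"name": "my_marker", "kind": "MyMarker", "method": "mean"},
]
storage = {"kind": "MyStorage", "uri": "/path/to/output.db"}
preprocessors = [{"kind": "MyPreprocessor"}]
elements = ["sub-01"]


def missing_keys(spec: Mapping[str, Any], required: Sequence[str]) -> list:
    """Return the required keys that are absent from ``spec``."""
    return [key for key in required if key not in spec]


# Markers need "name" and "kind"; the other specifications need "kind".
problems = {
    "datagrabber": missing_keys(datagrabber, ["kind"]),
    "storage": missing_keys(storage, ["kind"]),
    "markers": [missing_keys(m, ["name", "kind"]) for m in markers],
    "preprocessors": [missing_keys(p, ["kind"]) for p in preprocessors],
}

# Keyword arguments matching the documented signature of run().
run_kwargs = {
    "workdir": "/tmp/junifer_work",
    "datagrabber": datagrabber,
    "markers": markers,
    "storage": storage,
    "preprocessors": preprocessors,
    "elements": elements,
}
```

With junifer installed and real component names substituted, these would presumably be passed as `junifer.api.functions.run(**run_kwargs)`; the sketch itself never imports junifer.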
Decorators
Provide API decorators.
- junifer.api.decorators.register_data_dump_asset(types, exts)
Asset registration decorator.
Registers the data dump asset for types with exts.
- junifer.api.decorators.register_data_registry(name)
Registry registration decorator.
Registers the data registry as name.
- Parameters:
- name : str
The name of the data registry.
- Returns:
- class
The unmodified input class.
- junifer.api.decorators.register_datagrabber(klass)
Register DataGrabber.
Registers the DataGrabber so it can be used by name.
- Parameters:
- klass : class
The class of the DataGrabber to register.
- Returns:
- class
The unmodified input class.
Notes
It should only be used as a decorator.
- junifer.api.decorators.register_datareader(klass)
Register DataReader.
Registers the DataReader so it can be used by name.
- Parameters:
- klass : class
The class of the DataReader to register.
- Returns:
- class
The unmodified input class.
Notes
It should only be used as a decorator.
- junifer.api.decorators.register_marker(klass)
Marker registration decorator.
Registers the marker so it can be used by name.
- Parameters:
- klass : class
The class of the marker to register.
- Returns:
- class
The unmodified input class.
- junifer.api.decorators.register_preprocessor(klass)
Preprocessor registration decorator.
Registers the preprocessor so it can be used by name.
- Parameters:
- klass : class
The class of the preprocessor to register.
- Returns:
- class
The unmodified input class.
- junifer.api.decorators.register_storage(klass)
Storage registration decorator.
Registers the storage so it can be used by name.
- Parameters:
- klass : class
The class of the storage to register.
- Returns:
- class
The unmodified input class.
Queue Context
Context adapters for queueing.
- enum junifer.api.queue_context.EnvKind(value)
Accepted Python environment kind.
- Member Type:
Valid values are as follows:
- Venv = <EnvKind.Venv: 'venv'>
- Conda = <EnvKind.Conda: 'conda'>
- Local = <EnvKind.Local: 'local'>
- enum junifer.api.queue_context.EnvShell(value)
Accepted environment shell.
- Member Type:
Valid values are as follows:
- Bash = <EnvShell.Bash: 'bash'>
- Zsh = <EnvShell.Zsh: 'zsh'>
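The member listings above map directly onto Python enumerations. The following local re-declaration is a sketch for experimenting with value lookup; mixing in `str` is an assumption made here (the member type is not shown in this reference), and these classes are not imported from junifer.

```python
from enum import Enum


class EnvKind(str, Enum):
    """Local stand-in mirroring the documented EnvKind members."""

    Venv = "venv"
    Conda = "conda"
    Local = "local"


class EnvShell(str, Enum):
    """Local stand-in mirroring the documented EnvShell members."""

    Bash = "bash"
    Zsh = "zsh"


# Calling the enum with a value returns the matching member.
kind = EnvKind("conda")
shell = EnvShell("zsh")

# An unknown value raises ValueError.
try:
    EnvShell("fish")
    rejected = False
except ValueError:
    rejected = True
```

This is the behaviour that the `(value)` in the documented signatures refers to: the enum is called with one of the listed string values.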
- pydantic model junifer.api.queue_context.GnuParallelLocalAdapter
Class for generating commands for GNU Parallel (local).
- Parameters:
- job_name : str
The job name.
- job_dir : pathlib.Path
The path to the job directory.
- yaml_config_path : pathlib.Path
The path to the YAML config file.
- elements : Elements
Element(s) to process. Will be used to index the DataGrabber.
- pre_run_cmds : str or None, optional
Extra shell commands to source before the run (default None).
- pre_collect_cmds : str or None, optional
Extra shell commands to source before the collect (default None).
- env : QueueContextEnv or None, optional
The environment configuration. If None, will run without a virtual environment of any kind (default None).
- verbose : str, optional
The level of verbosity (default “info”).
- verbose_datalad : str or None, optional
The level of verbosity for datalad. If None, will be the same as verbose (default None).
- submit : bool, optional
Whether to submit the jobs (default False).
See also
QueueContextAdapter : The base class for QueueContext.
HTCondorAdapter : The concrete class for queueing via HTCondor.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Show JSON schema
{ "title": "GnuParallelLocalAdapter", "description": "Class for generating commands for GNU Parallel (local).\n\nParameters\n----------\njob_name : str\n The job name.\njob_dir : pathlib.Path\n The path to the job directory.\nyaml_config_path : pathlib.Path\n The path to the YAML config file.\nelements : ``Elements``\n Element(s) to process. Will be used to index the DataGrabber.\npre_run_cmds : str or None, optional\n Extra shell commands to source before the run (default None).\npre_collect_cmds : str or None, optional\n Extra shell commands to source before the collect (default None).\nenv : :class:`.QueueContextEnv` or None, optional\n The environment configuration. If None, will run without a\n virtual environment of any kind (default None).\nverbose : str, optional\n The level of verbosity (default \"info\").\nverbose_datalad : str or None, optional\n The level of verbosity for datalad. If None, will be the same\n as ``verbose`` (default None).\nsubmit : bool, optional\n Whether to submit the jobs (default False).\n\nSee Also\n--------\nQueueContextAdapter :\n The base class for QueueContext.\nHTCondorAdapter :\n The concrete class for queueing via HTCondor.", "type": "object", "properties": { "job_name": { "title": "Job Name", "type": "string" }, "job_dir": { "format": "path", "title": "Job Dir", "type": "string" }, "yaml_config_path": { "format": "path", "title": "Yaml Config Path", "type": "string" }, "elements": { "items": { "anyOf": [ { "type": "string" }, { "items": { "type": "string" }, "type": "array" } ] }, "title": "Elements", "type": "array" }, "pre_run_cmds": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Pre Run Cmds" }, "pre_collect_cmds": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Pre Collect Cmds" }, "env": { "anyOf": [ { "$ref": "#/$defs/QueueContextEnv" }, { "type": "null" } ], "default": null }, "verbose": { "default": "info", "title": "Verbose", "type": "string" 
}, "verbose_datalad": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Verbose Datalad" }, "submit": { "default": false, "title": "Submit", "type": "boolean" } }, "$defs": { "EnvKind": { "description": "Accepted Python environment kind.", "enum": [ "venv", "conda", "local" ], "title": "EnvKind", "type": "string" }, "EnvShell": { "description": "Accepted environment shell.", "enum": [ "bash", "zsh" ], "title": "EnvShell", "type": "string" }, "QueueContextEnv": { "additionalProperties": true, "description": "Accepted environment configuration for queue context.", "properties": { "kind": { "$ref": "#/$defs/EnvKind" }, "name": { "title": "Name", "type": "string" }, "shell": { "$ref": "#/$defs/EnvShell" } }, "required": [ "kind", "shell" ], "title": "QueueContextEnv", "type": "object" } }, "additionalProperties": true, "required": [ "job_name", "job_dir", "yaml_config_path", "elements" ] }
- Config:
use_enum_values: bool = True
extra: str = allow
- Fields:
- elements (collections.abc.Sequence[str | tuple[str, ...]])
- env (junifer.api.queue_context.queue_context_adapter.QueueContextEnv | None)
- job_dir (pathlib.Path)
- job_name (str)
- pre_collect_cmds (str | None)
- pre_run_cmds (str | None)
- submit (bool)
- verbose (str)
- verbose_datalad (str | None)
- yaml_config_path (pathlib.Path)
- field env: QueueContextEnv | None = None
- collect()
Return collect commands.
- elements_to_run()
Return elements to run.
- model_post_init(context)
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- pre_collect()
Return pre-collect commands.
- pre_run()
Return pre-run commands.
- prepare()
Prepare assets for submission.
- run()
Return run commands.
- pydantic model junifer.api.queue_context.HTCondorAdapter
Class for generating queueing scripts for HTCondor.
- Parameters:
- job_name : str
The job name to be used by HTCondor.
- job_dir : pathlib.Path
The path to the job directory.
- yaml_config_path : pathlib.Path
The path to the YAML config file.
- elements : Elements
Element(s) to process. Will be used to index the DataGrabber.
- pre_run_cmds : str or None, optional
Extra shell commands to source before the run (default None).
- pre_collect_cmds : str or None, optional
Extra shell commands to source before the collect (default None).
- env : QueueContextEnv or None, optional
The environment configuration. If None, will run without a virtual environment of any kind (default None).
- verbose : str, optional
The level of verbosity (default “info”).
- verbose_datalad : str or None, optional
The level of verbosity for datalad. If None, will be the same as verbose (default None).
- cpus : int, optional
The number of CPU cores to use (default 1).
- mem : str, optional
The size of memory (RAM) to use (default “8G”).
- disk : str, optional
The size of disk (HDD or SSD) to use (default “1G”).
- extra_preamble : str or None, optional
Extra commands to pass to HTCondor (default None).
- collect_task : HTCondorCollect, optional
Whether to submit “collect” task for junifer (default “yes”).
- submit : bool, optional
Whether to submit the jobs. In any case, .dag files will be created for submission (default False).
See also
QueueContextAdapter : The base class for QueueContext.
GnuParallelLocalAdapter : The concrete class for queueing via GNU Parallel (local).
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Show JSON schema
{ "title": "HTCondorAdapter", "description": "Class for generating queueing scripts for HTCondor.\n\nParameters\n----------\njob_name : str\n The job name to be used by HTCondor.\njob_dir : pathlib.Path\n The path to the job directory.\nyaml_config_path : pathlib.Path\n The path to the YAML config file.\nelements : ``Elements``\n Element(s) to process. Will be used to index the DataGrabber.\npre_run_cmds : str or None, optional\n Extra shell commands to source before the run (default None).\npre_collect_cmds : str or None, optional\n Extra shell commands to source before the collect (default None).\nenv : :class:`.QueueContextEnv` or None, optional\n The environment configuration. If None, will run without a\n virtual environment of any kind (default None).\nverbose : str, optional\n The level of verbosity (default \"info\").\nverbose_datalad : str or None, optional\n The level of verbosity for datalad. If None, will be the same\n as ``verbose`` (default None).\ncpus : int, optional\n The number of CPU cores to use (default 1).\nmem : str, optional\n The size of memory (RAM) to use (default \"8G\").\ndisk : str, optional\n The size of disk (HDD or SSD) to use (default \"1G\").\nextra_preamble : str or None, optional\n Extra commands to pass to HTCondor (default None).\ncollect_task : :class:`.HTCondorCollect`, optional\n Whether to submit \"collect\" task for junifer (default \"yes\").\nsubmit : bool, optional\n Whether to submit the jobs. 
In any case, .dag files will be created\n for submission (default False).\n\nSee Also\n--------\nQueueContextAdapter :\n The base class for QueueContext.\nGnuParallelLocalAdapter :\n The concrete class for queueing via GNU Parallel (local).", "type": "object", "properties": { "job_name": { "title": "Job Name", "type": "string" }, "job_dir": { "format": "path", "title": "Job Dir", "type": "string" }, "yaml_config_path": { "format": "path", "title": "Yaml Config Path", "type": "string" }, "elements": { "items": { "anyOf": [ { "type": "string" }, { "items": { "type": "string" }, "type": "array" } ] }, "title": "Elements", "type": "array" }, "pre_run_cmds": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Pre Run Cmds" }, "pre_collect_cmds": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Pre Collect Cmds" }, "env": { "anyOf": [ { "$ref": "#/$defs/QueueContextEnv" }, { "type": "null" } ], "default": null }, "verbose": { "default": "info", "title": "Verbose", "type": "string" }, "verbose_datalad": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Verbose Datalad" }, "cpus": { "default": 1, "title": "Cpus", "type": "integer" }, "mem": { "default": "8G", "title": "Mem", "type": "string" }, "disk": { "default": "1G", "title": "Disk", "type": "string" }, "extra_preamble": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Extra Preamble" }, "collect_task": { "$ref": "#/$defs/HTCondorCollect", "default": "yes" }, "submit": { "default": false, "title": "Submit", "type": "boolean" } }, "$defs": { "EnvKind": { "description": "Accepted Python environment kind.", "enum": [ "venv", "conda", "local" ], "title": "EnvKind", "type": "string" }, "EnvShell": { "description": "Accepted environment shell.", "enum": [ "bash", "zsh" ], "title": "EnvShell", "type": "string" }, "HTCondorCollect": { "description": "Accepted HTCondor collect commands.\n\n* 
``\"yes\"``: Submit \"collect\" task and run even if some of the jobs\n fail.\n* ``\"on_success_only\"``: Submit \"collect\" task and run only if all jobs\n succeed.\n* ``\"no\"``: Do not submit \"collect\" task.", "enum": [ "yes", "no", "on_success_only" ], "title": "HTCondorCollect", "type": "string" }, "QueueContextEnv": { "additionalProperties": true, "description": "Accepted environment configuration for queue context.", "properties": { "kind": { "$ref": "#/$defs/EnvKind" }, "name": { "title": "Name", "type": "string" }, "shell": { "$ref": "#/$defs/EnvShell" } }, "required": [ "kind", "shell" ], "title": "QueueContextEnv", "type": "object" } }, "additionalProperties": true, "required": [ "job_name", "job_dir", "yaml_config_path", "elements" ] }
- Config:
use_enum_values: bool = True
extra: str = allow
- Fields:
- collect_task (junifer.api.queue_context.htcondor_adapter.HTCondorCollect)
- cpus (int)
- disk (str)
- elements (collections.abc.Sequence[str | tuple[str, ...]])
- env (junifer.api.queue_context.queue_context_adapter.QueueContextEnv | None)
- extra_preamble (str | None)
- job_dir (pathlib.Path)
- job_name (str)
- mem (str)
- pre_collect_cmds (str | None)
- pre_run_cmds (str | None)
- submit (bool)
- verbose (str)
- verbose_datalad (str | None)
- yaml_config_path (pathlib.Path)
- field collect_task: HTCondorCollect = HTCondorCollect.Yes
- field env: QueueContextEnv | None = None
- collect()
Return collect commands.
- dag()
Return HTCondor DAG commands.
- model_post_init(context)
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- pre_collect()
Return pre-collect commands.
- pre_run()
Return pre-run commands.
- prepare()
Prepare assets for submission.
- run()
Return run commands.
- enum junifer.api.queue_context.HTCondorCollect(value)
Accepted HTCondor collect commands.
"yes": Submit “collect” task and run even if some of the jobs fail.
"on_success_only": Submit “collect” task and run only if all jobs succeed.
"no": Do not submit “collect” task.
- Member Type:
Valid values are as follows:
- Yes = <HTCondorCollect.Yes: 'yes'>
- No = <HTCondorCollect.No: 'no'>
- OnSuccessOnly = <HTCondorCollect.OnSuccessOnly: 'on_success_only'>
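The three documented values can be read as a simple decision rule over job outcomes. The function below only restates that rule in code; `should_run_collect` is a hypothetical name, and in a real deployment the decision is presumably encoded in the generated HTCondor DAG rather than evaluated in Python like this.

```python
from typing import Sequence


def should_run_collect(collect_task: str, jobs_succeeded: Sequence[bool]) -> bool:
    """Decide whether the "collect" task runs, per the documented semantics.

    Hypothetical helper for illustration; not part of junifer.
    """
    if collect_task == "yes":
        # Run even if some of the jobs fail.
        return True
    if collect_task == "on_success_only":
        # Run only if all jobs succeed.
        return all(jobs_succeeded)
    if collect_task == "no":
        # Never submit the "collect" task.
        return False
    raise ValueError(f"Unknown collect_task value: {collect_task!r}")


outcomes = [True, False, True]  # one job failed
decisions = {
    value: should_run_collect(value, outcomes)
    for value in ("yes", "on_success_only", "no")
}
```

With one failed job, only the "yes" setting still runs the collect step, which is the practical difference between "yes" and "on_success_only".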
- pydantic model junifer.api.queue_context.QueueContextAdapter
Abstract base class for queue context adapter.
For every queue context, one needs to provide a concrete implementation of this abstract class.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Show JSON schema
{ "title": "QueueContextAdapter", "description": "Abstract base class for queue context adapter.\n\nFor every queue context, one needs to provide a concrete\nimplementation of this abstract class.", "type": "object", "properties": {}, "additionalProperties": true }
- Config:
use_enum_values: bool = True
extra: str = allow
- abstract collect()
Return collect commands.
- abstract pre_collect()
Return pre-collect commands.
- abstract pre_run()
Return pre-run commands.
- abstract prepare()
Prepare assets for submission.
- abstract run()
Return run commands.
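The shape of a concrete implementation can be sketched with the standard library alone. `AdapterSketch` below is a plain `abc` stand-in for the five abstract methods listed above, not the pydantic model itself, and `EchoAdapter` is a hypothetical toy subclass; the `str` and `None` return types are assumptions, since the reference does not state them.

```python
from abc import ABC, abstractmethod


class AdapterSketch(ABC):
    """Plain-ABC stand-in for the documented abstract interface."""

    @abstractmethod
    def pre_run(self) -> str:
        """Return pre-run commands."""

    @abstractmethod
    def run(self) -> str:
        """Return run commands."""

    @abstractmethod
    def pre_collect(self) -> str:
        """Return pre-collect commands."""

    @abstractmethod
    def collect(self) -> str:
        """Return collect commands."""

    @abstractmethod
    def prepare(self) -> None:
        """Prepare assets for submission."""


class EchoAdapter(AdapterSketch):
    """Toy adapter whose commands only echo what they would do."""

    def __init__(self, job_name: str) -> None:
        self.job_name = job_name
        self.prepared = False

    def pre_run(self) -> str:
        return "# no pre-run commands"

    def run(self) -> str:
        return f"echo running {self.job_name}"

    def pre_collect(self) -> str:
        return "# no pre-collect commands"

    def collect(self) -> str:
        return f"echo collecting {self.job_name}"

    def prepare(self) -> None:
        # A real adapter would write scripts into the job directory here.
        self.prepared = True


adapter = EchoAdapter("demo")
adapter.prepare()
commands = [adapter.pre_run(), adapter.run(), adapter.pre_collect(), adapter.collect()]
```

A subclass that omits any of the five methods cannot be instantiated, which is the guarantee the abstract base class provides to code that consumes adapters.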
- class junifer.api.queue_context.QueueContextEnv
Accepted environment configuration for queue context.