9.2.3. Pipeline
Pipeline components.
- class junifer.pipeline.MarkerCollection(markers, datareader=None, preprocessors=None, storage=None)
Class for marker collection.
- Parameters:
- Raises:
ValueError: If markers have the same names.
- fit(input)
Fit the pipeline.
- Parameters:
- input dict
The input data to fit the pipeline on. Should be the output of indexing the DataGrabber with one element.
- Returns:
- validate(datagrabber)
Validate the pipeline.
Without doing any computation, check that the marker collection can be fitted without problems, i.e., that the data required for each marker is present and streamed down the steps. If a storage is configured, also check that the storage can handle the markers’ output.
- Parameters:
- datagrabber DataGrabber-like
The DataGrabber to validate.
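The idea of validating a pipeline without computing anything can be illustrated with a minimal, self-contained sketch. It does not use junifer; the `Step` class, the `dry_run` function and the data type names are hypothetical and only show how required data can be checked while outputs are streamed down the steps.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical pipeline step: what it needs and what it produces."""

    name: str
    required: set  # data types the step needs to be present
    produces: set  # data types the step adds for later steps


def dry_run(available, steps):
    """Check that each step's requirements are met, without computing anything."""
    current = set(available)
    for step in steps:
        missing = step.required - current
        if missing:
            raise ValueError(
                f"Step '{step.name}' is missing required data: {sorted(missing)}"
            )
        # Stream the step's outputs down to the following steps
        current |= step.produces
    return current


# Hypothetical two-step pipeline: a preprocessor followed by a marker
steps = [
    Step("preprocess", required={"BOLD"}, produces={"BOLD_clean"}),
    Step("marker", required={"BOLD_clean"}, produces={"timeseries"}),
]
final_types = dry_run({"BOLD"}, steps)
```

Calling `dry_run({"T1w"}, steps)` in this sketch raises `ValueError` at the first step, before any data would be loaded or processed.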
- class junifer.pipeline.PipelineComponentRegistry(*args, **kwargs)
Class for pipeline component registry.
This class is a singleton and is used for managing pipeline components. It serves as a centralized registry for built-in and third-party pipeline components like datagrabbers, datareaders, preprocessors, markers and storage.
- Attributes:
- steps list of str
Get valid pipeline steps.
- components dict
Get registered components for valid pipeline steps.
Initialize the class.
- build_component_instance(step, name, baseclass, init_params=None)
Build an instance of the class registered as name.
- Parameters:
- Returns:
- object
An instance of the class registered as name under step.
- Raises:
RuntimeError: If there is a problem creating the instance.
ValueError: If the created object with the given name is not a subclass of the base class baseclass.
- property components: Mapping[str, Mapping[str, str | type]]
Get registered components for valid pipeline steps.
- deregister(step, klass)
De-register klass under step.
- Parameters:
- Raises:
ValueError: If the step is invalid.
- get_class(step, name)
Get the class registered under name for step.
- Parameters:
- Returns:
- class
Registered class.
- Raises:
ValueError: If the step or name is invalid.
- register(step, klass)
Register klass under step.
- Parameters:
- Raises:
ValueError: If the step is invalid.
- step_components(step)
Get registered components for step.
- Parameters:
- step str
Name of the pipeline step.
- Returns:
- Raises:
ValueError: If the step is invalid.
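The register, deregister and lookup behaviour described for the registry can be sketched as a small singleton. This is a toy illustration, not junifer's implementation: the class name `ToyRegistry`, the step names and the choice to key components by class name are all assumptions made for the example.

```python
class ToyRegistry:
    """Toy singleton registry; not junifer's implementation."""

    _instance = None
    # Hypothetical step names, chosen for illustration only
    _steps = ("datagrabber", "datareader", "preprocessing", "marker", "storage")

    def __new__(cls):
        # Create the single shared instance on first use
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._components = {step: {} for step in cls._steps}
        return cls._instance

    def _check_step(self, step):
        if step not in self._components:
            raise ValueError(f"Invalid step: {step}")

    def register(self, step, klass):
        self._check_step(step)
        self._components[step][klass.__name__] = klass

    def deregister(self, step, klass):
        self._check_step(step)
        self._components[step].pop(klass.__name__, None)

    def get_class(self, step, name):
        self._check_step(step)
        if name not in self._components[step]:
            raise ValueError(f"Invalid name: {name}")
        return self._components[step][name]

    def step_components(self, step):
        self._check_step(step)
        return list(self._components[step])


class MyMarker:
    """Placeholder third-party component."""


registry = ToyRegistry()
registry.register("marker", MyMarker)

# A second "instantiation" returns the same object, so lookups see MyMarker
same_registry = ToyRegistry()
found = same_registry.get_class("marker", "MyMarker")
```

Because every call to `ToyRegistry()` returns the same object, a component registered in one module is visible to lookups made anywhere else, which is the point of a centralized registry.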
- class junifer.pipeline.PipelineStepMixin
Mixin class for a pipeline step.
- fit_transform(input, **kwargs)
Fit and transform.
- get_output_type(input_type)
Get output type.
- validate(input)
Validate the pipeline step.
- validate_input(input)
Validate the input to the pipeline step.
- Parameters:
- Returns:
- Raises:
ValueError: If the input does not have the required data.
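One way such a mixin could tie its methods together is sketched below. This is an assumption-laden illustration rather than junifer's code: the class names, the `_fit_transform` hook, and the way `validate` combines `validate_input` and `get_output_type` are hypothetical.

```python
class StepMixinSketch:
    """Hypothetical mixin: concrete steps supply the three hooks below."""

    def validate_input(self, input):
        raise NotImplementedError

    def get_output_type(self, input_type):
        raise NotImplementedError

    def _fit_transform(self, input, **kwargs):
        raise NotImplementedError

    def validate(self, input):
        # Check the available data types, then report the output type
        # for each data type that this step actually uses
        used = self.validate_input(input)
        return [self.get_output_type(t) for t in used]

    def fit_transform(self, input, **kwargs):
        # Validate against the keys of the data object before computing
        self.validate(list(input.keys()))
        return self._fit_transform(input, **kwargs)


class MeanMarker(StepMixinSketch):
    """Toy step that averages a list stored under the 'BOLD' key."""

    def validate_input(self, input):
        if "BOLD" not in input:
            raise ValueError("Input does not have the required data: BOLD")
        return ["BOLD"]

    def get_output_type(self, input_type):
        return "vector"

    def _fit_transform(self, input, **kwargs):
        values = input["BOLD"]
        return {"mean": sum(values) / len(values)}


marker = MeanMarker()
output_types = marker.validate(["BOLD", "T1w"])
result = marker.fit_transform({"BOLD": [1.0, 2.0, 3.0]})
```

In this sketch, validation works on data type names only, so it can run before any data is loaded, while `fit_transform` repeats the check on the actual data object.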
- class junifer.pipeline.UpdateMetaMixin
Mixin class for updating meta.
- class junifer.pipeline.WorkDirManager(*args, **kwargs)
Class for working directory manager.
This class is a singleton and manages the temporary and working directories used across the pipeline by datagrabbers, preprocessors, markers and so on. It maintains a single super-directory, provides directories on demand and cleans up after itself, keeping the user's filesystem clean.
- Parameters:
- workdir str or pathlib.Path, optional
The path to the super-directory. If None, “TMPDIR/junifer” is used, where TMPDIR is the platform-dependent temporary directory.
- cleanup bool, optional
If False, the directories are not cleaned up after the object is destroyed. This is useful for debugging purposes (default True).
- Attributes:
- workdir pathlib.Path
Get working directory.
- elementdir pathlib.Path
Get element directory.
- root_tempdir pathlib.Path or None
Get root temporary directory.
Initialize the class.
- cleanup_elementdir()
Clean up element directory.
It should preferably be used after fitting a marker or something similar in the element-specific scope. If called between components, it can lead to required intermediate files not being found.
- delete_element_tempdir(tempdir)
Delete an element-scoped temporary directory.
- Parameters:
- tempdir pathlib.Path
The temporary directory path to be deleted.
- delete_tempdir(tempdir)
Delete a component-scoped temporary directory.
- Parameters:
- tempdir pathlib.Path
The temporary directory path to be deleted.
- get_element_tempdir(prefix=None, suffix=None)
Get an element-scoped temporary directory.
This directory should be available only for the lifetime of an element.
- Parameters:
- Returns:
pathlib.Path: The path to the temporary directory.
- get_tempdir(prefix=None, suffix=None)
Get a component-scoped temporary directory.
This directory should be available only for the lifetime of a component like a preprocessor or marker.
- Parameters:
- Returns:
pathlib.Path: The path to the temporary directory.
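The pattern of handing out component-scoped temporary directories under one super-directory can be sketched with the standard library alone. The class below is a toy and not junifer's implementation: the name `ToyWorkDirManager`, the default directory name and the way the `cleanup` flag is honoured are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path


class ToyWorkDirManager:
    """Toy manager handing out temporary directories under one root."""

    def __init__(self, workdir=None, cleanup=True):
        if workdir is None:
            # Hypothetical default under the platform's temporary directory
            workdir = Path(tempfile.gettempdir()) / "toy_workdir"
        self.workdir = Path(workdir)
        self.workdir.mkdir(parents=True, exist_ok=True)
        self._cleanup = cleanup

    def get_tempdir(self, prefix=None, suffix=None):
        # Create a uniquely named directory inside the super-directory
        return Path(
            tempfile.mkdtemp(prefix=prefix, suffix=suffix, dir=self.workdir)
        )

    def delete_tempdir(self, tempdir):
        # Assumption: with cleanup disabled, files are kept for debugging
        if self._cleanup:
            shutil.rmtree(tempdir, ignore_errors=True)


# Use a throwaway root so the example leaves nothing behind
root = Path(tempfile.mkdtemp())
manager = ToyWorkDirManager(workdir=root)

scratch = manager.get_tempdir(prefix="marker_")
(scratch / "intermediate.txt").write_text("data")
manager.delete_tempdir(scratch)
```

Keeping every scratch directory under a single root means one place to inspect while debugging and one place to remove when the run is finished.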