8.1.5. Storage

Provide imports for the storage sub-package.

class junifer.storage.BaseFeatureStorage(uri, storage_types, single_output=True)

Abstract base class for feature storage.

For every interface that is required, one needs to provide a concrete implementation of this abstract class.

Parameters:

uri (str or pathlib.Path): The path to the storage.
storage_types (str or list of str): The available storage types for the class.
single_output (bool, optional): Whether to have single output (default True).
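
The subclassing pattern described above can be sketched with Python's abc module. The classes below are stand-ins written for illustration and are not junifer's implementation: `SketchFeatureStorage` and `InMemoryStorage` are hypothetical names, only the abstract methods documented in this section are mirrored, the "name" metadata entry is an assumption, and `read_df` returns a plain dict rather than a pandas.DataFrame to keep the sketch dependency-free.

```python
from abc import ABC, abstractmethod


class SketchFeatureStorage(ABC):
    """Hypothetical stand-in that mirrors the documented abstract methods."""

    def __init__(self, uri, storage_types, single_output=True):
        self.uri = uri
        # Accept a single storage type or a list of them.
        if isinstance(storage_types, str):
            storage_types = [storage_types]
        self._valid_inputs = list(storage_types)
        self.single_output = single_output

    def get_valid_inputs(self):
        return self._valid_inputs

    @abstractmethod
    def collect(self):
        """Collect data."""

    @abstractmethod
    def list_features(self):
        """Return a dict mapping feature names to their metadata."""

    @abstractmethod
    def read_df(self, feature_name=None, feature_md5=None):
        """Read one feature."""

    @abstractmethod
    def store_metadata(self, meta_md5, element, meta):
        """Store metadata."""


class InMemoryStorage(SketchFeatureStorage):
    """Hypothetical concrete implementation that keeps everything in a dict."""

    def __init__(self, uri, single_output=True):
        super().__init__(uri, storage_types=["table"], single_output=single_output)
        self._meta_by_md5 = {}

    def collect(self):
        # Nothing to merge: all data already lives in one place.
        return None

    def list_features(self):
        # Assumption of this sketch: each metadata dict carries a "name" entry.
        return {meta["name"]: meta for meta in self._meta_by_md5.values()}

    def read_df(self, feature_name=None, feature_md5=None):
        # Returns the stored metadata as a stand-in for a DataFrame.
        if feature_md5 is not None:
            return self._meta_by_md5[feature_md5]
        return self.list_features()[feature_name]

    def store_metadata(self, meta_md5, element, meta):
        self._meta_by_md5[meta_md5] = meta
```

Instantiating `SketchFeatureStorage` directly raises TypeError because its abstract methods are unimplemented, while `InMemoryStorage("memory://demo")` works because it implements all four.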

abstract collect()

Collect data.

get_valid_inputs()

Get valid storage types for input.

Returns:

list of str: The list of storage types that can be used as input for this storage.

abstract list_features()

List the features in the storage.

Returns:

dict: The features in the storage. The keys are the feature names to be used in read_features() and the values are the metadata of each feature.

abstract read_df(feature_name=None, feature_md5=None)

Read feature into a pandas DataFrame.

Parameters:

feature_name (str, optional): Name of the feature to read (default None).
feature_md5 (str, optional): MD5 hash of the feature to read (default None).

Returns:

pandas.DataFrame: The features as a dataframe.

store(kind, **kwargs)

Store extracted features data.

Parameters:

kind ({“matrix”, “timeseries”, “table”}): The storage kind.
**kwargs: The keyword arguments.

Raises:

ValueError: If kind is invalid.
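
One way such a method could behave is to route each kind to the matching store_matrix(), store_timeseries(), or store_table() call and reject anything else. The sketch below assumes that routing; the source does not state how store() uses its keyword arguments, and `RecordingStorage` is a hypothetical stand-in that only records calls.

```python
class RecordingStorage:
    """Hypothetical stand-in that records which store_* method was called."""

    def __init__(self):
        self.calls = []

    def store_matrix(self, **kwargs):
        self.calls.append(("matrix", kwargs))

    def store_timeseries(self, **kwargs):
        self.calls.append(("timeseries", kwargs))

    def store_table(self, **kwargs):
        self.calls.append(("table", kwargs))

    def store(self, kind, **kwargs):
        # Map each documented kind to the method that handles it
        # (the forwarding itself is an assumption of this sketch).
        handlers = {
            "matrix": self.store_matrix,
            "timeseries": self.store_timeseries,
            "table": self.store_table,
        }
        if kind not in handlers:
            raise ValueError(
                f"Invalid kind {kind!r}; expected one of {sorted(handlers)}"
            )
        handlers[kind](**kwargs)
```

A dictionary lookup keeps the set of valid kinds and the dispatch targets in one place, so adding a kind means adding a single entry.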

store_matrix(meta_md5, element, data, col_names=None, row_names=None, matrix_kind='full', diagonal=True)

Store matrix.

Parameters:

meta_md5 (str)

The metadata MD5 hash.

element (dict)

The element as a dictionary.

data (numpy.ndarray)

The matrix data to store.

col_names (list or tuple of str, optional)

The column names (default None).

row_names (str, optional)

The column name to use when the number of rows is greater than 1. If None and the number of rows is greater than 1, the name will be “index” (default None).

matrix_kind (str, optional)

The kind of matrix:

triu : store upper triangular only
tril : store lower triangular only
full : store the full matrix

(default “full”).

diagonal (bool, optional)

Whether to store the diagonal. If matrix_kind is “full”, setting this to False will raise an error (default True).
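
The interaction between matrix_kind and diagonal can be illustrated with a small pure-Python selection routine. This is a sketch under stated assumptions, not the library's code: `select_matrix_entries` is a hypothetical helper, the matrix is a nested list rather than a numpy.ndarray, and ValueError is assumed as the error type since the source only says "an error".

```python
def select_matrix_entries(data, matrix_kind="full", diagonal=True):
    """Return (row, col, value) triples kept for the given matrix_kind."""
    if matrix_kind not in ("triu", "tril", "full"):
        raise ValueError(f"Invalid matrix_kind: {matrix_kind!r}")
    if matrix_kind == "full" and not diagonal:
        # Assumed error type; the source only states that an error is raised.
        raise ValueError("diagonal=False cannot be combined with matrix_kind='full'")

    entries = []
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if i == j and not diagonal:
                continue  # skip the diagonal when it is excluded
            if matrix_kind == "triu" and j < i:
                continue  # below the diagonal
            if matrix_kind == "tril" and j > i:
                continue  # above the diagonal
            entries.append((i, j, value))
    return entries
```

For a 3x3 matrix, "triu" with the diagonal keeps six entries and "tril" without the diagonal keeps three, which is where the storage saving for symmetric matrices comes from.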

abstract store_metadata(meta_md5, element, meta)

Store metadata.

Parameters:

meta_md5 (str): The metadata MD5 hash.
element (dict): The element as a dictionary.
meta (dict): The metadata as a dictionary.
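
The source does not say how the metadata MD5 hash is computed. As an illustration only, the sketch below assumes the hash is taken over a JSON serialization with sorted keys, so that two dictionaries with the same content yield the same hash; `metadata_md5` is a hypothetical helper, not a junifer function.

```python
import hashlib
import json


def metadata_md5(meta):
    """Return a hex MD5 digest of a JSON-serializable metadata dict."""
    # Sorting keys makes the digest independent of insertion order.
    serialized = json.dumps(meta, sort_keys=True).encode("utf-8")
    return hashlib.md5(serialized).hexdigest()
```

A call such as `metadata_md5({"marker": "demo", "kind": "table"})` returns a 32-character hexadecimal string that could serve as the meta_md5 argument in a sketch like this one.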

store_table(meta_md5, element, data, columns=None, rows_col_name=None)

Store table.

Parameters:

meta_md5 (str): The metadata MD5 hash.
element (dict): The element as a dictionary.
data (numpy.ndarray or list): The table data to store.
columns (list or tuple of str, optional): The columns (default None).
rows_col_name (str, optional): The column name to use when the number of rows is greater than 1. If None and the number of rows is greater than 1, the name will be “index” (default None).
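
The rows_col_name rule can be sketched as follows. Everything here beyond that rule is an assumption for illustration: `table_records` is a hypothetical helper, the element entries are copied into every record, the row identifier is the row number, and the placeholder column labels are invented.

```python
def table_records(element, data, columns=None, rows_col_name=None):
    """Turn table data into one dict per row, labelling rows when needed."""
    # Treat a flat sequence as a single row.
    if data and not isinstance(data[0], (list, tuple)):
        data = [data]
    n_rows = len(data)
    if n_rows == 0:
        return []
    if columns is None:
        # Placeholder labels; an assumption of this sketch.
        columns = [f"col_{i}" for i in range(len(data[0]))]

    records = []
    for row_number, row in enumerate(data):
        record = dict(element)
        if n_rows > 1:
            # Documented rule: fall back to "index" when no name is given.
            name = "index" if rows_col_name is None else rows_col_name
            record[name] = row_number
        record.update(zip(columns, row))
        records.append(record)
    return records
```

With a single row no extra column is added; with several rows each record gains an "index" entry, or an entry named by rows_col_name when one is supplied.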

store_timeseries(meta_md5, element, data, columns=None)

Store timeseries.

Parameters:

meta_md5 (str): The metadata MD5 hash.
element (dict): The element as a dictionary.
data (numpy.ndarray): The timeseries data to store.
columns (list or tuple of str, optional): The column labels (default None).

validate(input_)

Validate the input to the pipeline step.

Parameters:

input_ (list of str): The input to the pipeline step.

Raises:

ValueError: If input_ is invalid.
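
A check of this shape might compare the input against the storage types returned by get_valid_inputs(). The sketch below assumes that the input is valid as soon as one of its entries is a supported storage type; the source does not spell out the exact rule, and `validate_input` is a hypothetical free function rather than the method itself.

```python
def validate_input(valid_inputs, input_):
    """Raise ValueError unless input_ contains a supported storage type."""
    # Assumption: one matching entry is enough for the input to be valid.
    if not any(kind in input_ for kind in valid_inputs):
        raise ValueError(
            f"Input {input_} does not contain any of the supported "
            f"storage types {valid_inputs}"
        )
```

Calling `validate_input(["matrix", "table"], ["table"])` returns silently, while `validate_input(["matrix"], ["timeseries"])` raises ValueError.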

class junifer.storage.PandasBaseFeatureStorage(uri, single_output=True, **kwargs)

Abstract base class for feature storage via pandas.

For every interface that is required, one needs to provide a concrete implementation of this abstract class.

Parameters:

uri (str or pathlib.Path): The path to the storage.
single_output (bool, optional): Whether to have single output (default True).
**kwargs: Keyword arguments passed to the superclass.