8.1.5. Storage#
Provide imports for storage sub-package.
- class junifer.storage.BaseFeatureStorage(uri, storage_types, single_output=True)#
Abstract base class for feature storage.
For every interface that is required, one needs to provide a concrete implementation of this abstract class.
- Parameters:
- abstract collect()#
Collect data.
- get_valid_inputs()#
Get valid storage types for input.
- abstract list_features()#
List the features in the storage.
- Returns:
dict: The features in the storage. The keys are the feature names to be used in read_df() and the values are the metadata of each feature.
- abstract read_df(feature_name=None, feature_md5=None)#
Read feature into a pandas DataFrame.
- Parameters:
- Returns:
pandas.DataFrame: The features as a dataframe.
- store(kind, **kwargs)#
Store extracted features data.
- Parameters:
- kind : {“matrix”, “timeseries”, “table”}
The storage kind.
- **kwargs
The keyword arguments.
- Raises:
ValueError: If kind is invalid.
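The dispatch behaviour described for store() can be sketched as follows. This is a minimal illustration, assuming that each kind maps to the store_* method of the same name; the class KindDispatcher and its calls list are hypothetical and are not part of junifer.

```python
class KindDispatcher:
    """Hypothetical stand-in for a storage class (not junifer's code)."""

    def __init__(self):
        # Record which handler was called, for demonstration only.
        self.calls = []

    def store_matrix(self, **kwargs):
        self.calls.append(("matrix", kwargs))

    def store_timeseries(self, **kwargs):
        self.calls.append(("timeseries", kwargs))

    def store_table(self, **kwargs):
        self.calls.append(("table", kwargs))

    def store(self, kind, **kwargs):
        # Map each documented kind to its handler.
        handlers = {
            "matrix": self.store_matrix,
            "timeseries": self.store_timeseries,
            "table": self.store_table,
        }
        if kind not in handlers:
            raise ValueError(f"Invalid kind: {kind!r}")
        handlers[kind](**kwargs)
```

Routing through a single store() entry point lets callers stay agnostic about the concrete method, while an unknown kind fails early with a ValueError.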
- store_matrix(meta_md5, element, data, col_names=None, row_names=None, matrix_kind='full', diagonal=True)#
Store matrix.
- Parameters:
- meta_md5 : str
The metadata MD5 hash.
- element : dict
The element as a dictionary.
- data : numpy.ndarray
The matrix data to store.
- col_names : list or tuple of str, optional
The column names (default None).
- row_names : str, optional
The column name to use in case the number of rows is greater than 1. If None and the number of rows is greater than 1, then the name will be “index” (default None).
- matrix_kind : str, optional
The kind of matrix (default “full”):
  - triu: store upper triangular only
  - tril: store lower triangular only
  - full: store the full matrix
- diagonal : bool, optional
Whether to store the diagonal. If matrix_kind is “full”, setting this to False will raise an error (default True).
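To make the interaction between matrix_kind and diagonal concrete, here is a small sketch of which entries would be kept for each combination. It assumes a square matrix given as nested lists (iterating a numpy.ndarray row by row behaves the same way); the function name select_matrix_entries is hypothetical and this is not junifer's implementation.

```python
def select_matrix_entries(data, matrix_kind="full", diagonal=True):
    """Return (row, col, value) triples kept for the given options."""
    if matrix_kind not in ("triu", "tril", "full"):
        raise ValueError(f"Invalid matrix_kind: {matrix_kind!r}")
    # A full matrix without its diagonal is documented as an error.
    if matrix_kind == "full" and not diagonal:
        raise ValueError("diagonal=False is not valid for a full matrix")
    kept = []
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if i == j and not diagonal:
                continue  # skip the diagonal when requested
            if matrix_kind == "triu" and j < i:
                continue  # below the diagonal
            if matrix_kind == "tril" and j > i:
                continue  # above the diagonal
            kept.append((i, j, value))
    return kept
```

For a symmetric matrix such as a correlation matrix, "triu" or "tril" stores roughly half the values without losing information.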
- abstract store_metadata(meta_md5, element, meta)#
Store metadata.
- store_table(meta_md5, element, data, columns=None, rows_col_name=None)#
Store table.
- Parameters:
- meta_md5 : str
The metadata MD5 hash.
- element : dict
The element as a dictionary.
- data : numpy.ndarray or list
The table data to store.
- columns : list or tuple of str, optional
The columns (default None).
- rows_col_name : str, optional
The column name to use in case the number of rows is greater than 1. If None and the number of rows is greater than 1, then the name will be “index” (default None).
- store_timeseries(meta_md5, element, data, columns=None)#
Implement timeseries storing.
- validate(input_)#
Validate the input to the pipeline step.
- Parameters:
- Raises:
ValueError: If the input_ is invalid.
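The requirement that abstract methods of the base class be given a concrete implementation can be illustrated with Python's abc module. The sketch below is an assumption-laden miniature: SketchBaseStorage and InMemoryStorage are hypothetical names, and only two of the abstract methods listed above are reproduced.

```python
from abc import ABC, abstractmethod


class SketchBaseStorage(ABC):
    """Hypothetical miniature of an abstract storage base class."""

    @abstractmethod
    def list_features(self):
        """Return a dict of stored features."""

    @abstractmethod
    def store_metadata(self, meta_md5, element, meta):
        """Store metadata under its MD5 hash."""


class InMemoryStorage(SketchBaseStorage):
    """Concrete implementation that keeps metadata in a dict."""

    def __init__(self):
        self._meta = {}

    def list_features(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._meta)

    def store_metadata(self, meta_md5, element, meta):
        self._meta[meta_md5] = meta
```

Instantiating SketchBaseStorage directly raises TypeError, which is how Python enforces that every abstract method has been implemented by the subclass.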
- class junifer.storage.PandasBaseFeatureStorage(uri, single_output=True, **kwargs)#
Abstract base class for feature storage via pandas.
For every interface that is required, one needs to provide a concrete implementation of this abstract class.
- Parameters:
- uri : str or pathlib.Path
The path to the storage.
- single_output : bool, optional
Whether to have single output (default True).
- **kwargs
Keyword arguments passed to the superclass.
See also
BaseFeatureStorage: The base class for feature storage.
- static element_to_index(element, n_rows=1, rows_col_name=None)#
Convert the element metadata to index.
- Parameters:
- Returns:
pandas.MultiIndex: The index of the dataframe to store.
- Raises:
ValueError: If meta does not contain the key “element”.
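As a rough sketch of what converting an element to an index might involve, the function below builds index level names and one tuple per row without depending on pandas. The helper name element_to_index_tuples is hypothetical, and numbering the rows from 0 is an assumption; only the fallback level name “index” comes from the parameter descriptions in this section.

```python
def element_to_index_tuples(element, n_rows=1, rows_col_name=None):
    """Return (names, tuples) describing a multi-level index."""
    names = list(element.keys())
    values = tuple(element.values())
    if n_rows > 1:
        # An extra level distinguishes the rows of one element.
        names.append(rows_col_name if rows_col_name is not None else "index")
        tuples = [values + (i,) for i in range(n_rows)]
    else:
        tuples = [values]
    return names, tuples
```

The result can be passed to pandas.MultiIndex.from_tuples(tuples, names=names) to obtain an actual index object.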
- get_valid_inputs()#
Get valid storage types for input.
- store_df(meta_md5, element, df)#
Implement pandas DataFrame storing.
- Parameters:
- df : pandas.DataFrame or pandas.Series
The pandas DataFrame or Series to store.
- meta : dict
The metadata as a dictionary.
- Raises:
ValueError: If the dataframe index has items that are not in the index generated from the metadata.
- store_table(meta_md5, element, data, columns=None, rows_col_name=None)#
Implement table storing.
- Parameters:
- meta_md5 : str
The metadata MD5 hash.
- element : dict
The element as a dictionary.
- data : numpy.ndarray or list
The table data to store.
- columns : list or tuple of str, optional
The columns (default None).
- rows_col_name : str, optional
The column name to use in case the number of rows is greater than 1. If None and the number of rows is greater than 1, then the name will be “index” (default None).
- class junifer.storage.SQLiteFeatureStorage(uri, single_output=True, upsert='update', **kwargs)#
Concrete implementation for feature storage via SQLite.
- Parameters:
- uri : str or pathlib.Path
The path to the file to be used.
- single_output : bool, optional
If False, one file is created per element, and each file name is prefixed with the respective element. If True, a single file is created as specified in uri and all elements are stored in it. The latter is suitable only for non-parallel execution, because SQLite does not support concurrency (default True).
- upsert : {“ignore”, “update”}, optional
Upsert mode. If “ignore”, existing elements are ignored; if “update”, existing elements are updated (default “update”).
- **kwargs : dict
The keyword arguments passed to the superclass.
See also
PandasBaseFeatureStorage: The base class for Pandas-based feature storage.
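The difference between the two upsert modes can be demonstrated with the standard-library sqlite3 module. This is a sketch under stated assumptions: the table name features, its columns, and the helper upsert_rows are hypothetical, and the SQL that junifer actually issues may differ.

```python
import sqlite3


def upsert_rows(conn, rows, upsert="update"):
    """Insert rows, ignoring or replacing those whose key already exists."""
    if upsert not in ("ignore", "update"):
        raise ValueError(f"Invalid upsert mode: {upsert!r}")
    verb = "INSERT OR IGNORE" if upsert == "ignore" else "INSERT OR REPLACE"
    conn.executemany(
        f"{verb} INTO features (element, value) VALUES (?, ?)", rows
    )
    conn.commit()


# In-memory database with the element as primary key (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (element TEXT PRIMARY KEY, value REAL)")

upsert_rows(conn, [("sub-01", 1.0)])
# "ignore" leaves the existing row untouched.
upsert_rows(conn, [("sub-01", 2.0)], upsert="ignore")
after_ignore = conn.execute("SELECT value FROM features").fetchall()
# "update" overwrites the existing row.
upsert_rows(conn, [("sub-01", 3.0)], upsert="update")
after_update = conn.execute("SELECT value FROM features").fetchall()
```

With “ignore”, re-running a pipeline keeps previously stored values; with “update”, the latest run wins.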
- collect()#
Implement data collection.
- Raises:
NotImplementedError: If single_output is True.
- get_engine(element=None)#
Get engine.
- Parameters:
- meta : dict, optional
The metadata as a dictionary (default None).
- Returns:
sqlalchemy.engine.Engine: The sqlalchemy engine.
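When single_output is False, each element gets its own file whose name is prefixed with the element. One way such a path could be derived is sketched below; the helper element_file_path and the exact prefix format (element values joined by underscores) are assumptions, not junifer's documented naming scheme.

```python
from pathlib import Path


def element_file_path(uri, element=None, single_output=True):
    """Return the database file path to use for an element."""
    uri = Path(uri)
    if single_output or element is None:
        # Everything goes into the one file named by uri.
        return uri
    # Hypothetical prefix: element values joined by underscores.
    prefix = "_".join(str(value) for value in element.values())
    return uri.parent / f"{prefix}_{uri.name}"
```

An engine for the resulting path could then be created with sqlalchemy.create_engine using an SQLite URL for that file.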
- list_features()#
List the features in the storage.
- Returns:
dict: The features in the storage. The keys are the feature names to be used in read_df() and the values are the metadata of each feature.
- read_df(feature_name=None, feature_md5=None)#
Implement feature reading into a pandas DataFrame.
Either one of feature_name or feature_md5 needs to be specified.
- Parameters:
- Returns:
pandas.DataFrame: The features as a dataframe.
- Raises:
ValueError: If the parameter values are invalid, the feature is not found, or multiple features are found.
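The argument checks described for read_df() can be sketched as a small lookup routine. The function resolve_feature and the layout of the features mapping (MD5 hash to a metadata dict with a “name” key) are hypothetical, and treating "both arguments given" as invalid is an assumption about what counts as invalid parameter values.

```python
def resolve_feature(features, feature_name=None, feature_md5=None):
    """Return the MD5 hash of the single feature that was requested."""
    # Assumption: exactly one of the two arguments must be given.
    if (feature_name is None) == (feature_md5 is None):
        raise ValueError("Specify exactly one of feature_name or feature_md5")
    if feature_md5 is not None:
        if feature_md5 not in features:
            raise ValueError(f"Feature MD5 {feature_md5!r} not found")
        return feature_md5
    matches = [
        md5 for md5, meta in features.items()
        if meta.get("name") == feature_name
    ]
    if not matches:
        raise ValueError(f"Feature {feature_name!r} not found")
    if len(matches) > 1:
        raise ValueError(f"Multiple features named {feature_name!r} found")
    return matches[0]
```

Looking up by MD5 is unambiguous, whereas a name can match several stored features, which is why the name path needs the extra multiplicity check.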
- store_df(meta_md5, element, df)#
Implement pandas DataFrame storing.
- Parameters:
- df : pandas.DataFrame or pandas.Series
The pandas DataFrame or Series to store.
- meta : dict
The metadata as a dictionary.
- Raises:
ValueError: If the dataframe index has items that are not in the index generated from the metadata.
- store_matrix(meta_md5, element, data, col_names=None, row_names=None, matrix_kind='full', diagonal=True)#
Implement matrix storing.
- Parameters:
- meta_md5 : str
The metadata MD5 hash.
- element : dict
The element as a dictionary.
- data : numpy.ndarray
The matrix data to store.
- meta : dict
The metadata as a dictionary.
- col_names : list or tuple of str, optional
The column names (default None).
- row_names : str, optional
The column name to use in case the number of rows is greater than 1. If None and the number of rows is greater than 1, then the name will be “index” (default None).
- matrix_kind : str, optional
The kind of matrix (default “full”):
  - triu: store upper triangular only
  - tril: store lower triangular only
  - full: store the full matrix
- diagonal : bool, optional
Whether to store the diagonal. If matrix_kind is “full”, setting this to False will raise an error (default True).