9.1.5. Storage¶
Storage classes for saving extracted features.
- class junifer.storage.BaseFeatureStorage(uri, storage_types, single_output=True)¶
- Abstract base class for feature storage. - For every required interface, one needs to provide a concrete implementation of this abstract class (a minimal subclass sketch is shown at the end of this class entry). - Parameters:
- Raises:
- ValueError
- If the required storage type(s) are missing from storage_types.
 
 - abstract collect()¶
- Collect data. 
 - get_valid_inputs()¶
- Get valid storage types for input. 
 - abstract list_features()¶
- List the features in the storage. 
 - abstract read(feature_name=None, feature_md5=None)¶
- Read stored feature. 
 - abstract read_df(feature_name=None, feature_md5=None)¶
- Read feature into a pandas DataFrame. - Parameters:
- Returns:
- pandas.DataFrame
- The features as a dataframe. 
 
 
 - store(kind, **kwargs)¶
- Store extracted features data. - Parameters:
- kind : {“matrix”, “timeseries”, “vector”, “scalar_table”}
- The storage kind. 
- **kwargs
- The keyword arguments. 
 
- Raises:
- ValueError
- If kind is invalid.
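The kind argument selects the storage routine and the remaining keyword arguments are forwarded, presumably to the matching store_* method. A hedged sketch using a concrete backend; the file name, element keys, and MD5 hash below are made up for illustration:

```python
import hashlib

import numpy as np

from junifer.storage import HDF5FeatureStorage

storage = HDF5FeatureStorage(uri="example_features.hdf5")

storage.store(
    kind="vector",
    # Stand-in hash; in a real run junifer derives this from the feature metadata.
    meta_md5=hashlib.md5(b"example-vector-feature").hexdigest(),
    element={"subject": "sub-01"},  # hypothetical element keys
    data=np.random.rand(1, 3),      # assumed row-vector layout
    col_names=["ROI_A", "ROI_B", "ROI_C"],
)
```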
 
 
 - store_matrix(meta_md5, element, data, col_names=None, row_names=None, matrix_kind='full', diagonal=True)¶
- Store matrix. - Parameters:
- meta_md5 : str
- The metadata MD5 hash. 
- element : dict
- The element as a dictionary. 
- data : numpy.ndarray
- The matrix data to store. 
- col_names : list or tuple of str, optional
- The column labels (default None). 
- row_names : list or tuple of str, optional
- The row labels (default None). 
- matrix_kind : str, optional
- The kind of matrix:
  - triu: store upper triangular only
  - tril: store lower triangular
  - full: full matrix
  (default “full”). 
- diagonal : bool, optional
- Whether to store the diagonal. If matrix_kind = “full”, setting this to False will raise an error (default True).
 
 
 - abstract store_metadata(meta_md5, element, meta)¶
- Store metadata. 
 - store_scalar_table(meta_md5, element, data, col_names=None, row_names=None, row_header_col_name='feature')¶
- Store table with scalar values. - Parameters:
- meta_md5 : str
- The metadata MD5 hash. 
- element : dict
- The element as a dictionary. 
- data : numpy.ndarray
- The scalar table data to store. 
- col_names : list or tuple of str, optional
- The column labels (default None). 
- row_names : list or tuple of str, optional
- The row labels (default None). 
- row_header_col_name : str, optional
- The column name for the row header column (default “feature”). 
 
 
 - store_timeseries(meta_md5, element, data, col_names=None)¶
- Store timeseries. 
 - store_vector(meta_md5, element, data, col_names=None)¶
- Store vector. 
 - validate(input_)¶
- Validate the input to the pipeline step. - Parameters:
- Raises:
- ValueError
- If the input_ is invalid.
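Because the class is abstract, a concrete backend has to implement the abstract methods listed above. The following minimal in-memory sketch illustrates the expected shape of such a subclass; it is not part of junifer, its method bodies are assumptions about how a backend might behave, and it assumes the abstract methods are exactly those marked abstract in this entry.

```python
import pandas as pd

from junifer.storage import BaseFeatureStorage


class InMemoryFeatureStorage(BaseFeatureStorage):
    """Toy in-memory backend, for illustration only (not part of junifer)."""

    def __init__(self, uri, single_output=True):
        # Declare which storage kinds this toy backend accepts.
        super().__init__(
            uri=uri,
            storage_types=["vector"],
            single_output=single_output,
        )
        self._metadata = {}
        self._features = {}

    def get_valid_inputs(self):
        # Storage kinds this backend knows how to handle.
        return ["vector"]

    def list_features(self):
        return dict(self._metadata)

    def store_metadata(self, meta_md5, element, meta):
        self._metadata[meta_md5] = meta

    def store_vector(self, meta_md5, element, data, col_names=None):
        self._features[meta_md5] = {
            "element": element,
            "data": data,
            "col_names": col_names,
        }

    def read(self, feature_name=None, feature_md5=None):
        return self._features[feature_md5]

    def read_df(self, feature_name=None, feature_md5=None):
        stored = self._features[feature_md5]
        return pd.DataFrame(stored["data"], columns=stored["col_names"])

    def collect(self):
        # Nothing to merge for a single in-memory store.
        pass
```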
 
 
 
- class junifer.storage.HDF5FeatureStorage(uri, single_output=True, overwrite='update', compression=7, force_float32=True, chunk_size=100)¶
- Concrete implementation for feature storage via HDF5. - Parameters:
- uri : str or pathlib.Path
- The path to the file to be used. 
- single_output : bool, optional
- If False, will create one HDF5 file per element. The name of the file will be prefixed with the respective element. If True, will create only one HDF5 file as specified in the uri and store all the elements in the same file. Concurrent writes should be handled with care (default True).
- overwrite : bool or “update”, optional
- Whether to overwrite an existing file. If True, will overwrite; if “update”, will update the existing entry or append (default “update”). 
- compression : {0-9}, optional
- Level of gzip compression: 0 (lowest) to 9 (highest) (default 7). 
- force_float32 : bool, optional
- Whether to force casting of numpy.ndarray values to float32 if float64 values are found (default True). 
- chunk_size : int, optional
- The chunk size to use when collecting data from element files in collect(). If the element file count is smaller than this value, the smaller count is used (default 100).
 
 - See also - SQLiteFeatureStorage
- The concrete class for SQLite-based feature storage. 
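A minimal instantiation sketch with the documented defaults spelled out; the file name is chosen for illustration:

```python
from junifer.storage import HDF5FeatureStorage

# Single-file storage with the documented defaults made explicit.
storage = HDF5FeatureStorage(
    uri="junifer_features.hdf5",
    single_output=True,
    overwrite="update",
    compression=7,
    force_float32=True,
    chunk_size=100,
)
```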
 - collect()¶
- Implement data collection. - This method globs the element files and loops over them, reading each file’s metadata and then looping over all features stored in that metadata, storing the metadata and the feature data right after reading. - Raises:
- NotImplementedError
- If single_output is True.
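collect() only makes sense when the storage was created with single_output=False, so that per-element HDF5 files exist to be merged; with single_output=True it raises NotImplementedError as noted above. A hedged sketch of the intended workflow (the per-element files are normally produced by junifer’s extraction jobs, not written by hand):

```python
from junifer.storage import HDF5FeatureStorage

# Element-wise storage: each element gets its own prefixed HDF5 file.
storage = HDF5FeatureStorage(uri="all_features.hdf5", single_output=False)

# After all extraction jobs are done, merge the element files into all_features.hdf5.
storage.collect()
```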
 
 
 - get_valid_inputs()¶
- Get valid storage types for input. 
 - list_features()¶
- List the features in the storage. 
 - read(feature_name=None, feature_md5=None)¶
- Read stored feature. 
 - read_df(feature_name=None, feature_md5=None)¶
- Read feature into a pandas.DataFrame. - Either one of feature_name or feature_md5 needs to be specified. - Parameters:
- Returns:
- pandas.DataFrame
- The features as a dataframe. 
 
- Raises:
- IOError
- If HDF5 file does not exist. 
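A hedged reading sketch; it assumes the HDF5 file already exists and contains at least one feature, the feature name is hypothetical, and the exact structure returned by list_features() may differ:

```python
from junifer.storage import HDF5FeatureStorage

storage = HDF5FeatureStorage(uri="junifer_features.hdf5")

# Inspect what is available; assumed to be keyed by the metadata MD5 hashes.
print(storage.list_features())

# Read one feature, either by name or by MD5 (exactly one of the two).
df = storage.read_df(feature_name="BOLD_parcel_mean")  # hypothetical feature name
print(df.head())
```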
 
 
 - store_matrix(meta_md5, element, data, col_names=None, row_names=None, matrix_kind='full', diagonal=True, row_header_col_name='ROI')¶
- Store matrix. - This method performs parameter checks and then calls _store_data for storing the data. - Parameters:
- meta_md5 : str
- The metadata MD5 hash. 
- element : dict
- The element as a dictionary. 
- data : numpy.ndarray
- The matrix data to store. 
- col_names : list or tuple of str, optional
- The column labels (default None). 
- row_names : list or tuple of str, optional
- The row labels (default None). 
- matrix_kind : str, optional
- The kind of matrix:
  - triu: store upper triangular only
  - tril: store lower triangular
  - full: full matrix
  (default “full”). 
- diagonal : bool, optional
- Whether to store the diagonal. If matrix_kind is “full”, setting this to False will raise an error (default True).
- row_header_col_name : str, optional
- The column name for the row header column (default “ROI”). 
 
- Raises:
- ValueError
- If an invalid matrix_kind is provided, diagonal = False is set for matrix_kind = “full”, non-square data is provided for matrix_kind = {“triu”, “tril”}, the length of row_names does not match the data row count, or the length of col_names does not match the data column count.
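A hedged sketch of storing a symmetric, connectivity-style matrix as its upper triangle; the hash and element values are stand-ins, and in a real pipeline the matching metadata would also be stored via store_metadata():

```python
import hashlib

import numpy as np

from junifer.storage import HDF5FeatureStorage

storage = HDF5FeatureStorage(uri="junifer_features.hdf5")

rng = np.random.default_rng(seed=0)
data = rng.random((3, 3))
data = (data + data.T) / 2  # symmetric, so "triu" loses no information

labels = ["ROI_A", "ROI_B", "ROI_C"]
storage.store_matrix(
    meta_md5=hashlib.md5(b"example-connectivity").hexdigest(),  # stand-in hash
    element={"subject": "sub-01"},  # hypothetical element keys
    data=data,
    col_names=labels,
    row_names=labels,
    matrix_kind="triu",
    diagonal=True,
    row_header_col_name="ROI",
)
```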
 
 
 - store_metadata(meta_md5, element, meta)¶
- Store metadata. - This method first loads the existing metadata (if any) using _read_metadata, appends the new metadata to it, and then saves the updated metadata using _write_processed_data. Metadata is stored only if meta_md5 is not already present.
 - store_scalar_table(meta_md5, element, data, col_names=None, row_names=None, row_header_col_name='feature')¶
- Store table with scalar values. - Parameters:
- meta_md5 : str
- The metadata MD5 hash. 
- element : dict
- The element as a dictionary. 
- data : numpy.ndarray
- The scalar table data to store. 
- col_names : list or tuple of str, optional
- The column labels (default None). 
- row_names : list or tuple of str, optional
- The row labels (default None). 
- row_header_col_name : str, optional
- The column name for the row header column (default “feature”). 
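A hedged sketch of a scalar table, i.e. a 2D table of scalar values with labelled rows and columns; all names, the list type of row_names, and the data layout are assumptions for illustration:

```python
import hashlib

import numpy as np

from junifer.storage import HDF5FeatureStorage

storage = HDF5FeatureStorage(uri="junifer_features.hdf5")

storage.store_scalar_table(
    meta_md5=hashlib.md5(b"example-scalar-table").hexdigest(),  # stand-in hash
    element={"subject": "sub-01"},          # hypothetical element keys
    data=np.random.rand(2, 3),              # one scalar per row/column cell
    col_names=["ROI_A", "ROI_B", "ROI_C"],  # made-up column labels
    row_names=["alff", "falff"],            # made-up row labels
    row_header_col_name="feature",
)
```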
 
 
 - store_timeseries(meta_md5, element, data, col_names=None)¶
- Store timeseries. 
 
- class junifer.storage.PandasBaseFeatureStorage(uri, single_output=True, **kwargs)¶
- Abstract base class for feature storage via pandas. - For every required interface, one needs to provide a concrete implementation of this abstract class. - Parameters:
- uri : str or pathlib.Path
- The path to the storage. 
- single_output : bool, optional
- Whether to have single output (default True). 
- **kwargs
- Keyword arguments passed to superclass. 
 
 - See also - BaseFeatureStorage
- The base class for feature storage. 
 - static element_to_index(element, n_rows=1, rows_col_name=None)¶
- Convert the element metadata to index. - Parameters:
- Returns:
- pandas.Index or pandas.MultiIndex
- The index of the dataframe to store. 
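A hedged sketch; element_to_index() is a static method, the element keys are hypothetical, and the exact layout of the returned (Multi)Index is up to the implementation:

```python
from junifer.storage import PandasBaseFeatureStorage

index = PandasBaseFeatureStorage.element_to_index(
    element={"subject": "sub-01", "session": "ses-01"},  # hypothetical element keys
    n_rows=4,                   # repeat the element over four rows (e.g. timepoints)
    rows_col_name="timepoint",  # name for the within-element row level
)
print(index)
```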
 
 
 - get_valid_inputs()¶
- Get valid storage types for input. 
 - store_df(meta_md5, element, df)¶
- Implement pandas DataFrame storing. - Parameters:
- meta_md5 : str
- The metadata MD5 hash. 
- element : dict
- The element as a dictionary. 
- df : pandas.DataFrame or pandas.Series
- The pandas DataFrame or Series to store. 
 
- Raises:
- ValueError
- If the dataframe index has items that are not in the index generated from the metadata. 
 
 
 - store_timeseries(meta_md5, element, data, col_names=None)¶
- Store timeseries. 
 
- class junifer.storage.SQLiteFeatureStorage(uri, single_output=True, upsert='update', **kwargs)¶
- Concrete implementation for feature storage via SQLite. - Parameters:
- uri : str or pathlib.Path
- The path to the file to be used. 
- single_output : bool, optional
- If False, will create one SQLite file per element. The name of the file will be prefixed with the respective element. If True, will create only one SQLite file as specified in the uri and store all the elements in the same file. This behaviour is only suitable for non-parallel executions, as SQLite does not support concurrent writes (default True).
- upsert : {“ignore”, “update”}, optional
- Upsert mode. If “ignore” is used, the existing elements are ignored. If “update”, the existing elements are updated (default “update”). 
- **kwargs : dict
- The keyword arguments passed to the superclass. 
 
 - See also - PandasBaseFeatureStorage
- The base class for Pandas-based feature storage. 
- HDF5FeatureStorage
- The concrete class for HDF5-based feature storage. 
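A minimal instantiation sketch; since SQLite does not handle concurrent writes, element-wise files (single_output=False) are the usual choice for parallel jobs. The file name is chosen for illustration:

```python
from junifer.storage import SQLiteFeatureStorage

# One SQLite file per element, merged later with collect().
storage = SQLiteFeatureStorage(
    uri="junifer_features.sqlite",
    single_output=False,
    upsert="update",
)
```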
 - collect()¶
- Implement data collection. - Raises:
- NotImplementedError
- If single_output is True.
 
 
 - get_engine(element=None)¶
- Get engine. - Parameters:
- element : dict, optional
- The element as a dictionary (default None). 
 
- Returns:
- sqlalchemy.engine.Engine
- The sqlalchemy engine. 
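A hedged sketch of using the returned SQLAlchemy engine directly, e.g. to list the tables junifer created; the underlying table layout is not part of the documented API:

```python
from sqlalchemy import inspect

from junifer.storage import SQLiteFeatureStorage

storage = SQLiteFeatureStorage(uri="junifer_features.sqlite", single_output=True)

engine = storage.get_engine()  # engine for the single output file
print(inspect(engine).get_table_names())
```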
 
 
 - list_features()¶
- List the features in the storage. 
 - read(feature_name=None, feature_md5=None)¶
- Read stored feature. 
 - read_df(feature_name=None, feature_md5=None)¶
- Implement feature reading into a pandas DataFrame. - Either one of feature_name or feature_md5 needs to be specified. - Parameters:
- Returns:
- pandas.DataFrame
- The features as a dataframe. 
 
- Raises:
- ValueError
- If the parameter values are invalid, the feature is not found, or multiple features are found. 
 
 
 - store_df(meta_md5, element, df)¶
- Implement pandas DataFrame storing. - Parameters:
- meta_md5 : str
- The metadata MD5 hash. 
- element : dict
- The element as a dictionary. 
- df : pandas.DataFrame or pandas.Series
- The pandas DataFrame or Series to store. 
 
- Raises:
- ValueError
- If the dataframe index has items that are not in the index generated from the metadata. 
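A hedged sketch of storing a DataFrame whose index was generated from the element metadata with element_to_index() (inherited from PandasBaseFeatureStorage); the hash, element keys, and column names are stand-ins, and in a full pipeline the matching metadata would also be stored via store_metadata():

```python
import hashlib

import pandas as pd

from junifer.storage import SQLiteFeatureStorage

storage = SQLiteFeatureStorage(uri="junifer_features.sqlite", single_output=True)

element = {"subject": "sub-01"}  # hypothetical element keys
index = storage.element_to_index(element=element, n_rows=1)
df = pd.DataFrame(
    [[0.1, 0.2, 0.3]], index=index, columns=["ROI_A", "ROI_B", "ROI_C"]
)

storage.store_df(
    meta_md5=hashlib.md5(b"example-dataframe").hexdigest(),  # stand-in hash
    element=element,
    df=df,
)
```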
 
 
 - store_matrix(meta_md5, element, data, col_names=None, row_names=None, matrix_kind='full', diagonal=True)¶
- Store matrix. - Parameters:
- meta_md5 : str
- The metadata MD5 hash. 
- element : dict
- The element as a dictionary. 
- data : numpy.ndarray
- The matrix data to store. 
- col_names : list or tuple of str, optional
- The column labels (default None). 
- row_names : list or tuple of str, optional
- The row labels (default None). 
- matrix_kind : str, optional
- The kind of matrix:
  - triu: store upper triangular only
  - tril: store lower triangular
  - full: full matrix
  (default “full”). 
- diagonal : bool, optional
- Whether to store the diagonal. If matrix_kind = “full”, setting this to False will raise an error (default True).
 