9.1.1. Data Grabbers
DataGrabbers for datasets’ data description.
- class junifer.datagrabber.BaseDataGrabber(types, datadir)
Abstract base class for DataGrabber.
For every interface that is required, one needs to provide a concrete implementation of this abstract class.
- Parameters:
- types: list of str
The types of data to be grabbed.
- datadir: str or pathlib.Path
The directory where the data is / will be stored.
- Raises:
TypeError: If types is not a list or if its values are not strings.
- property datadir: Path
Get data directory path.
- Returns:
pathlib.Path: Path to the data directory. Can be overridden by subclasses.
- filter(selection)
Filter elements to be grabbed.
- abstract get_element_keys()
Get element keys.
For each item in the element tuple passed to __getitem__(), this method returns the corresponding key(s).
- abstract get_elements()
Get elements.
- Returns:
list: List of elements that can be grabbed. The elements can be strings, tuples, or any object that will then be used as a key to index the DataGrabber.
- abstract get_item(**element)
Get the specified item from the dataset.
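The interface above can be illustrated with a minimal, library-free sketch. The classes `SketchBaseDataGrabber` and `ToyDataGrabber`, the `__getitem__` dispatch, and the returned dictionary layout are assumptions made for illustration; they are not junifer's implementation.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchBaseDataGrabber(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    def __init__(self, types, datadir):
        # Documented behaviour: TypeError if `types` is not a list of str.
        if not isinstance(types, list):
            raise TypeError("`types` must be a list")
        if not all(isinstance(t, str) for t in types):
            raise TypeError("`types` must contain only strings")
        self.types = types
        self._datadir = Path(datadir)

    @property
    def datadir(self):
        return self._datadir

    @abstractmethod
    def get_element_keys(self):
        """Return the key name for each entry of an element tuple."""

    @abstractmethod
    def get_elements(self):
        """Return the list of elements that can be grabbed."""

    @abstractmethod
    def get_item(self, **element):
        """Return the data description for one element."""

    def __getitem__(self, element):
        # Assumption: pair each key with the matching tuple entry.
        if not isinstance(element, tuple):
            element = (element,)
        named = dict(zip(self.get_element_keys(), element))
        return self.get_item(**named)


class ToyDataGrabber(SketchBaseDataGrabber):
    """Concrete subclass with two hard-coded elements."""

    def get_element_keys(self):
        return ["subject", "session"]

    def get_elements(self):
        return [("sub-01", "ses-01"), ("sub-02", "ses-01")]

    def get_item(self, subject, session):
        name = f"{subject}_{session}_T1w.nii.gz"
        return {"T1w": {"path": self.datadir / subject / session / name}}


grabber = ToyDataGrabber(types=["T1w"], datadir="/tmp/toy")
item = grabber[("sub-01", "ses-01")]

# A plain string instead of a list is rejected.
try:
    ToyDataGrabber(types="T1w", datadir="/tmp/toy")
    type_error_raised = False
except TypeError:
    type_error_raised = True
```

Indexing with an element tuple resolves to keyword arguments for `get_item()`, which is why `get_element_keys()` has to return one key per tuple entry.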
- class junifer.datagrabber.DMCC13Benchmark(datadir=None, types=None, sessions=None, tasks=None, phase_encodings=None, runs=None, native_t1w=False)
Concrete implementation for datalad-based data fetching of DMCC13.
- Parameters:
- datadir: str or Path or None, optional
The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
- types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”} or list of the options, optional
DMCC data types. If None, all available data types are selected (default None).
- sessions: {“ses-wave1bas”, “ses-wave1pro”, “ses-wave1rea”} or list of the options, optional
DMCC sessions. If None, all available sessions are selected (default None).
- tasks: {“Rest”, “Axcpt”, “Cuedts”, “Stern”, “Stroop”} or list of the options, optional
DMCC task sessions. If None, all available task sessions are selected (default None).
- phase_encodings: {“AP”, “PA”} or list of the options, optional
DMCC phase encoding directions. If None, all available phase encodings are selected (default None).
- runs: {“1”, “2”} or list of the options, optional
DMCC runs. If None, all available runs are selected (default None).
- native_t1w: bool, optional
Whether to use T1w in native space (default False).
- Raises:
ValueError: If an invalid value is passed for sessions, tasks, phase_encodings, or runs.
- get_elements()
Implement fetching list of subjects in the dataset.
- get_item(subject, session, task, phase_encoding, run)
Index one element in the dataset.
- Parameters:
- subject: str
The subject ID.
- session: {“ses-wave1bas”, “ses-wave1pro”, “ses-wave1rea”}
The session to get.
- task: {“Rest”, “Axcpt”, “Cuedts”, “Stern”, “Stroop”}
The task to get.
- phase_encoding: {“AP”, “PA”}
The phase encoding to get.
- run: {“1”, “2”}
The run to get.
- Returns:
- out: dict
Dictionary of paths for each type of data required for the specified element.
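To make the shape of a DMCC13Benchmark element concrete, the sketch below enumerates five-part tuples and maps one of them onto the keyword arguments of `get_item()`. The subject IDs and the helper `as_kwargs` are hypothetical, and the Cartesian product only illustrates the tuple layout: the real DataGrabber discovers elements from the dataset, so not every combination needs to exist.

```python
from itertools import product

# Hypothetical subject IDs; the option values are those documented above.
subjects = ["sub-01", "sub-02"]
sessions = ["ses-wave1bas", "ses-wave1pro"]
tasks = ["Rest", "Stroop"]
phase_encodings = ["AP", "PA"]
runs = ["1", "2"]

# Key order follows the get_item() signature.
keys = ("subject", "session", "task", "phase_encoding", "run")

# Every candidate element is a 5-tuple in that order.
elements = list(product(subjects, sessions, tasks, phase_encodings, runs))


def as_kwargs(element):
    """Pair each tuple entry with its key, ready for get_item(**kwargs)."""
    return dict(zip(keys, element))


first = as_kwargs(elements[0])
```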
- class junifer.datagrabber.DataladAOMICID1000(datadir=None, types=None, native_t1w=False)
Concrete implementation for datalad-based data fetching of AOMIC ID1000.
- Parameters:
- datadir: str or Path or None, optional
The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
- types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”, “FreeSurfer”} or list of the options, optional
AOMIC data types. If None, all available data types are selected (default None).
- native_t1w: bool, optional
Whether to use T1w in native space (default False).
- class junifer.datagrabber.DataladAOMICPIOP1(datadir=None, types=None, tasks=None, native_t1w=False)
Concrete implementation for pattern-based data fetching of AOMIC PIOP1.
- Parameters:
- datadir: str or Path or None, optional
The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
- types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”, “FreeSurfer”} or list of the options, optional
AOMIC data types. If None, all available data types are selected (default None).
- tasks: {“restingstate”, “anticipation”, “emomatching”, “faces”, “gstroop”, “workingmemory”} or list of the options, optional
AOMIC PIOP1 task sessions. If None, all available task sessions are selected (default None).
- native_t1w: bool, optional
Whether to use T1w in native space (default False).
- Raises:
ValueError: If an invalid value is passed for tasks.
- get_elements()
Implement fetching list of subjects in the dataset.
- class junifer.datagrabber.DataladAOMICPIOP2(datadir=None, types=None, tasks=None, native_t1w=False)
Concrete implementation for pattern-based data fetching of AOMIC PIOP2.
- Parameters:
- datadir: str or Path or None, optional
The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
- types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”, “FreeSurfer”} or list of the options, optional
AOMIC data types. If None, all available data types are selected (default None).
- tasks: {“restingstate”, “stopsignal”, “workingmemory”} or list of the options, optional
AOMIC PIOP2 task sessions. If None, all available task sessions are selected (default None).
- native_t1w: bool, optional
Whether to use T1w in native space (default False).
- Raises:
ValueError: If an invalid value is passed for tasks.
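The "single option, list of options, or None" convention used by the dataset-specific DataGrabbers can be sketched as a small normalisation step. The helper `normalise_options` is hypothetical and only shows one way to obtain the documented behaviour (None selects everything, an invalid value raises ValueError); the library's own checks may differ.

```python
# Task names documented for AOMIC PIOP2.
PIOP2_TASKS = ("restingstate", "stopsignal", "workingmemory")


def normalise_options(value, allowed, name):
    """Return a list of validated options (hypothetical helper)."""
    if value is None:
        # None selects every available option.
        return list(allowed)
    if isinstance(value, str):
        # A single option is treated as a one-item list.
        value = [value]
    invalid = [v for v in value if v not in allowed]
    if invalid:
        raise ValueError(
            f"Invalid value(s) for `{name}`: {invalid}. "
            f"Allowed values: {list(allowed)}"
        )
    return list(value)


all_tasks = normalise_options(None, PIOP2_TASKS, "tasks")
one_task = normalise_options("stopsignal", PIOP2_TASKS, "tasks")

# "faces" belongs to PIOP1, not PIOP2, so it is rejected here.
try:
    normalise_options(["faces"], PIOP2_TASKS, "tasks")
    rejected = False
except ValueError:
    rejected = True
```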
- class junifer.datagrabber.DataladDataGrabber(rootdir='.', datadir=None, uri=None, **kwargs)
Abstract base class for datalad-based data fetching.
Defines a DataGrabber that gets data from a datalad sibling.
- Parameters:
- rootdir: str or pathlib.Path, optional
The path within the datalad dataset to the root directory (default “.”).
- datadir: str or pathlib.Path or None, optional
The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
- uri: str or None, optional
URI of the datalad sibling (default None).
- **kwargs
Keyword arguments passed to superclass.
See also
BaseDataGrabber: Abstract base class for DataGrabber.
PatternDataGrabber: Concrete implementation for pattern-based data fetching.
PatternDataladDataGrabber: Concrete implementation for pattern and datalad based data fetching.
Notes
This class is intended to be used as a superclass of a subclass with multiple inheritance.
Methods
install:
Installs (clones) the datalad dataset into the datadir. This method is called automatically when the DataGrabber is used within a context.
remove:
Removes the datalad dataset from the datadir. This method is called automatically when the DataGrabber is used within a context.
- cleanup()
Clean up the datalad dataset.
- property datadir: Path
Get data directory path.
- Returns:
pathlib.Path: Path to the data directory.
- install()
Install the datalad dataset into the datadir.
- Raises:
ValueError: If the dataset is already installed but with a different ID.
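The install-on-enter and remove-on-exit lifecycle described under Methods follows the usual context manager pattern. The sketch below reproduces only that pattern with a temporary directory; `SketchDataset` is hypothetical, no datalad call is made, and the real class's cloning and cleanup logic is not shown.

```python
import tempfile
from pathlib import Path


class SketchDataset:
    """Hypothetical class: installs on enter, removes on exit."""

    def __init__(self, datadir=None):
        self._user_datadir = None if datadir is None else Path(datadir)
        self._tmpdir = None
        self.installed = False

    @property
    def datadir(self):
        # With no datadir given, fall back to a temporary directory.
        if self._user_datadir is not None:
            return self._user_datadir
        if self._tmpdir is None:
            self._tmpdir = tempfile.TemporaryDirectory()
        return Path(self._tmpdir.name)

    def install(self):
        # Stand-in for cloning the dataset into datadir.
        self.datadir.mkdir(parents=True, exist_ok=True)
        self.installed = True

    def remove(self):
        # Stand-in for removing the dataset; deletes the temporary directory.
        self.installed = False
        if self._tmpdir is not None:
            self._tmpdir.cleanup()
            self._tmpdir = None

    def __enter__(self):
        self.install()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.remove()
        return False  # never swallow exceptions


with SketchDataset() as ds:
    inside = ds.installed
    tmp_path = ds.datadir
    existed_inside = tmp_path.exists()

after = ds.installed
exists_after = tmp_path.exists()
```

Tying removal to `__exit__` means the temporary clone is deleted even when the body of the `with` block raises.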
- class junifer.datagrabber.DataladHCP1200(datadir=None, tasks=None, phase_encodings=None, ica_fix=False)
Concrete implementation for datalad-based data fetching of HCP1200.
- Parameters:
- datadir: str or Path or None, optional
The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
- tasks: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”} or list of the options or None, optional
HCP task sessions. If None, all available task sessions are selected (default None).
- phase_encodings: {“LR”, “RL”} or list of the options or None, optional
HCP phase encoding directions. If None, both will be used (default None).
- ica_fix: bool, optional
Whether to retrieve data that was processed with ICA+FIX. Only “REST1” and “REST2” tasks are available with ICA+FIX (default False).
- Raises:
ValueError: If an invalid value is passed for tasks or phase_encodings.
- class junifer.datagrabber.HCP1200(datadir, tasks=None, phase_encodings=None, ica_fix=False)
Concrete implementation for pattern-based data fetching of HCP1200.
- Parameters:
- datadir: str or Path
The directory where the data is / will be stored.
- tasks: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”} or list of the options or None, optional
HCP task sessions. If None, all available task sessions are selected (default None).
- phase_encodings: {“LR”, “RL”} or list of the options or None, optional
HCP phase encoding directions. If None, both will be used (default None).
- ica_fix: bool, optional
Whether to retrieve data that was processed with ICA+FIX. Only “REST1” and “REST2” tasks are available with ICA+FIX (default False).
- Raises:
ValueError: If an invalid value is passed for tasks or phase_encodings.
- get_elements()
Implement fetching list of elements in the dataset.
- Returns:
list: The list of elements that can be grabbed in the dataset.
- get_item(subject, task, phase_encoding)
Implement single element indexing in the database.
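Because ICA+FIX data exists only for “REST1” and “REST2”, it can help to check a task selection before setting `ica_fix=True`. The helper `split_by_ica_fix` below is hypothetical and lives outside the library; how the class itself reacts to an unavailable combination is not stated in this reference.

```python
# Task names documented for HCP1200.
HCP_TASKS = [
    "REST1", "REST2", "SOCIAL", "WM", "RELATIONAL",
    "EMOTION", "LANGUAGE", "GAMBLING", "MOTOR",
]
# Documented restriction: only these tasks have ICA+FIX data.
ICA_FIX_TASKS = ["REST1", "REST2"]


def split_by_ica_fix(tasks=None):
    """Split requested tasks into those with and without ICA+FIX data."""
    requested = HCP_TASKS if tasks is None else tasks
    available = [t for t in requested if t in ICA_FIX_TASKS]
    unavailable = [t for t in requested if t not in ICA_FIX_TASKS]
    return available, unavailable


available, unavailable = split_by_ica_fix(["REST1", "WM", "REST2"])
default_available, default_unavailable = split_by_ica_fix()
```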
- class junifer.datagrabber.MultipleDataGrabber(datagrabbers, **kwargs)
Concrete implementation for multi sourced data fetching.
Implements a DataGrabber which can be used to fetch data from multiple DataGrabbers.
- Parameters:
- datagrabbers: list of DataGrabber-like objects
The DataGrabbers to use for fetching data.
- **kwargs
Keyword arguments passed to superclass.
- Raises:
RuntimeError: If datagrabbers have different element keys, overlapping data types, or nested data types.
- get_element_keys()
Get element keys.
For each item in the element tuple passed to __getitem__(), this method returns the corresponding key(s).
- get_elements()
Get elements.
- Returns:
list: List of elements that can be grabbed. The elements can be strings, tuples, or any object that will then be used as a key to index the DataGrabber. Each element must be present in all of the related DataGrabbers.
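The two rules stated above (identical element keys with non-overlapping data types, and elements present in every DataGrabber) can be sketched with plain dictionaries. The dictionary layout and both helper functions are assumptions for illustration; the check for nested data types is omitted.

```python
def check_compatible(grabbers):
    """Raise RuntimeError on differing keys or overlapping types."""
    first_keys = grabbers[0]["keys"]
    seen_types = set()
    for grabber in grabbers:
        if grabber["keys"] != first_keys:
            raise RuntimeError("DataGrabbers have different element keys")
        overlap = seen_types & set(grabber["types"])
        if overlap:
            raise RuntimeError(f"Overlapping data types: {sorted(overlap)}")
        seen_types |= set(grabber["types"])


def common_elements(grabbers):
    """Keep only the elements that every DataGrabber can provide."""
    common = set(grabbers[0]["elements"])
    for grabber in grabbers[1:]:
        common &= set(grabber["elements"])
    return sorted(common)


# Hypothetical stand-ins for two DataGrabbers.
anat = {"keys": ["subject"], "types": ["T1w"],
        "elements": ["sub-01", "sub-02", "sub-03"]}
func = {"keys": ["subject"], "types": ["BOLD"],
        "elements": ["sub-02", "sub-03", "sub-04"]}

check_compatible([anat, func])  # passes silently
shared = common_elements([anat, func])

# Two sources offering the same data type are rejected.
try:
    check_compatible([anat, dict(func, types=["T1w"])])
    overlap_rejected = False
except RuntimeError:
    overlap_rejected = True
```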
- class junifer.datagrabber.PatternDataGrabber(types, patterns, replacements, datadir, confounds_format=None, partial_pattern_ok=False)
Concrete implementation for pattern-based data fetching.
Implements a DataGrabber that understands patterns to grab data.
- Parameters:
- types: list of str
The types of data to be grabbed.
- patterns: dict
Data type patterns as a dictionary. It has the following schema:

    "T1w": {
        "mandatory": ["pattern", "space"],
        "optional": {
            "mask": {"mandatory": ["pattern", "space"], "optional": []}
        }
    }
    "T2w": {
        "mandatory": ["pattern", "space"],
        "optional": {
            "mask": {"mandatory": ["pattern", "space"], "optional": []}
        }
    }
    "BOLD": {
        "mandatory": ["pattern", "space"],
        "optional": {
            "mask": {"mandatory": ["pattern", "space"], "optional": []},
            "confounds": {"mandatory": ["pattern", "format"], "optional": []}
        }
    }
    "Warp": {"mandatory": ["pattern", "src", "dst"], "optional": []}
    "VBM_GM": {"mandatory": ["pattern", "space"], "optional": []}
    "VBM_WM": {"mandatory": ["pattern", "space"], "optional": []}

For each data type, one needs to provide the mandatory keys and can choose to also provide the optional keys. The value for each key is a string. The necessary data types are therefore provided as a dictionary, for example (taken from HCP1200):

    {
        "BOLD": {
            "pattern": "...",
            "space": "...",
        },
        "T1w": {
            "pattern": "...",
            "space": "...",
        },
        "Warp": {
            "pattern": "...",
            "src": "...",
            "dst": "...",
        }
    }

- replacements: str or list of str
Replacements in the pattern key of each data type. The value needs to be a list of all possible replacements.
- datadir: str or pathlib.Path
The directory where the data is / will be stored.
- confounds_format: {“fmriprep”, “adhoc”} or None, optional
The format of the confounds for the dataset (default None).
- partial_pattern_ok: bool, optional
Whether to accept a partial pattern for a data type. If True, the mandatory key check is bypassed and a warning is issued instead of raising an error. This allows a DataGrabber to have data types without the corresponding mandatory keys, which is useful in combination with MultipleDataGrabber (default False).
- Raises:
ValueError: If confounds_format is invalid.
- get_element_keys()
Get element keys.
For each item in the “element” tuple, this function returns the corresponding key, that is, the replacements of patterns defined in the constructor.
- get_elements()
Implement fetching list of elements in the dataset.
It uses regex to search for “replacements” in the “patterns” and returns the intersection of the results for each type, i.e., it builds a list of elements that have all the required types.
- Returns:
list: The list of elements that can be grabbed in the dataset.
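The regex-and-intersection idea behind get_elements() can be sketched as follows. The `{subject}` placeholder syntax, the file names, and both helper functions are assumptions for illustration, and the sketch expects every pattern to contain every replacement; the library's own matching logic may differ.

```python
import re


def pattern_to_regex(pattern, replacements):
    """Turn a pattern with {name} placeholders into a compiled regex."""
    regex = re.escape(pattern)
    for name in replacements:
        placeholder = re.escape("{" + name + "}")
        # First occurrence captures the value, later ones must repeat it.
        regex = regex.replace(placeholder, f"(?P<{name}>[^/]+)", 1)
        regex = regex.replace(placeholder, f"(?P={name})")
    return re.compile(regex)


def find_elements(files, patterns, replacements):
    """Return the elements for which every data type has a matching file."""
    per_type = []
    for pattern in patterns.values():
        regex = pattern_to_regex(pattern, replacements)
        found = set()
        for path in files:
            match = regex.fullmatch(path)
            if match:
                found.add(tuple(match.group(n) for n in replacements))
        per_type.append(found)
    # Intersection: keep elements that have all the required types.
    return sorted(set.intersection(*per_type))


# Hypothetical relative paths inside a data directory.
files = [
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-01/func/sub-01_bold.nii.gz",
    "sub-02/anat/sub-02_T1w.nii.gz",
    "sub-03/func/sub-03_bold.nii.gz",
]
patterns = {
    "T1w": "sub-{subject}/anat/sub-{subject}_T1w.nii.gz",
    "BOLD": "sub-{subject}/func/sub-{subject}_bold.nii.gz",
}

elements = find_elements(files, patterns, ["subject"])
```

Only subject 01 has both a T1w and a BOLD file, so it is the only element that survives the intersection.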
- get_item(**element)
Implement single element indexing for the datagrabber.
This method constructs a real path to the requested item’s data by replacing the patterns with actual values passed via **element.
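The path construction described above amounts to substituting each replacement in the pattern and joining the result with the data directory. The helper `build_item`, the `{subject}` placeholder syntax, and the layout of the returned dictionary are assumptions made for this sketch.

```python
from pathlib import Path


def build_item(datadir, patterns, **element):
    """Resolve each data type's pattern into a concrete path."""
    out = {}
    for dtype, spec in patterns.items():
        relative = spec["pattern"]
        for key, value in element.items():
            # Substitute every occurrence of the placeholder.
            relative = relative.replace("{" + key + "}", value)
        # Carry the remaining keys (e.g. "space") over unchanged.
        entry = {k: v for k, v in spec.items() if k != "pattern"}
        entry["path"] = Path(datadir) / relative
        out[dtype] = entry
    return out


patterns = {
    "T1w": {
        "pattern": "sub-{subject}/anat/sub-{subject}_T1w.nii.gz",
        "space": "native",
    },
}

item = build_item("/data", patterns, subject="01")
```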
- class junifer.datagrabber.PatternDataladDataGrabber(**kwargs)
Concrete implementation for pattern and datalad based data fetching.
Implements a DataGrabber that gets data from a datalad sibling, interpreting patterns.
- Parameters:
- types: list of str
The types of data to be grabbed.
- patterns: dict
Data type patterns as a dictionary. It has the following schema:

    "T1w": {
        "mandatory": ["pattern", "space"],
        "optional": {
            "mask": {"mandatory": ["pattern", "space"], "optional": []}
        }
    }
    "T2w": {
        "mandatory": ["pattern", "space"],
        "optional": {
            "mask": {"mandatory": ["pattern", "space"], "optional": []}
        }
    }
    "BOLD": {
        "mandatory": ["pattern", "space"],
        "optional": {
            "mask": {"mandatory": ["pattern", "space"], "optional": []},
            "confounds": {"mandatory": ["pattern", "format"], "optional": []}
        }
    }
    "Warp": {"mandatory": ["pattern", "src", "dst"], "optional": []}
    "VBM_GM": {"mandatory": ["pattern", "space"], "optional": []}
    "VBM_WM": {"mandatory": ["pattern", "space"], "optional": []}

For each data type, one needs to provide the mandatory keys and can choose to also provide the optional keys. The value for each key is a string. The necessary data types are therefore provided as a dictionary, for example (taken from HCP1200):

    {
        "BOLD": {
            "pattern": "...",
            "space": "...",
        },
        "T1w": {
            "pattern": "...",
            "space": "...",
        },
        "Warp": {
            "pattern": "...",
            "src": "...",
            "dst": "...",
        }
    }

- replacements: str or list of str
Replacements in the pattern key of each data type. The value needs to be a list of all possible replacements.
- confounds_format: {“fmriprep”, “adhoc”} or None, optional
The format of the confounds for the dataset (default None).
- datadir: str or pathlib.Path or None, optional
The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
- rootdir: str or pathlib.Path, optional
The path within the datalad dataset to the root directory (default “.”).
- uri: str or None, optional
URI of the datalad sibling (default None).
See also
DataladDataGrabber: Abstract base class for datalad-based data fetching.
PatternDataGrabber: Concrete implementation for pattern-based data fetching.
- class junifer.datagrabber.PatternValidationMixin
Mixin class for pattern validation.
- validate_patterns(types, replacements, patterns, partial_pattern_ok=False)
Validate the patterns.
- Parameters:
- types: list of str
The data types to check patterns of.
- replacements: list of str
The replacements to be replaced in the patterns.
- patterns: dict
The patterns to validate.
- partial_pattern_ok: bool, optional
Whether to accept a partial pattern for a data type. If True, a warning is issued instead of raising an error (default False).
- Raises:
TypeError: If patterns is not a dictionary.
ValueError: If the lengths of types and patterns differ, if patterns is missing entries from types, if an unknown data type is found in patterns, or if a data type pattern key contains ‘*’ as its value.
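The documented checks can be approximated in a few lines. The function `sketch_validate_patterns` and the reduced `MANDATORY` table are hypothetical, the `replacements` argument is left out for brevity, and raising ValueError for a missing mandatory key is this sketch's own choice rather than documented behaviour.

```python
import warnings

# Reduced copy of the mandatory keys from the schema, for illustration.
MANDATORY = {
    "T1w": ["pattern", "space"],
    "BOLD": ["pattern", "space"],
    "Warp": ["pattern", "src", "dst"],
}


def sketch_validate_patterns(types, patterns, partial_pattern_ok=False):
    """Check `patterns` against `types` (simplified, hypothetical)."""
    if not isinstance(patterns, dict):
        raise TypeError("`patterns` must be a dict")
    if len(types) != len(patterns):
        raise ValueError("`types` and `patterns` differ in length")
    missing_types = [t for t in types if t not in patterns]
    if missing_types:
        raise ValueError(f"`patterns` lacks entries for: {missing_types}")
    for dtype, spec in patterns.items():
        if dtype not in MANDATORY:
            raise ValueError(f"Unknown data type: {dtype}")
        missing = [k for k in MANDATORY[dtype] if k not in spec]
        if missing:
            msg = f"{dtype} is missing mandatory key(s): {missing}"
            if partial_pattern_ok:
                # Partial patterns are tolerated with a warning.
                warnings.warn(msg, stacklevel=2)
            else:
                raise ValueError(msg)
        if "*" in spec.get("pattern", ""):
            raise ValueError(f"{dtype} pattern must not contain '*'")


partial = {"T1w": {"pattern": "sub-{subject}_T1w.nii.gz"}}

# Lenient mode: the missing "space" key only triggers a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sketch_validate_patterns(["T1w"], partial, partial_pattern_ok=True)
n_warnings = len(caught)

# Strict mode (the default): the same input raises.
try:
    sketch_validate_patterns(["T1w"], partial)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```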