9.1.1. Data Grabbers

DataGrabbers for datasets’ data description.

class junifer.datagrabber.BaseDataGrabber(types, datadir)

Abstract base class for DataGrabber.

For every interface that is required, one needs to provide a concrete implementation of this abstract class.

Parameters:

types: list of str: The types of data to be grabbed.
datadir: str or pathlib.Path: The directory where the data is / will be stored.

Raises:

TypeError: If types is not a list or if its values are not strings.

property datadir: Path

Get data directory path.

Returns:

pathlib.Path: Path to the data directory. Can be overridden by subclasses.

filter(selection)

Filter elements to be grabbed.

Parameters:

selection: list of str or tuple: The list of partial element key values to filter by.

Yields:

object: An element that can be indexed by the DataGrabber.
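
The idea of filtering by partial element key values can be sketched as follows. This is a minimal illustration and not the library's implementation: it assumes that "partial" means matching the leading values of an element, and the helper names `as_tuple` and `filter_elements` are hypothetical.

```python
from typing import Iterable, Iterator, Sequence, Tuple, Union

Element = Union[str, Tuple[str, ...]]


def as_tuple(value: Element) -> Tuple[str, ...]:
    """Wrap a bare string in a 1-tuple so all elements compare uniformly."""
    return value if isinstance(value, tuple) else (value,)


def filter_elements(
    elements: Iterable[Element], selection: Sequence[Element]
) -> Iterator[Element]:
    """Yield elements whose leading key values match any selection entry.

    ASSUMPTION: "partial" is read as prefix matching; the library may differ.
    """
    wanted = [as_tuple(s) for s in selection]
    for element in elements:
        key = as_tuple(element)
        if any(key[: len(w)] == w for w in wanted):
            yield element


# Made-up elements of the form (subject, task).
elements = [("sub-01", "rest"), ("sub-01", "stroop"), ("sub-02", "rest")]
by_subject = list(filter_elements(elements, ["sub-01"]))
by_full_key = list(filter_elements(elements, [("sub-02", "rest")]))
```

Here a selection of `"sub-01"` keeps every element of that subject, while a full tuple keeps only the exact element.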

abstract get_element_keys()

Get element keys.

For each item in the element tuple passed to __getitem__(), this method returns the corresponding key(s).

Returns:

list of str: The element keys.

abstract get_elements()

Get elements.

Returns:

list: List of elements that can be grabbed. The elements can be strings, tuples or any object that will be then used as a key to index the DataGrabber.

abstract get_item(**element)

Get the specified item from the dataset.

Parameters:

element: dict: The element to be indexed.

Returns:

dict: Dictionary of paths for each type of data required for the specified element.

get_types()

Get types.

Returns:

list of str: The types of data to be grabbed.
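
To make the contract of the abstract methods concrete, here is a self-contained sketch that mirrors the interface described above using only the standard library. The classes `SketchGrabber` and `ToySubjects`, the file layout, and the way `__getitem__` maps element values onto element keys are assumptions for illustration; they are not the library's code.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Union


class SketchGrabber(ABC):
    """Hypothetical stand-in for the abstract interface described above."""

    def __init__(self, types: List[str], datadir: Union[str, Path]) -> None:
        # Mirror the documented TypeError for a non-list or non-str values.
        if not isinstance(types, list) or not all(isinstance(t, str) for t in types):
            raise TypeError("types must be a list of str")
        self.types = types
        self.datadir = Path(datadir)

    @abstractmethod
    def get_element_keys(self) -> List[str]:
        """Names of the values that make up one element."""

    @abstractmethod
    def get_elements(self) -> list:
        """All elements that can be used as an index."""

    @abstractmethod
    def get_item(self, **element: str) -> Dict[str, Path]:
        """Paths for each data type of one element."""

    def __getitem__(self, element) -> Dict[str, Path]:
        # ASSUMPTION: positional element values are paired with element keys.
        values = element if isinstance(element, tuple) else (element,)
        return self.get_item(**dict(zip(self.get_element_keys(), values)))


class ToySubjects(SketchGrabber):
    """Toy dataset with one file per subject and data type (made-up layout)."""

    def get_element_keys(self) -> List[str]:
        return ["subject"]

    def get_elements(self) -> list:
        return ["sub-01", "sub-02"]

    def get_item(self, subject: str) -> Dict[str, Path]:
        return {t: self.datadir / subject / f"{subject}_{t}.nii.gz" for t in self.types}


grabber = ToySubjects(["T1w"], "/data")
item = grabber["sub-01"]
```

A concrete subclass only has to say what its element keys are, which elements exist, and how one element maps to paths; indexing then follows from those three methods.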

class junifer.datagrabber.DMCC13Benchmark(datadir=None, types=None, sessions=None, tasks=None, phase_encodings=None, runs=None, native_t1w=False)

Concrete implementation for datalad-based data fetching of DMCC13.

Parameters:

datadir: str or Path or None, optional: The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”} or list of the options, optional: DMCC data types. If None, all available data types are selected (default None).
sessions: {“ses-wave1bas”, “ses-wave1pro”, “ses-wave1rea”} or list of the options, optional: DMCC sessions. If None, all available sessions are selected (default None).
tasks: {“Rest”, “Axcpt”, “Cuedts”, “Stern”, “Stroop”} or list of the options, optional: DMCC task sessions. If None, all available task sessions are selected (default None).
phase_encodings: {“AP”, “PA”} or list of the options, optional: DMCC phase encoding directions. If None, all available phase encodings are selected (default None).
runs: {“1”, “2”} or list of the options, optional: DMCC runs. If None, all available runs are selected (default None).
native_t1w: bool, optional: Whether to use T1w in native space (default False).

Raises:

ValueError

If invalid value is passed for:

sessions
tasks
phase_encodings
runs
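
The "single option, list of options, or None" convention and the resulting ValueError can be sketched as below. The helper `normalize_option` and the constant `ALL_SESSIONS` are hypothetical names chosen for this illustration; how the library actually validates these arguments is not shown in this reference.

```python
from typing import List, Sequence, Union

# Session names copied from the parameter description above.
ALL_SESSIONS = ["ses-wave1bas", "ses-wave1pro", "ses-wave1rea"]


def normalize_option(
    value: Union[str, Sequence[str], None], allowed: List[str], name: str
) -> List[str]:
    """Return the selected options as a list, validating each one."""
    if value is None:
        # None selects everything that is available.
        return list(allowed)
    chosen = [value] if isinstance(value, str) else list(value)
    invalid = [v for v in chosen if v not in allowed]
    if invalid:
        raise ValueError(f"Invalid {name}: {invalid}; expected any of {allowed}")
    return chosen


everything = normalize_option(None, ALL_SESSIONS, "sessions")
single = normalize_option("ses-wave1pro", ALL_SESSIONS, "sessions")
```

The same helper could be applied to tasks, phase_encodings and runs, each with its own list of allowed values.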

get_elements()

Implement fetching list of subjects in the dataset.

Returns:

list of str: The list of subjects in the dataset.

get_item(subject, session, task, phase_encoding, run)

Index one element in the dataset.

Parameters:

subject: str: The subject ID.
session: {“ses-wave1bas”, “ses-wave1pro”, “ses-wave1rea”}: The session to get.
task: {“Rest”, “Axcpt”, “Cuedts”, “Stern”, “Stroop”}: The task to get.
phase_encoding: {“AP”, “PA”}: The phase encoding to get.
run: {“1”, “2”}: The run to get.

Returns:

out: dict: Dictionary of paths for each type of data required for the specified element.

class junifer.datagrabber.DataladAOMICID1000(datadir=None, types=None, native_t1w=False)

Concrete implementation for datalad-based data fetching of AOMIC ID1000.

Parameters:

datadir: str or Path or None, optional: The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”, “FreeSurfer”} or list of the options, optional: AOMIC data types. If None, all available data types are selected (default None).
native_t1w: bool, optional: Whether to use T1w in native space (default False).

class junifer.datagrabber.DataladAOMICPIOP1(datadir=None, types=None, tasks=None, native_t1w=False)

Concrete implementation for pattern-based data fetching of AOMIC PIOP1.

Parameters:

datadir: str or Path or None, optional: The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”, “FreeSurfer”} or list of the options, optional: AOMIC data types. If None, all available data types are selected (default None).
tasks: {“restingstate”, “anticipation”, “emomatching”, “faces”, “gstroop”, “workingmemory”} or list of the options, optional: AOMIC PIOP1 task sessions. If None, all available task sessions are selected (default None).
native_t1w: bool, optional: Whether to use T1w in native space (default False).

Raises:

ValueError: If invalid value is passed for tasks.

get_elements()

Implement fetching list of subjects in the dataset.

Returns:

list of str: The list of subjects in the dataset.

get_item(subject, task)

Index one element in the dataset.

Parameters:

subject: str: The subject ID.
task: {“restingstate”, “anticipation”, “emomatching”, “faces”, “gstroop”, “workingmemory”}: The task to get.

Returns:

out: dict: Dictionary of paths for each type of data required for the specified element.

class junifer.datagrabber.DataladAOMICPIOP2(datadir=None, types=None, tasks=None, native_t1w=False)

Concrete implementation for pattern-based data fetching of AOMIC PIOP2.

Parameters:

datadir: str or Path or None, optional: The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
types: {“BOLD”, “T1w”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”, “FreeSurfer”} or list of the options, optional: AOMIC data types. If None, all available data types are selected (default None).
tasks: {“restingstate”, “stopsignal”, “workingmemory”} or list of the options, optional: AOMIC PIOP2 task sessions. If None, all available task sessions are selected (default None).
native_t1w: bool, optional: Whether to use T1w in native space (default False).

Raises:

ValueError: If invalid value is passed for tasks.

get_elements()

Implement fetching list of elements in the dataset.

Returns:

list: The list of elements that can be grabbed in the dataset after imposing constraints based on specified tasks.

get_item(subject, task)

Index one element in the dataset.

Parameters:

subject: str: The subject ID.
task: {“restingstate”, “stopsignal”, “workingmemory”}: The task to get.

Returns:

out: dict: Dictionary of paths for each type of data required for the specified element.

class junifer.datagrabber.DataladDataGrabber(rootdir='.', datadir=None, uri=None, **kwargs)

Abstract base class for datalad-based data fetching.

Defines a DataGrabber that gets data from a datalad sibling.

Parameters:

rootdir: str or pathlib.Path, optional: The path within the datalad dataset to the root directory (default “.”).
datadir: str or pathlib.Path or None, optional: The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
uri: str or None, optional: URI of the datalad sibling (default None).
**kwargs: Keyword arguments passed to superclass.

See also

BaseDataGrabber: Abstract base class for DataGrabber.
PatternDataGrabber: Concrete implementation for pattern-based data fetching.
PatternDataladDataGrabber: Concrete implementation for pattern and datalad based data fetching.

Notes

This class is intended to be used as a superclass of a subclass with multiple inheritance.

Methods

install: Installs (clones) the datalad dataset into the `datadir`. This method is called automatically when the DataGrabber is used within a context.
cleanup: Removes the datalad dataset from the `datadir`. This method is called automatically when the DataGrabber is used within a context.

cleanup()

Cleanup the datalad dataset.

property datadir: Path

Get data directory path.

Returns:

pathlib.Path: Path to the data directory.

install()

Install the datalad dataset into the datadir.

Raises:

ValueError: If the dataset is already installed but with a different ID.
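
The "install on entering a context, clean up on leaving it" behaviour described for this class can be sketched with a plain context manager. This sketch does not use datalad: the class `SketchDataset` is hypothetical, and writing a placeholder file stands in for cloning a dataset.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union


class SketchDataset:
    """Hypothetical stand-in: a temporary directory plays the cloned dataset."""

    def __init__(self, datadir: Union[str, Path, None] = None) -> None:
        self._datadir: Optional[Path] = Path(datadir) if datadir is not None else None
        self._tmp: Optional[Path] = None

    @property
    def datadir(self) -> Path:
        # With no datadir given, fall back to a temporary directory.
        if self._datadir is None:
            self._tmp = Path(tempfile.mkdtemp())
            self._datadir = self._tmp
        return self._datadir

    def install(self) -> None:
        # Placeholder for cloning: create the directory and one file in it.
        self.datadir.mkdir(parents=True, exist_ok=True)
        (self.datadir / "README").write_text("placeholder for cloned content\n")

    def cleanup(self) -> None:
        # Only delete the directory this object created itself.
        if self._tmp is not None:
            shutil.rmtree(self._tmp, ignore_errors=True)

    def __enter__(self) -> "SketchDataset":
        self.install()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.cleanup()
        return False  # never swallow exceptions raised inside the block


with SketchDataset() as dataset:
    location = dataset.datadir
    present_inside = (location / "README").exists()
present_after = location.exists()
```

Tying installation and removal to the `with` block means a temporary clone cannot outlive the code that needed it, even if that code raises.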

class junifer.datagrabber.DataladHCP1200(datadir=None, tasks=None, phase_encodings=None, ica_fix=False)

Concrete implementation for datalad-based data fetching of HCP1200.

Parameters:

datadir: str or Path or None, optional: The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).
tasks: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”} or list of the options or None, optional: HCP task sessions. If None, all available task sessions are selected (default None).
phase_encodings: {“LR”, “RL”} or list of the options or None, optional: HCP phase encoding directions. If None, both will be used (default None).
ica_fix: bool, optional: Whether to retrieve data that was processed with ICA+FIX. Only “REST1” and “REST2” tasks are available with ICA+FIX (default False).

Raises:

ValueError: If invalid value is passed for tasks or phase_encodings.

property skip_file_check: bool: Skip file existence check.

class junifer.datagrabber.HCP1200(datadir, tasks=None, phase_encodings=None, ica_fix=False)

Concrete implementation for pattern-based data fetching of HCP1200.

Parameters:

datadir: str or Path: The directory where the data is / will be stored.
tasks: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”} or list of the options or None, optional: HCP task sessions. If None, all available task sessions are selected (default None).
phase_encodings: {“LR”, “RL”} or list of the options or None, optional: HCP phase encoding directions. If None, both will be used (default None).
ica_fix: bool, optional: Whether to retrieve data that was processed with ICA+FIX. Only “REST1” and “REST2” tasks are available with ICA+FIX (default False).

Raises:

ValueError: If invalid value is passed for tasks or phase_encodings.

get_elements()

Implement fetching list of elements in the dataset.

Returns:

list: The list of elements that can be grabbed in the dataset.

get_item(subject, task, phase_encoding)

Implement single element indexing in the dataset.

Parameters:

subject: str: The subject ID.
task: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”}: The task.
phase_encoding: {“LR”, “RL”}: The phase encoding.

Returns:

dict: Dictionary of dictionaries for each type of data required for the specified element.

class junifer.datagrabber.MultipleDataGrabber(datagrabbers, **kwargs)

Concrete implementation for multi sourced data fetching.

Implements a DataGrabber which can be used to fetch data from multiple DataGrabbers.

Parameters:

datagrabbers: list of DataGrabber-like objects: The DataGrabbers to use for fetching data.
**kwargs: Keyword arguments passed to superclass.

Raises:

RuntimeError: If datagrabbers have different element keys or overlapping data types or nested data types.

get_element_keys()

Get element keys.

For each item in the element tuple passed to __getitem__(), this method returns the corresponding key(s).

Returns:

list of str: The element keys.

get_elements()

Get elements.

Returns:

list: List of elements that can be grabbed. The elements can be strings, tuples or any object that will then be used as a key to index the DataGrabber. Each element should be present in all of the related DataGrabbers.

get_item(**_)

Get the specified item from the dataset.

Parameters:

element: dict: The element to be indexed.

Returns:

dict: Dictionary of paths for each type of data required for the specified element.

Notes

This method is not implemented for this class because it is not needed.

get_types()

Get types.

Returns:

list of list of str: The types of data to be grabbed.
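
The constraints listed for this class (identical element keys, no overlapping data types, elements present in every DataGrabber) can be sketched with plain lists and dictionaries. The function names below are hypothetical, the check for nested data types is left out, and the merging of per-DataGrabber results into one dictionary is an assumption about how the combined output would look.

```python
from typing import Dict, List


def check_compatible(element_keys: List[List[str]], types: List[List[str]]) -> None:
    """Raise RuntimeError on differing element keys or overlapping types."""
    first = element_keys[0]
    if any(keys != first for keys in element_keys[1:]):
        raise RuntimeError("DataGrabbers have different element keys")
    flat = [t for group in types for t in group]
    if len(flat) != len(set(flat)):
        raise RuntimeError("DataGrabbers have overlapping data types")


def common_elements(per_grabber: List[List[str]]) -> List[str]:
    """Keep only the elements that every DataGrabber can provide."""
    shared = set(per_grabber[0]).intersection(*per_grabber[1:])
    return sorted(shared)


def merge_items(items: List[Dict[str, str]]) -> Dict[str, str]:
    """Combine per-DataGrabber results; safe because types do not overlap."""
    merged: Dict[str, str] = {}
    for item in items:
        merged.update(item)
    return merged


# Made-up inputs standing in for two DataGrabbers.
check_compatible([["subject"], ["subject"]], [["BOLD"], ["T1w"]])
elements = common_elements([["sub-01", "sub-02"], ["sub-02", "sub-03"]])
item = merge_items([{"BOLD": "bold.nii.gz"}, {"T1w": "t1w.nii.gz"}])
```

Rejecting overlapping types up front is what makes the final merge unambiguous: no key can be silently overwritten.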

class junifer.datagrabber.PatternDataGrabber(types, patterns, replacements, datadir, confounds_format=None, partial_pattern_ok=False)

Concrete implementation for pattern-based data fetching.

Implements a DataGrabber that understands patterns to grab data.

Parameters:

types: list of str

The types of data to be grabbed.

patterns: dict

Data type patterns as a dictionary. It has the following schema:

"T1w" :

{
  "mandatory": ["pattern", "space"],
  "optional": {
      "mask": {
          "mandatory": ["pattern", "space"],
          "optional": []
      }
  }
}

"T2w" :

{
  "mandatory": ["pattern", "space"],
  "optional": {
      "mask": {
          "mandatory": ["pattern", "space"],
          "optional": []
      }
  }
}

"BOLD" :

{
  "mandatory": ["pattern", "space"],
  "optional": {
      "mask": {
          "mandatory": ["pattern", "space"],
          "optional": []
      },
      "confounds": {
          "mandatory": ["pattern", "format"],
          "optional": []
      }
  }
}

"Warp" :

{
  "mandatory": ["pattern", "src", "dst"],
  "optional": []
}

"VBM_GM" :

{
  "mandatory": ["pattern", "space"],
  "optional": []
}

"VBM_WM" :

{
  "mandatory": ["pattern", "space"],
  "optional": []
}

For each data type, the mandatory keys must be provided and the optional keys may be provided as well. The value for each key is a string. The required data types are passed as a dictionary, for example:

{
    "BOLD": {
      "pattern": "...",
      "space": "...",
    },
    "T1w": {
      "pattern": "...",
      "space": "...",
    },
    "Warp": {
      "pattern": "...",
      "src": "...",
      "dst": "...",
    }
}

taken from HCP1200.

replacements: str or list of str

Replacements in the pattern key of each data type. The value needs to be a list of all possible replacements.

datadir: str or pathlib.Path

The directory where the data is / will be stored.

confounds_format: {“fmriprep”, “adhoc”} or None, optional

The format of the confounds for the dataset (default None).

partial_pattern_ok: bool, optional

Whether to accept a partial pattern for a data type. If True, the mandatory key check is bypassed and a warning is issued instead of an error being raised. This allows a DataGrabber to have data types without the corresponding mandatory keys and is powerful when used with MultipleDataGrabber (default False).

Raises:

ValueError: If confounds_format is invalid.
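
The mandatory key check and the effect of partial_pattern_ok can be sketched as follows. The table `MANDATORY` covers only three of the data types from the schema above, `check_patterns` is a hypothetical helper, and the exception and warning classes are chosen for illustration because this reference does not state which ones the library uses.

```python
import warnings
from typing import Dict, List

# Mandatory keys per data type, copied from the schema above (subset).
MANDATORY: Dict[str, List[str]] = {
    "BOLD": ["pattern", "space"],
    "T1w": ["pattern", "space"],
    "Warp": ["pattern", "src", "dst"],
}


def check_patterns(
    patterns: Dict[str, Dict[str, str]], partial_pattern_ok: bool = False
) -> List[str]:
    """Return missing 'type.key' entries; raise or warn depending on the flag."""
    missing = [
        f"{dtype}.{key}"
        for dtype, spec in patterns.items()
        for key in MANDATORY.get(dtype, [])
        if key not in spec
    ]
    if missing:
        message = f"Missing mandatory keys: {missing}"
        if not partial_pattern_ok:
            # ASSUMPTION: the exception type is illustrative only.
            raise KeyError(message)
        warnings.warn(message, RuntimeWarning)
    return missing


# Made-up pattern values; only the presence of keys matters here.
complete = check_patterns({"Warp": {"pattern": "w.h5", "src": "native", "dst": "MNI"}})
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    partial = check_patterns({"BOLD": {"pattern": "bold.nii.gz"}}, partial_pattern_ok=True)
```

With the flag left at its default, the second call would raise instead of returning the list of missing keys.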

get_element_keys()

Get element keys.

For each item in the “element” tuple, this function returns the corresponding key, that is, the replacements of patterns defined in the constructor.

Returns:

list of str: The element keys.

get_elements()

Implement fetching list of elements in the dataset.

It will use regex to search for “replacements” in the “patterns” and return the intersection of the results for each type i.e., build a list of elements that have all the required types.

Returns:

list: The list of elements that can be grabbed in the dataset.
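
The regex-and-intersection approach described above can be sketched on a list of relative paths instead of a real directory tree. This is an illustration under assumptions: placeholders are written as `{name}`, a placeholder value never contains a slash, and the helpers `pattern_to_regex` and `find_elements` are hypothetical names rather than the library's functions.

```python
import re
from typing import Dict, List, Set, Tuple


def pattern_to_regex(pattern: str, replacements: List[str]) -> "re.Pattern":
    """Turn '{name}' placeholders into named groups; escape everything else."""
    regex = re.escape(pattern)
    for name in replacements:
        placeholder = re.escape("{" + name + "}")
        # First occurrence captures the value, later ones must repeat it.
        regex = regex.replace(placeholder, f"(?P<{name}>[^/]+)", 1)
        regex = regex.replace(placeholder, f"(?P={name})")
    return re.compile(regex)


def find_elements(
    patterns: Dict[str, str], replacements: List[str], files: List[str]
) -> List[Tuple[str, ...]]:
    """Return elements for which every data type has a matching file."""
    per_type: List[Set[Tuple[str, ...]]] = []
    for pattern in patterns.values():
        regex = pattern_to_regex(pattern, replacements)
        found = set()
        for path in files:
            match = regex.fullmatch(path)
            if match:
                found.add(tuple(match.group(name) for name in replacements))
        per_type.append(found)
    if not per_type:
        return []
    return sorted(set.intersection(*per_type))


# Made-up file listing: sub-02 has no T1w image.
files = [
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-01/func/sub-01_task-rest_bold.nii.gz",
    "sub-02/func/sub-02_task-rest_bold.nii.gz",
]
patterns = {
    "T1w": "{subject}/anat/{subject}_T1w.nii.gz",
    "BOLD": "{subject}/func/{subject}_task-rest_bold.nii.gz",
}
elements = find_elements(patterns, ["subject"], files)
```

Because the result is an intersection, a subject that lacks any one of the requested data types is dropped from the list of elements.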

get_item(**element)

Implement single element indexing for the datagrabber.

This method constructs a real path to the requested item’s data, by replacing the patterns with actual values passed via **element.

Parameters:

element: dict: The element to be indexed. The keys must be the same as the replacements.

Returns:

dict: Dictionary of dictionaries for each type of data required for the specified element.
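
Building a real path from a pattern and the values passed via **element can be sketched with `str.format`. This assumes `{name}` placeholders and a result layout in which each data type maps to a dictionary holding a "path" entry next to the remaining keys; both the helper `build_item` and that layout are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, Union


def build_item(
    patterns: Dict[str, Dict[str, str]], datadir: Union[str, Path], **element: str
) -> Dict[str, Dict[str, object]]:
    """Fill each data type's pattern with the element values."""
    item: Dict[str, Dict[str, object]] = {}
    for dtype, spec in patterns.items():
        # A missing element value raises KeyError here, by design.
        relative = spec["pattern"].format(**element)
        entry: Dict[str, object] = {k: v for k, v in spec.items() if k != "pattern"}
        entry["path"] = Path(datadir) / relative
        item[dtype] = entry
    return item


# Made-up pattern, space name and data directory.
patterns = {
    "BOLD": {
        "pattern": "{subject}/func/{subject}_task-{task}_bold.nii.gz",
        "space": "MNI152NLin6Asym",
    }
}
item = build_item(patterns, "/data", subject="sub-01", task="rest")
```

The keys passed as keyword arguments must match the replacements used in the patterns, which is the same requirement stated for the element parameter above.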

property skip_file_check: bool: Skip file existence check.

class junifer.datagrabber.PatternDataladDataGrabber(**kwargs)

Concrete implementation for pattern and datalad based data fetching.

Implements a DataGrabber that gets data from a datalad sibling, interpreting patterns.

Parameters:

types: list of str

The types of data to be grabbed.

patterns: dict

Data type patterns as a dictionary. It has the following schema:

"T1w" :

{
  "mandatory": ["pattern", "space"],
  "optional": {
      "mask": {
          "mandatory": ["pattern", "space"],
          "optional": []
      }
  }
}

"T2w" :

{
  "mandatory": ["pattern", "space"],
  "optional": {
      "mask": {
          "mandatory": ["pattern", "space"],
          "optional": []
      }
  }
}

"BOLD" :

{
  "mandatory": ["pattern", "space"],
  "optional": {
      "mask": {
          "mandatory": ["pattern", "space"],
          "optional": []
      },
      "confounds": {
          "mandatory": ["pattern", "format"],
          "optional": []
      }
  }
}

"Warp" :

{
  "mandatory": ["pattern", "src", "dst"],
  "optional": []
}

"VBM_GM" :

{
  "mandatory": ["pattern", "space"],
  "optional": []
}

"VBM_WM" :

{
  "mandatory": ["pattern", "space"],
  "optional": []
}

For each data type, the mandatory keys must be provided and the optional keys may be provided as well. The value for each key is a string. The required data types are passed as a dictionary, for example:

{
    "BOLD": {
      "pattern": "...",
      "space": "...",
    },
    "T1w": {
      "pattern": "...",
      "space": "...",
    },
    "Warp": {
      "pattern": "...",
      "src": "...",
      "dst": "...",
    }
}

taken from HCP1200.

replacements: str or list of str

Replacements in the pattern key of each data type. The value needs to be a list of all possible replacements.

confounds_format: {“fmriprep”, “adhoc”} or None, optional

The format of the confounds for the dataset (default None).

datadir: str or pathlib.Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

rootdir: str or pathlib.Path, optional

The path within the datalad dataset to the root directory (default “.”).

uri: str or None, optional

URI of the datalad sibling (default None).