9.1.1. Data Grabbers

DataGrabbers for datasets’ data description.

class junifer.datagrabber.BaseDataGrabber(types, datadir)

Abstract base class for DataGrabber.

For every interface that is required, one needs to provide a concrete implementation of this abstract class.

Parameters:
types: list of str

The types of data to be grabbed.

datadir: str or pathlib.Path

The directory where the data is / will be stored.

Attributes:
datadir: pathlib.Path

Get data directory path.

property datadir: Path

Get data directory path.

Returns:
pathlib.Path

Path to the data directory. Can be overridden by subclasses.

filter(selection)

Filter elements to be grabbed.

Parameters:
selection: list of str or tuple

The list of partial element key values to filter using.

Yields:
object

An element that can be indexed by the DataGrabber.

abstract get_element_keys()

Get element keys.

For each item in the element tuple passed to __getitem__(), this method returns the corresponding key(s).

Returns:
list of str

The element keys.

abstract get_elements()

Get elements.

Returns:
list

List of elements that can be grabbed. The elements can be strings, tuples or any object that will then be used as a key to index the DataGrabber.

abstract get_item(**element)

Get the specified item from the dataset.

Parameters:
element: dict

The element to be indexed.

Returns:
dict

Dictionary of paths for each type of data required for the specified element.

get_types()

Get types.

Returns:
list of str

The types of data to be grabbed.
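
The contract described above can be illustrated with a small sketch. It does not import junifer: `SketchBaseGrabber` is a hypothetical stand-in that only mirrors the documented method names, and `InMemoryGrabber` with its made-up file layout is likewise an assumption. The real `BaseDataGrabber` may behave differently, for example in how `filter` and `__getitem__` are implemented.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchBaseGrabber(ABC):
    """Stand-in mirroring the documented interface (not the junifer class)."""

    def __init__(self, types, datadir):
        self.types = list(types)
        self._datadir = Path(datadir)

    @property
    def datadir(self):
        return self._datadir

    def get_types(self):
        return self.types.copy()

    @abstractmethod
    def get_element_keys(self): ...

    @abstractmethod
    def get_elements(self): ...

    @abstractmethod
    def get_item(self, **element): ...


class InMemoryGrabber(SketchBaseGrabber):
    """Hypothetical grabber whose elements have a single key, ``subject``."""

    def get_element_keys(self):
        return ["subject"]

    def get_elements(self):
        return ["sub-01", "sub-02"]

    def get_item(self, subject):
        # One path per requested data type; the directory layout is made up.
        return {
            data_type: self.datadir / subject / f"{subject}_{data_type}.nii.gz"
            for data_type in self.types
        }


grabber = InMemoryGrabber(types=["BOLD", "T1w"], datadir="/tmp/data")
item = grabber.get_item(subject="sub-01")
```

Instantiating `SketchBaseGrabber` directly fails with `TypeError`, which is the usual way an abstract base class forces subclasses to supply the three abstract methods.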

class junifer.datagrabber.DMCC13Benchmark(datadir=None, types=None, sessions=None, tasks=None, phase_encodings=None, runs=None, native_t1w=False)

Concrete implementation for datalad-based data fetching of DMCC13.

Parameters:
datadir: str or Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

types: {“BOLD”, “BOLD_confounds”, “BOLD_mask”, “T1w”, “T1w_mask”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “Warp” (only if “native_t1w = True”)} or a list of the options, optional

DMCC data types. If None, all available data types are selected (default None).

sessions: {“ses-wave1bas”, “ses-wave1pro”, “ses-wave1rea”} or list of the options, optional

DMCC sessions. If None, all available sessions are selected (default None).

tasks: {“Rest”, “Axcpt”, “Cuedts”, “Stern”, “Stroop”} or list of the options, optional

DMCC task sessions. If None, all available task sessions are selected (default None).

phase_encodings: {“AP”, “PA”} or list of the options, optional

DMCC phase encoding directions. If None, all available phase encodings are selected (default None).

runs: {“1”, “2”} or list of the options, optional

DMCC runs. If None, all available runs are selected (default None).

native_t1w: bool, optional

Whether to use T1w in native space (default False).

Raises:
ValueError
If an invalid value is passed for:
  • sessions

  • tasks

  • phase_encodings

  • runs
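
The checks behind this `ValueError` amount to comparing each argument against its allowed options. The following is a minimal sketch of that idea, assuming a single string is treated like a one-item list and `None` selects every option; `validate_options` is a hypothetical helper and not part of junifer.

```python
def validate_options(name, value, allowed):
    """Return ``value`` as a list, defaulting to all allowed options."""
    if value is None:
        return list(allowed)
    values = [value] if isinstance(value, str) else list(value)
    invalid = [v for v in values if v not in allowed]
    if invalid:
        raise ValueError(
            f"Invalid {name}: {invalid}. Allowed options: {list(allowed)}"
        )
    return values


SESSIONS = ("ses-wave1bas", "ses-wave1pro", "ses-wave1rea")
PHASE_ENCODINGS = ("AP", "PA")

sessions = validate_options("sessions", None, SESSIONS)
phase_encodings = validate_options("phase_encodings", "AP", PHASE_ENCODINGS)
```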

get_elements()

Implement fetching list of subjects in the dataset.

Returns:
list of str

The list of subjects in the dataset.

get_item(subject, session, task, phase_encoding, run)

Index one element in the dataset.

Parameters:
subject: str

The subject ID.

session: {“ses-wave1bas”, “ses-wave1pro”, “ses-wave1rea”}

The session to get.

task: {“Rest”, “Axcpt”, “Cuedts”, “Stern”, “Stroop”}

The task to get.

phase_encoding: {“AP”, “PA”}

The phase encoding to get.

run: {“1”, “2”}

The run to get.

Returns:
out: dict

Dictionary of paths for each type of data required for the specified element.

class junifer.datagrabber.DataladAOMICID1000(datadir=None, types=None, native_t1w=False)

Concrete implementation for datalad-based data fetching of AOMIC ID1000.

Parameters:
datadir: str or Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

types: {“BOLD”, “BOLD_confounds”, “BOLD_mask”, “T1w”, “T1w_mask”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”} or a list of the options, optional

AOMIC data types. If None, all available data types are selected (default None).

native_t1w: bool, optional

Whether to use T1w in native space (default False).

class junifer.datagrabber.DataladAOMICPIOP1(datadir=None, types=None, tasks=None, native_t1w=False)

Concrete implementation for pattern-based data fetching of AOMIC PIOP1.

Parameters:
datadir: str or Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

types: {“BOLD”, “BOLD_confounds”, “BOLD_mask”, “T1w”, “T1w_mask”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”} or a list of the options, optional

AOMIC data types. If None, all available data types are selected (default None).

tasks: {“restingstate”, “anticipation”, “emomatching”, “faces”, “gstroop”, “workingmemory”} or list of the options, optional

AOMIC PIOP1 task sessions. If None, all available task sessions are selected (default None).

native_t1w: bool, optional

Whether to use T1w in native space (default False).

Raises:
ValueError

If an invalid value is passed for tasks.

get_elements()

Implement fetching list of subjects in the dataset.

Returns:
list of str

The list of subjects in the dataset.

get_item(subject, task)

Index one element in the dataset.

Parameters:
subject: str

The subject ID.

task: {“restingstate”, “anticipation”, “emomatching”, “faces”, “gstroop”, “workingmemory”}

The task to get.

Returns:
out: dict

Dictionary of paths for each type of data required for the specified element.

class junifer.datagrabber.DataladAOMICPIOP2(datadir=None, types=None, tasks=None, native_t1w=False)

Concrete implementation for pattern-based data fetching of AOMIC PIOP2.

Parameters:
datadir: str or Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

types: {“BOLD”, “BOLD_confounds”, “BOLD_mask”, “T1w”, “T1w_mask”, “VBM_CSF”, “VBM_GM”, “VBM_WM”, “DWI”} or a list of the options, optional

AOMIC data types. If None, all available data types are selected (default None).

tasks: {“restingstate”, “stopsignal”, “workingmemory”} or list of the options, optional

AOMIC PIOP2 task sessions. If None, all available task sessions are selected (default None).

native_t1w: bool, optional

Whether to use T1w in native space (default False).

Raises:
ValueError

If an invalid value is passed for tasks.

get_elements()

Implement fetching list of elements in the dataset.

Returns:
list

The list of elements that can be grabbed in the dataset after imposing constraints based on specified tasks.

get_item(subject, task)

Index one element in the dataset.

Parameters:
subject: str

The subject ID.

task: {“restingstate”, “stopsignal”, “workingmemory”}

The task to get.

Returns:
out: dict

Dictionary of paths for each type of data required for the specified element.

class junifer.datagrabber.DataladDataGrabber(rootdir='.', datadir=None, uri=None, **kwargs)

Abstract base class for datalad-based data fetching.

Defines a DataGrabber that gets data from a datalad sibling.

Parameters:
rootdir: str or pathlib.Path, optional

The path within the datalad dataset to the root directory (default “.”).

datadir: str or pathlib.Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

uri: str or None, optional

URI of the datalad sibling (default None).

**kwargs

Keyword arguments passed to superclass.

See also

BaseDataGrabber

Abstract base class for DataGrabber.

PatternDataGrabber

Concrete implementation for pattern-based data fetching.

PatternDataladDataGrabber

Concrete implementation for pattern- and datalad-based data fetching.

Notes

This class is intended to be used as one of the base classes of a subclass that relies on multiple inheritance.

Methods

install:

Installs (clones) the datalad dataset into the datadir. This method is called automatically on entering the context when the datagrabber is used as a context manager.

remove:

Removes the datalad dataset from the datadir. This method is called automatically on leaving the context when the datagrabber is used as a context manager.
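
The life cycle described above can be sketched with a plain Python context manager. This is only an illustration under the assumption that entering the context triggers `install()` and leaving it triggers the cleanup; `SketchDataladGrabber` is a hypothetical stand-in that records calls instead of touching datalad.

```python
class SketchDataladGrabber:
    """Stand-in showing the install/cleanup life cycle (not the junifer class)."""

    def __init__(self):
        self.log = []

    def install(self):
        # A real implementation would clone the dataset here.
        self.log.append("install")

    def cleanup(self):
        # A real implementation would remove the cloned dataset here.
        self.log.append("cleanup")

    def __enter__(self):
        self.install()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.cleanup()
        return False  # do not swallow exceptions raised inside the block


with SketchDataladGrabber() as dg:
    dg.log.append("use")
```

Because the cleanup runs in `__exit__`, it also happens when the body of the `with` block raises.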

cleanup()

Clean up the datalad dataset.

property datadir: Path

Get data directory path.

Returns:
pathlib.Path

Path to the data directory.

install()

Install the datalad dataset into the datadir.

Raises:
ValueError

If the dataset is already installed but with a different ID.

class junifer.datagrabber.DataladHCP1200(datadir=None, tasks=None, phase_encodings=None, ica_fix=False)

Concrete implementation for datalad-based data fetching of HCP1200.

Parameters:
datadir: str or Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

tasks: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”} or list of the options or None, optional

HCP task sessions. If None, all available task sessions are selected (default None).

phase_encodings: {“LR”, “RL”} or list of the options or None, optional

HCP phase encoding directions. If None, both will be used (default None).

ica_fix: bool, optional

Whether to retrieve data that was processed with ICA+FIX. Only “REST1” and “REST2” tasks are available with ICA+FIX (default False).

Raises:
ValueError

If an invalid value is passed for tasks or phase_encodings.

property skip_file_check: bool

Skip file existence check.

class junifer.datagrabber.HCP1200(datadir, tasks=None, phase_encodings=None, ica_fix=False)

Concrete implementation for pattern-based data fetching of HCP1200.

Parameters:
datadir: str or Path

The directory where the data is / will be stored.

tasks: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”} or list of the options or None, optional

HCP task sessions. If None, all available task sessions are selected (default None).

phase_encodings: {“LR”, “RL”} or list of the options or None, optional

HCP phase encoding directions. If None, both will be used (default None).

ica_fix: bool, optional

Whether to retrieve data that was processed with ICA+FIX. Only “REST1” and “REST2” tasks are available with ICA+FIX (default False).

Raises:
ValueError

If an invalid value is passed for tasks or phase_encodings.

get_elements()

Implement fetching list of elements in the dataset.

Returns:
list

The list of elements that can be grabbed in the dataset.

get_item(subject, task, phase_encoding)

Implement single element indexing in the dataset.

Parameters:
subject: str

The subject ID.

task: {“REST1”, “REST2”, “SOCIAL”, “WM”, “RELATIONAL”, “EMOTION”, “LANGUAGE”, “GAMBLING”, “MOTOR”}

The task.

phase_encoding: {“LR”, “RL”}

The phase encoding.

Returns:
dict

Dictionary of dictionaries for each type of data required for the specified element.

class junifer.datagrabber.MultipleDataGrabber(datagrabbers, **kwargs)

Concrete implementation for multi-sourced data fetching.

Implements a DataGrabber which can be used to fetch data from multiple DataGrabbers.

Parameters:
datagrabbers: list of DataGrabber-like objects

The DataGrabbers to use for fetching data.

**kwargs

Keyword arguments passed to superclass.

get_element_keys()

Get element keys.

For each item in the element tuple passed to __getitem__(), this method returns the corresponding key(s).

Returns:
list of str

The element keys.

get_elements()

Get elements.

Returns:
list

List of elements that can be grabbed. The elements can be strings, tuples or any object that will then be used as a key to index the DataGrabber. The element should be present in all of the related DataGrabbers.

get_item(**_)

Get the specified item from the dataset.

Parameters:
element: dict

The element to be indexed.

Returns:
dict

Dictionary of paths for each type of data required for the specified element.

Notes

This method is not implemented for this class because it is not needed.

get_types()

Get types.

Returns:
list of list of str

The types of data to be grabbed.
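
Combining several DataGrabbers comes down to two operations: keeping only the elements that every DataGrabber provides, and merging the per-DataGrabber results for one element. The sketch below shows both with plain lists and dictionaries. `common_elements` and `merge_items` are hypothetical helpers, and raising an error when two sources provide the same data type is an assumption, not documented behavior.

```python
def common_elements(element_lists):
    """Keep elements present in every list, preserving first-list order."""
    if not element_lists:
        return []
    first, *rest = element_lists
    rest_sets = [set(elements) for elements in rest]
    return [e for e in first if all(e in s for s in rest_sets)]


def merge_items(items):
    """Merge per-grabber result dictionaries into a single dictionary."""
    merged = {}
    for item in items:
        overlap = merged.keys() & item.keys()
        if overlap:
            # Assumption: the same data type from two sources is an error.
            raise ValueError(f"Data types provided twice: {sorted(overlap)}")
        merged.update(item)
    return merged


elements = common_elements(
    [["sub-01", "sub-02", "sub-03"], ["sub-02", "sub-03", "sub-04"]]
)
item = merge_items([{"BOLD": "bold.nii.gz"}, {"T1w": "t1w.nii.gz"}])
```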

class junifer.datagrabber.PatternDataGrabber(types, patterns, replacements, datadir, confounds_format=None)

Concrete implementation for pattern-based data fetching.

Implements a DataGrabber that understands patterns to grab data.

Parameters:
types: list of str

The types of data to be grabbed.

patterns: dict

Data type patterns as a dictionary. It has the following schema:

  • "T1w" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    
  • "T2w" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    
  • "BOLD" :

    {
      "mandatory": ["pattern", "space"],
      "optional": ["mask_item"]
    }
    
  • "Warp" :

    {
      "mandatory": ["pattern", "src", "dst"],
      "optional": []
    }
    
  • "BOLD_confounds" :

    {
      "mandatory": ["pattern", "format"],
      "optional": []
    }
    
  • "VBM_GM" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    
  • "VBM_WM" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    

For each data type, the mandatory keys must be provided and the optional keys may be provided. The value for each key is a string. The necessary data types are passed together as one dictionary, for example (taken from HCP1200):

{
    "BOLD": {
      "pattern": "...",
      "space": "...",
    },
    "T1w": {
      "pattern": "...",
      "space": "...",
    },
    "Warp": {
      "pattern": "...",
      "src": "...",
      "dst": "...",
    }
}

replacements: str or list of str

Replacements in the pattern key of each data type. The value needs to be a list of all possible replacements.

datadir: str or pathlib.Path

The directory where the data is / will be stored.

confounds_format: {“fmriprep”, “adhoc”} or None, optional

The format of the confounds for the dataset (default None).

Raises:
ValueError

If confounds_format is invalid.
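
The mandatory/optional schema for `patterns` can be checked mechanically. The following sketch validates a patterns dictionary against a subset of the schema listed above. `check_patterns` is a hypothetical helper; the `{subject}` placeholder syntax, the paths and the space names are made up for illustration; and rejecting unknown keys with `ValueError` is an assumption about how strict such a check would be.

```python
SCHEMA = {
    "T1w": {"mandatory": ["pattern", "space"], "optional": []},
    "BOLD": {"mandatory": ["pattern", "space"], "optional": ["mask_item"]},
    "Warp": {"mandatory": ["pattern", "src", "dst"], "optional": []},
    "BOLD_confounds": {"mandatory": ["pattern", "format"], "optional": []},
}


def check_patterns(types, patterns):
    """Raise ValueError if ``patterns`` does not fit the schema for ``types``."""
    for data_type in types:
        if data_type not in patterns:
            raise ValueError(f"No pattern given for type {data_type!r}")
        spec = SCHEMA[data_type]
        keys = set(patterns[data_type])
        missing = set(spec["mandatory"]) - keys
        unknown = keys - set(spec["mandatory"]) - set(spec["optional"])
        if missing:
            raise ValueError(f"{data_type}: missing keys {sorted(missing)}")
        if unknown:
            raise ValueError(f"{data_type}: unknown keys {sorted(unknown)}")


patterns = {
    "BOLD": {
        "pattern": "{subject}/func/{subject}_bold.nii.gz",
        "space": "MNI",
    },
    "Warp": {
        "pattern": "{subject}/xfm/{subject}_warp.nii.gz",
        "src": "MNI",
        "dst": "native",
    },
}
check_patterns(["BOLD", "Warp"], patterns)  # valid, so nothing is raised
```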

get_element_keys()

Get element keys.

For each item in the “element” tuple, this function returns the corresponding key, that is, the replacements of patterns defined in the constructor.

Returns:
list of str

The element keys.

get_elements()

Implement fetching list of elements in the dataset.

It uses regular expressions to search for the “replacements” in the “patterns” and returns the intersection of the results across types, that is, the list of elements for which all the required types are available.

Returns:
list

The list of elements that can be grabbed in the dataset.
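
The search-and-intersect idea can be sketched on a list of relative paths. This is not junifer's implementation: `pattern_to_regex` and `find_elements` are hypothetical helpers, the `{name}` placeholder syntax is an assumption, and the sketch requires every replacement to appear in every pattern.

```python
import re


def pattern_to_regex(pattern, replacements):
    """Turn ``{name}`` placeholders into named regex groups."""
    regex = re.escape(pattern)
    for name in replacements:
        placeholder = re.escape("{" + name + "}")
        # The first occurrence captures; later ones must repeat the same text.
        regex = regex.replace(placeholder, f"(?P<{name}>[^/]+)", 1)
        regex = regex.replace(placeholder, f"(?P={name})")
    return re.compile(regex)


def find_elements(paths, patterns, replacements):
    """Return elements for which every pattern matches at least one path."""
    per_type = []
    for pattern in patterns.values():
        regex = pattern_to_regex(pattern, replacements)
        found = set()
        for path in paths:
            match = regex.fullmatch(path)
            if match:
                found.add(tuple(match.group(name) for name in replacements))
        per_type.append(found)
    return sorted(set.intersection(*per_type)) if per_type else []


paths = [
    "sub-01/func/sub-01_task-rest_bold.nii.gz",
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-02/func/sub-02_task-rest_bold.nii.gz",  # sub-02 has no T1w
]
patterns = {
    "BOLD": "{subject}/func/{subject}_task-rest_bold.nii.gz",
    "T1w": "{subject}/anat/{subject}_T1w.nii.gz",
}
elements = find_elements(paths, patterns, ["subject"])
```

Only `sub-01` has both data types in this made-up listing, so it is the only element that survives the intersection.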

get_item(**element)

Implement single element indexing for the datagrabber.

This method constructs the actual path to the requested item’s data by filling the replacements in the patterns with the values passed via **element.

Parameters:
element: dict

The element to be indexed. The keys must be the same as the replacements.

Returns:
dict

Dictionary of dictionaries for each type of data required for the specified element.

Raises:
RuntimeError

If more than one file matches a data type’s pattern, if no file matches it, or if a file cannot be accessed for an element.
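
The "exactly one match" rule can be sketched with `pathlib`. The example fills a pattern with the element values and fails with `RuntimeError` unless a single file matches. `resolve` is a hypothetical helper, and the `{subject}` placeholder syntax and the use of `Path.glob` are assumptions rather than a description of junifer's internals.

```python
import tempfile
from pathlib import Path


def resolve(datadir, pattern, **element):
    """Fill ``pattern`` with the element values and return the single match."""
    filled = pattern.format(**element)
    matches = sorted(Path(datadir).glob(filled))
    if len(matches) != 1:
        raise RuntimeError(
            f"Expected exactly one file for {filled!r}, found {len(matches)}"
        )
    return matches[0]


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny made-up dataset with one T1w file.
    target = Path(tmp) / "sub-01" / "anat" / "sub-01_T1w.nii.gz"
    target.parent.mkdir(parents=True)
    target.touch()

    pattern = "{subject}/anat/{subject}_T1w.nii.gz"
    found_name = resolve(tmp, pattern, subject="sub-01").name
    try:
        resolve(tmp, pattern, subject="sub-99")
    except RuntimeError as err:
        error_message = str(err)
```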

property skip_file_check: bool

Skip file existence check.

class junifer.datagrabber.PatternDataladDataGrabber(**kwargs)

Concrete implementation for pattern- and datalad-based data fetching.

Implements a DataGrabber that gets data from a datalad sibling, interpreting patterns.

Parameters:
types: list of str

The types of data to be grabbed.

patterns: dict

Data type patterns as a dictionary. It has the following schema:

  • "T1w" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    
  • "T2w" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    
  • "BOLD" :

    {
      "mandatory": ["pattern", "space"],
      "optional": ["mask_item"]
    }
    
  • "Warp" :

    {
      "mandatory": ["pattern", "src", "dst"],
      "optional": []
    }
    
  • "BOLD_confounds" :

    {
      "mandatory": ["pattern", "format"],
      "optional": []
    }
    
  • "VBM_GM" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    
  • "VBM_WM" :

    {
      "mandatory": ["pattern", "space"],
      "optional": []
    }
    

For each data type, the mandatory keys must be provided and the optional keys may be provided. The value for each key is a string. The necessary data types are passed together as one dictionary, for example (taken from HCP1200):

{
    "BOLD": {
      "pattern": "...",
      "space": "...",
    },
    "T1w": {
      "pattern": "...",
      "space": "...",
    },
    "Warp": {
      "pattern": "...",
      "src": "...",
      "dst": "...",
    }
}

replacements: str or list of str

Replacements in the pattern key of each data type. The value needs to be a list of all possible replacements.

confounds_format: {“fmriprep”, “adhoc”} or None, optional

The format of the confounds for the dataset (default None).

datadir: str or pathlib.Path or None, optional

The directory where the datalad dataset will be cloned. If None, the datalad dataset will be cloned into a temporary directory (default None).

rootdir: str or pathlib.Path, optional

The path within the datalad dataset to the root directory (default “.”).

uri: str or None, optional

URI of the datalad sibling (default None).

See also

DataladDataGrabber

Abstract base class for datalad-based data fetching.

PatternDataGrabber

Concrete implementation for pattern-based data fetching.
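
A constructor that accepts only `**kwargs` and serves two parent classes is typically built on cooperative multiple inheritance. The sketch below shows how keyword arguments could be routed along the method resolution order. All `Sketch*` classes are hypothetical stand-ins, and the base-class order is an assumption rather than a statement about junifer's source.

```python
class SketchBase:
    def __init__(self, types, datadir):
        self.types = types
        self.datadir = datadir


class SketchPattern(SketchBase):
    def __init__(self, types, patterns, replacements, datadir):
        super().__init__(types=types, datadir=datadir)
        self.patterns = patterns
        self.replacements = replacements


class SketchDatalad(SketchBase):
    def __init__(self, rootdir=".", datadir=None, uri=None, **kwargs):
        # Consume the datalad-specific arguments, pass the rest up the MRO.
        super().__init__(datadir=datadir, **kwargs)
        self.rootdir = rootdir
        self.uri = uri


class SketchPatternDatalad(SketchDatalad, SketchPattern):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


grabber = SketchPatternDatalad(
    types=["T1w"],
    patterns={"T1w": {"pattern": "...", "space": "..."}},
    replacements=["subject"],
    uri="https://example.org/dataset.git",  # placeholder URI
)
mro_names = [cls.__name__ for cls in SketchPatternDatalad.__mro__]
```

Each class takes the arguments it knows and forwards the remainder, so the subclass itself needs no explicit parameter list.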