Input Data

julearn supports two kinds of input data configuration. The function run_cross_validation() takes the following input parameters:

  • X: Features

  • y: Target or labels

  • confounds: Confounds to remove (optional)

  • pos_labels: Labels to be considered as positive (optional, needed for some metrics)

  • groups: Grouping variables to avoid data leakage in some cross-validation schemes. See Cross Validation for more information.

julearn interprets data using two kinds of combinations:

2. Using NumPy arrays

This method allows X, y, confounds and groups to be specified as n-dimensional arrays. In this case, the number of samples for X, y, confounds and groups must match:

X.shape[0] == y.shape[0] == confounds.shape[0] == groups.shape[0]

X (and confounds) can be one- or two-dimensional, with each element in the second dimension representing a feature (or confound):

if X.ndim == 1:
    n_features = 1
else:
    n_features = X.shape[1]

Additionally, y and groups must be one-dimensional:

y.ndim == 1
groups.ndim == 1
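
The shape rules above can be checked before calling run_cross_validation(). The following is a minimal sketch using plain NumPy. The helper check_array_inputs() is hypothetical and not part of julearn, and the random arrays only stand in for real data:

```python
import numpy as np


def check_array_inputs(X, y, confounds=None, groups=None):
    """Hypothetical helper (not part of julearn).

    Checks the shape rules described above and returns the number of features.
    """
    X = np.asarray(X)
    y = np.asarray(y)

    # X can be one- or two-dimensional; y must be one-dimensional.
    if X.ndim not in (1, 2):
        raise ValueError("X must be one- or two-dimensional")
    if y.ndim != 1:
        raise ValueError("y must be one-dimensional")

    # All inputs must have the same number of samples (first dimension).
    n_samples = X.shape[0]
    if y.shape[0] != n_samples:
        raise ValueError("X and y must have the same number of samples")

    if confounds is not None:
        confounds = np.asarray(confounds)
        if confounds.ndim not in (1, 2):
            raise ValueError("confounds must be one- or two-dimensional")
        if confounds.shape[0] != n_samples:
            raise ValueError("confounds must have one row per sample")

    if groups is not None:
        groups = np.asarray(groups)
        if groups.ndim != 1:
            raise ValueError("groups must be one-dimensional")
        if groups.shape[0] != n_samples:
            raise ValueError("groups must have one entry per sample")

    # A one-dimensional X is treated as a single feature.
    return 1 if X.ndim == 1 else X.shape[1]


# Synthetic stand-in data: 150 samples, 3 features, 1 confound.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = rng.integers(0, 3, size=150)
confounds = rng.normal(size=150)

n_features = check_array_inputs(X, y, confounds=confounds)
print(n_features)  # prints 3
```

Running such a check first makes a mismatch in the number of samples easy to spot before the arrays reach the cross-validation.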

The previous example can also be written using NumPy arrays:

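# Assumption: load_dataset is seaborn's dataset loader
from seaborn import load_dataset

df_iris = load_dataset('iris')
features = ['sepal_length', 'sepal_width', 'petal_length']
target = 'species'
confound_names = 'petal_width'

# Convert the selected DataFrame columns to NumPy arrays
X = df_iris[features].values  # two-dimensional: (n_samples, 3)
y = df_iris[target].values  # one-dimensional: (n_samples,)
confounds = df_iris[confound_names].values  # one-dimensional: a single confound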

Finally, call run_cross_validation() without specifying the df parameter:

scores = run_cross_validation(X=X, y=y, confounds=confounds)