.. include:: links.inc Input Data ========== julearn supports two kinds of data input configuration. The function :func:`.run_cross_validation` takes as input the following variables: - `X`: Features - `y`: Target or labels - `confounds`: Confounds to remove (optional) - `pos_labels`: Labels to be considered as positive (optional, needed for some metrics) - `groups`: Grouping variables to avoid data leakage in some cross-validation schemes. See `Cross Validation`_ for more information. julearn interprets data using two kinds of combinations: Using Pandas dataframes (recommended) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This method interprets `X`, `y`, `confounds` and `groups` as columns in the dataframe (specified in `df`). For example, using the 'iris' dataset, we can specify: .. code-block:: python df_iris = load_dataset('iris') X = ['sepal_length', 'sepal_width', 'petal_length'] y = 'species' confounds = 'petal_width' And finally call :func:`.run_cross_validation` with the following parameters: .. code-block:: python scores = run_cross_validation(X=X, y=y, data=df_iris, confounds=confounds) Using regular expressions ------------------------- It might be the case that the number of elements of X and confounds are too many to specify manually all the column names. For this purpose, julearn provides the option of using regular expressions to match columns names. In the previous example, we can pick both ``sepal_width`` and ``sepal_length`` by using ``sepal_.*``. .. code-block:: python df_iris = load_dataset('iris') X = ['sepal_.*', 'petal_length'] y = 'species' confounds = 'petal_width' Additionally, we also provide a way to select all the columns, except for the ones used for ``y``, ``confounds`` and ``groups``. That is, using X = [':']. .. code-block:: python df_iris = load_dataset('iris') X = [':'] y = 'species' confounds = 'petal_width' For more information, check python's `Regular Expressions`_. Keep in mind that julearn uses `fullmatch`, so it requires that the regular expression matches the whole string and not part of it. Using Numpy arrays ^^^^^^^^^^^^^^^^^^ This method allows `X`, `y`, `confounds` and groups to be specified as n-dimensional arrays. In this case, the number of samples for `X`, `y`, `confounds` and `groups` must match: .. code-block:: python X.shape[0] == y.shape[0] == confunds.shape[0] == groups.shape[0] `X` (and confounds) can be one- or two-dimensional, with each element in the second dimension representing a feature (or confound): .. code-block:: python if X.ndim == 1: n_features == 1 else: n_features == X.shape[1] Additionally, `y` and `groups` must be one-dimensional: .. code-block:: python y.ndim == 1 groups.ndim == 1 The previous example can be also written as numpy arrays: .. code-block:: python df_iris = load_dataset('iris') features = ['sepal_length', 'sepal_width', 'petal_length'] target = 'species' confound_names = 'petal_width' X = df_iris[features].values y = df_iris[target].values confounds = df_iris[confound_names].values And finally call :func:`.run_cross_validation` without specifying the `df` parameter: .. code-block:: python scores = run_cross_validation(X=X, y=y, confounds=confounds)