.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/02_inspection/plot_groupcv_inspect_svm.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_02_inspection_plot_groupcv_inspect_svm.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_02_inspection_plot_groupcv_inspect_svm.py:


Inspecting SVM models
=====================

This example uses the ``fmri`` dataset, performs a simple binary classification
using a Support Vector Machine classifier and analyses the model.

References
----------

  Waskom, M.L., Frank, M.C., Wagner, A.D. (2016). Adaptive engagement of
  cognitive control in context-dependent decision-making. Cerebral Cortex.

.. include:: ../../links.inc

.. GENERATED FROM PYTHON SOURCE LINES 16-33

.. code-block:: Python

    # Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
    #          Shammi More <s.more@fz-juelich.de>
    # License: AGPL

    import numpy as np
    import pandas as pd

    from sklearn.model_selection import GroupShuffleSplit

    import matplotlib.pyplot as plt
    import seaborn as sns
    from seaborn import load_dataset

    from julearn import run_cross_validation
    from julearn.utils import configure_logging
    from julearn.inspect import preprocess


.. GENERATED FROM PYTHON SOURCE LINES 34-35

Set the logging level to info to see extra information.

.. GENERATED FROM PYTHON SOURCE LINES 35-38

.. code-block:: Python

    configure_logging(level="INFO")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:54:02,936 - julearn - INFO - ===== Lib Versions =====
    2026-01-16 10:54:02,937 - julearn - INFO - numpy: 1.26.4
    2026-01-16 10:54:02,937 - julearn - INFO - scipy: 1.17.0
    2026-01-16 10:54:02,937 - julearn - INFO - sklearn: 1.7.2
    2026-01-16 10:54:02,937 - julearn - INFO - pandas: 2.3.3
    2026-01-16 10:54:02,937 - julearn - INFO - julearn: 0.3.5.dev123
    2026-01-16 10:54:02,937 - julearn - INFO - ========================


.. GENERATED FROM PYTHON SOURCE LINES 39-41

Dealing with Cross-Validation techniques
----------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 41-44

.. code-block:: Python


    df_fmri = load_dataset("fmri")


.. GENERATED FROM PYTHON SOURCE LINES 45-46

First, let's get some information on what the dataset has:

.. GENERATED FROM PYTHON SOURCE LINES 46-49

.. code-block:: Python


    print(df_fmri.head())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

      subject  timepoint event    region    signal
    0     s13         18  stim  parietal -0.017552
    1      s5         14  stim  parietal -0.080883
    2     s12         18  stim  parietal -0.081033
    3     s11         18  stim  parietal -0.046134
    4     s10         18  stim  parietal -0.037970


.. GENERATED FROM PYTHON SOURCE LINES 50-55

From this information, we can infer that it is an fMRI study with several
subjects, timepoints and events, and with signals extracted from several
brain regions.

Let's check how many kinds of each we have.

.. GENERATED FROM PYTHON SOURCE LINES 55-61

.. code-block:: Python


    print(df_fmri["event"].unique())
    print(df_fmri["region"].unique())
    print(sorted(df_fmri["timepoint"].unique()))
    print(df_fmri["subject"].unique())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ['stim' 'cue']
    ['parietal' 'frontal']
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
    ['s13' 's5' 's12' 's11' 's10' 's9' 's8' 's7' 's6' 's4' 's3' 's2' 's1' 's0']


.. GENERATED FROM PYTHON SOURCE LINES 62-65

We have data from parietal and frontal regions during 2 types of events
(*cue* and *stim*) at 19 timepoints (0 to 18) and for 14 subjects.
Let's see how many samples we have for each condition.

.. GENERATED FROM PYTHON SOURCE LINES 65-75

.. code-block:: Python


    print(df_fmri.groupby(["subject", "timepoint", "event", "region"]).count())
    print(
        np.unique(
            df_fmri.groupby(["subject", "timepoint", "event", "region"])
            .count()
            .values
        )
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                                      signal
    subject timepoint event region          
    s0      0         cue   frontal        1
                            parietal       1
                      stim  frontal        1
                            parietal       1
            1         cue   frontal        1
    ...                                  ...
    s9      17        stim  parietal       1
            18        cue   frontal        1
                            parietal       1
                      stim  frontal        1
                            parietal       1

    [1064 rows x 1 columns]
    [1]


.. GENERATED FROM PYTHON SOURCE LINES 76-82

We have exactly one value per condition.
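
As a quick sanity check on the numbers above, the row count can be reproduced
by hand. This is a minimal sketch that assumes a fully crossed design (every
subject observed at every timepoint, event and region); the lists below are
rebuilt from the printed unique values rather than read from the dataset.

.. code-block:: Python

    from itertools import product

    # Levels rebuilt by hand from the printed unique values (an assumption:
    # the design is fully crossed, with one value per combination).
    subjects = [f"s{i}" for i in range(14)]
    timepoints = range(19)
    events = ["cue", "stim"]
    regions = ["frontal", "parietal"]

    conditions = list(product(subjects, timepoints, events, regions))
    n_rows = len(conditions)

    # One sample per (subject, timepoint, event) once both regions sit
    # side by side as features.
    n_samples = n_rows // len(regions)

    print(n_rows)  # 1064
    print(n_samples)  # 532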

Let's try to build a model that uses the parietal and frontal signals to
predict whether the event was a *cue* or a *stim*.

First we define our X and y variables.

.. GENERATED FROM PYTHON SOURCE LINES 82-85

.. code-block:: Python

    X = ["parietal", "frontal"]
    y = "event"


.. GENERATED FROM PYTHON SOURCE LINES 86-91

In order for this to work, both *parietal* and *frontal* must be columns.
We need to *pivot* the table.

The values of *region* will become the columns, the column *signal* will
provide the values, and the columns *subject*, *timepoint* and *event* will
form the index.

.. GENERATED FROM PYTHON SOURCE LINES 91-97

.. code-block:: Python

    df_fmri = df_fmri.pivot(
        index=["subject", "timepoint", "event"], columns="region", values="signal"
    )

    df_fmri = df_fmri.reset_index()
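
The effect of the pivot is easier to see on a tiny table. The following is a
minimal sketch on a hand-made, hypothetical ``long_df`` with a single subject
and timepoint; the signal values are invented for illustration only.

.. code-block:: Python

    import pandas as pd

    # Hypothetical long-format table: one row per region and event.
    long_df = pd.DataFrame(
        {
            "subject": ["s0", "s0", "s0", "s0"],
            "timepoint": [0, 0, 0, 0],
            "event": ["cue", "cue", "stim", "stim"],
            "region": ["frontal", "parietal", "frontal", "parietal"],
            "signal": [0.1, 0.2, 0.3, 0.4],
        }
    )

    # Each region becomes a column; each (subject, timepoint, event) a row.
    wide_df = long_df.pivot(
        index=["subject", "timepoint", "event"], columns="region", values="signal"
    ).reset_index()

    print(wide_df.shape)  # (2, 5)
    print(list(wide_df.columns))

Four long rows collapse into two wide rows, one per event, each carrying both
a *frontal* and a *parietal* value.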


.. GENERATED FROM PYTHON SOURCE LINES 98-100

Here we want to z-score all the features and then train a Support Vector
Machine classifier.

.. GENERATED FROM PYTHON SOURCE LINES 100-112

.. code-block:: Python


    scores = run_cross_validation(
        X=X,
        y=y,
        data=df_fmri,
        preprocess="zscore",
        model="svm",
        problem_type="classification",
    )

    print(scores["test_score"].mean())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:54:02,955 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:54:02,955 - julearn - INFO - Using dataframe as input
    2026-01-16 10:54:02,955 - julearn - INFO -      Features: ['parietal', 'frontal']
    2026-01-16 10:54:02,955 - julearn - INFO -      Target: event
    2026-01-16 10:54:02,955 - julearn - INFO -      Expanded features: ['parietal', 'frontal']
    2026-01-16 10:54:02,956 - julearn - INFO -      X_types:{}
    2026-01-16 10:54:02,956 - julearn - WARNING - The following columns are not defined in X_types: ['parietal', 'frontal']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['parietal', 'frontal']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:54:02,956 - julearn - INFO - ====================
    2026-01-16 10:54:02,957 - julearn - INFO - 
    2026-01-16 10:54:02,957 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:54:02,957 - julearn - INFO - Step added
    2026-01-16 10:54:02,957 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:54:02,957 - julearn - INFO - Step added
    2026-01-16 10:54:02,958 - julearn - INFO - = Model Parameters =
    2026-01-16 10:54:02,958 - julearn - INFO - ====================
    2026-01-16 10:54:02,958 - julearn - INFO - 
    2026-01-16 10:54:02,958 - julearn - INFO - = Data Information =
    2026-01-16 10:54:02,958 - julearn - INFO -      Problem type: classification
    2026-01-16 10:54:02,958 - julearn - INFO -      Number of samples: 532
    2026-01-16 10:54:02,958 - julearn - INFO -      Number of features: 2
    2026-01-16 10:54:02,959 - julearn - INFO - ====================
    2026-01-16 10:54:02,959 - julearn - INFO - 
    2026-01-16 10:54:02,959 - julearn - INFO -      Number of classes: 2
    2026-01-16 10:54:02,959 - julearn - INFO -      Target type: object
    2026-01-16 10:54:02,960 - julearn - INFO -      Class distributions: event
    cue     266
    stim    266
    Name: count, dtype: int64
    2026-01-16 10:54:02,960 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
    2026-01-16 10:54:02,960 - julearn - INFO - Binary classification problem detected.
    0.7218303650149884


.. GENERATED FROM PYTHON SOURCE LINES 113-134

These results indicate that we can decode the kind of event by looking at
the *parietal* and *frontal* signals. However, that claim is true only if we
have some data from the same subject already acquired.

The problem is that we split the data into 5 folds (default, see
:func:`.run_cross_validation`) without taking subjects into account. This
means that data from one subject could be in both the training and the
testing set. If this is the case, then the model can learn subject-specific
characteristics and apply them to the testing set. Thus, it is not true that
we can decode the event for an unseen subject, but rather for an unseen
timepoint of a subject for whom we already have data.
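
The leakage described above can be reproduced on toy data. The following is a
minimal sketch, assuming four hypothetical subjects with five samples each,
stored in subject order; the ``groups`` and ``samples`` arrays are
illustrative and are not taken from the example dataset.

.. code-block:: Python

    import numpy as np
    from sklearn.model_selection import KFold

    # Hypothetical subject labels: 4 subjects, 5 consecutive samples each.
    groups = np.repeat(["s0", "s1", "s2", "s3"], 5)
    samples = np.arange(len(groups))

    # Count the folds in which a subject appears on both sides of the split.
    leaky_folds = 0
    for train_idx, test_idx in KFold(n_splits=5).split(samples):
        shared = set(groups[train_idx]) & set(groups[test_idx])
        if shared:
            leaky_folds += 1

    print(leaky_folds)  # 5

Because the fold boundaries do not line up with the subject boundaries, every
fold in this toy setup tests on a subject that was also seen during training.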

To test on unseen subjects, we need to make sure that all the data from each
subject is either in the training or in the testing set, but not in both.

We can use ``scikit-learn``'s
:class:`sklearn.model_selection.GroupShuffleSplit` and specify the
grouping column using the ``groups`` parameter.
By setting ``return_estimator="final"``, the :func:`.run_cross_validation`
function returns the estimator fitted with all the data. We will use this
later to do some analyses.

.. GENERATED FROM PYTHON SOURCE LINES 134-150

.. code-block:: Python

    cv = GroupShuffleSplit(n_splits=5, test_size=0.5, random_state=42)

    scores, model = run_cross_validation(
        X=X,
        y=y,
        data=df_fmri,
        preprocess="zscore",
        model="svm",
        cv=cv,
        groups="subject",
        problem_type="classification",
        return_estimator="final",
    )

    print(scores["test_score"].mean())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:54:03,029 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:54:03,029 - julearn - INFO - Using dataframe as input
    2026-01-16 10:54:03,029 - julearn - INFO -      Features: ['parietal', 'frontal']
    2026-01-16 10:54:03,030 - julearn - INFO -      Target: event
    2026-01-16 10:54:03,030 - julearn - INFO -      Expanded features: ['parietal', 'frontal']
    2026-01-16 10:54:03,030 - julearn - INFO -      X_types:{}
    2026-01-16 10:54:03,030 - julearn - WARNING - The following columns are not defined in X_types: ['parietal', 'frontal']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['parietal', 'frontal']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:54:03,031 - julearn - INFO - Using subject as groups
    2026-01-16 10:54:03,031 - julearn - INFO - ====================
    2026-01-16 10:54:03,031 - julearn - INFO - 
    2026-01-16 10:54:03,031 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:54:03,031 - julearn - INFO - Step added
    2026-01-16 10:54:03,031 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:54:03,032 - julearn - INFO - Step added
    2026-01-16 10:54:03,032 - julearn - INFO - = Model Parameters =
    2026-01-16 10:54:03,032 - julearn - INFO - ====================
    2026-01-16 10:54:03,032 - julearn - INFO - 
    2026-01-16 10:54:03,033 - julearn - INFO - = Data Information =
    2026-01-16 10:54:03,033 - julearn - INFO -      Problem type: classification
    2026-01-16 10:54:03,033 - julearn - INFO -      Number of samples: 532
    2026-01-16 10:54:03,033 - julearn - INFO -      Number of features: 2
    2026-01-16 10:54:03,033 - julearn - INFO - ====================
    2026-01-16 10:54:03,033 - julearn - INFO - 
    2026-01-16 10:54:03,033 - julearn - INFO -      Number of classes: 2
    2026-01-16 10:54:03,033 - julearn - INFO -      Target type: object
    2026-01-16 10:54:03,034 - julearn - INFO -      Class distributions: event
    cue     266
    stim    266
    Name: count, dtype: int64
    2026-01-16 10:54:03,034 - julearn - INFO - Using outer CV scheme GroupShuffleSplit(n_splits=5, random_state=42, test_size=0.5, train_size=None) (incl. final model)
    2026-01-16 10:54:03,034 - julearn - INFO - Binary classification problem detected.
    0.7210526315789474
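
To double-check that this splitter keeps subjects apart, the splits can be
inspected directly. This is a minimal sketch on placeholder data, assuming 14
hypothetical subjects with 38 samples each (mirroring the 532 samples
reported in the log); the feature values are zeros because only the indices
matter here.

.. code-block:: Python

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    # Hypothetical subject labels: 14 subjects, 38 samples each.
    groups = np.repeat([f"s{i}" for i in range(14)], 38)
    samples = np.zeros((len(groups), 2))  # placeholder features

    splitter = GroupShuffleSplit(n_splits=5, test_size=0.5, random_state=42)

    # Number of subjects shared between train and test in each split.
    overlaps = []
    for train_idx, test_idx in splitter.split(samples, groups=groups):
        shared = set(groups[train_idx]) & set(groups[test_idx])
        overlaps.append(len(shared))

    print(overlaps)  # [0, 0, 0, 0, 0]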


.. GENERATED FROM PYTHON SOURCE LINES 151-156

After testing on independent subjects, we can now claim that, given a new
subject, we can predict the kind of event (mean test score of about 0.72).

Let's do some visualization on how these two features interact and what
the preprocessing part of the model does.

.. GENERATED FROM PYTHON SOURCE LINES 156-182

.. code-block:: Python


    # Plot the raw features
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    sns.scatterplot(
        x="parietal", y="frontal", hue="event", data=df_fmri, ax=axes[0], s=5
    )
    axes[0].set_title("Raw data")

    # Plot the preprocessed features
    pre_X = preprocess(
        model, X=X, data=df_fmri, until="zscore", with_column_types=True
    )

    pre_df = pre_X.join(df_fmri[y])

    sns.scatterplot(
        x="parietal__:type:__continuous",
        y="frontal__:type:__continuous",
        hue="event",
        data=pre_df,
        ax=axes[1],
        s=5,
    )

    axes[1].set_title("Preprocessed data")


.. image-sg:: /auto_examples/02_inspection/images/sphx_glr_plot_groupcv_inspect_svm_001.png
   :alt: Raw data, Preprocessed data
   :srcset: /auto_examples/02_inspection/images/sphx_glr_plot_groupcv_inspect_svm_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Text(0.5, 1.0, 'Preprocessed data')


.. GENERATED FROM PYTHON SOURCE LINES 183-188

In this case, the preprocessing is nothing more than a
:class:`sklearn.preprocessing.StandardScaler`.
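
What that scaler computes can be written out by hand: subtract each column's
mean and divide by its standard deviation. The following is a minimal sketch
on synthetic data (the ``toy`` array is an assumption made for illustration,
not the fMRI features).

.. code-block:: Python

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Synthetic two-column data standing in for the two features.
    rng = np.random.default_rng(0)
    toy = rng.normal(loc=5.0, scale=2.0, size=(100, 2))

    # Manual z-score: per-column mean and population standard deviation.
    manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)
    scaled = StandardScaler().fit_transform(toy)

    print(np.allclose(manual, scaled))  # True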

It seems that the data is not quite linearly separable. Let's now visualize
how the SVM does this complex task.

.. GENERATED FROM PYTHON SOURCE LINES 188-218

.. code-block:: Python


    # Get the model from the pipeline
    clf = model[2]
    fig = plt.figure()
    ax = sns.scatterplot(
        x="parietal__:type:__continuous",
        y="frontal__:type:__continuous",
        hue="event",
        data=pre_df,
        s=5,
    )

    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    # Create grid to evaluate model
    xx = np.linspace(xlim[0], xlim[1], 30)
    yy = np.linspace(ylim[0], ylim[1], 30)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T

    # Create pandas.DataFrame
    xy_df = pd.DataFrame(
        data=xy,
        columns=["parietal__:type:__continuous", "frontal__:type:__continuous"],
    )

    Z = clf.decision_function(xy_df).reshape(XX.shape)
    a = ax.contour(XX, YY, Z, colors="k", levels=[0], alpha=0.5, linestyles=["-"])
    ax.set_title("Preprocessed data with SVM decision function boundaries")


.. image-sg:: /auto_examples/02_inspection/images/sphx_glr_plot_groupcv_inspect_svm_002.png
   :alt: Preprocessed data with SVM decision function boundaries
   :srcset: /auto_examples/02_inspection/images/sphx_glr_plot_groupcv_inspect_svm_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Text(0.5, 1.0, 'Preprocessed data with SVM decision function boundaries')
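
The contour drawn at level 0 marks where the decision function changes sign,
which is where the predicted class flips. As a hedged illustration on
synthetic data (the ``X_toy`` and ``y_toy`` arrays and the default
:class:`sklearn.svm.SVC` settings are assumptions, not the fitted julearn
model), the sign of the decision function can be compared with the predicted
labels:

.. code-block:: Python

    import numpy as np
    from sklearn.svm import SVC

    # Two synthetic, well-separated blobs standing in for the two events.
    rng = np.random.default_rng(0)
    X_toy = np.vstack(
        [rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))]
    )
    y_toy = np.array(["cue"] * 50 + ["stim"] * 50)

    toy_clf = SVC().fit(X_toy, y_toy)
    scores = toy_clf.decision_function(X_toy)

    # In the binary case, positive values map to the second class in
    # ``classes_`` and negative values to the first.
    by_sign = np.where(scores > 0, toy_clf.classes_[1], toy_clf.classes_[0])
    print(np.array_equal(by_sign, toy_clf.predict(X_toy)))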


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.491 seconds)


.. _sphx_glr_download_auto_examples_02_inspection_plot_groupcv_inspect_svm.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_groupcv_inspect_svm.ipynb <plot_groupcv_inspect_svm.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_groupcv_inspect_svm.py <plot_groupcv_inspect_svm.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_groupcv_inspect_svm.zip <plot_groupcv_inspect_svm.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_