Tuning Hyperparameters

This example uses the fmri dataset, performs a simple binary classification using a Support Vector Machine classifier, and analyzes the model.

References

Waskom, M.L., Frank, M.C., Wagner, A.D. (2016). Adaptive engagement of cognitive control in context-dependent decision-making. Cerebral Cortex.

# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL

import numpy as np
from seaborn import load_dataset

from julearn import run_cross_validation
from julearn.utils import configure_logging
from julearn.pipeline import PipelineCreator

Set the logging level to info to see extra information.

configure_logging(level="INFO")

2024-05-03 15:22:04,235 - julearn - INFO - ===== Lib Versions =====
2024-05-03 15:22:04,235 - julearn - INFO - numpy: 1.26.4
2024-05-03 15:22:04,235 - julearn - INFO - scipy: 1.13.0
2024-05-03 15:22:04,236 - julearn - INFO - sklearn: 1.4.2
2024-05-03 15:22:04,236 - julearn - INFO - pandas: 2.1.4
2024-05-03 15:22:04,236 - julearn - INFO - julearn: 0.3.2
2024-05-03 15:22:04,236 - julearn - INFO - ========================

Set the random seed so that the example is reproducible.

np.random.seed(42)

Load the dataset.

df_fmri = load_dataset("fmri")
df_fmri.head()

	subject	timepoint	event	region	signal
0	s13	18	stim	parietal	-0.017552
1	s5	14	stim	parietal	-0.080883
2	s12	18	stim	parietal	-0.081033
3	s11	18	stim	parietal	-0.046134
4	s10	18	stim	parietal	-0.037970

Reshape the dataframe into wide format, with one column per region.

df_fmri = df_fmri.pivot(
    index=["subject", "timepoint", "event"], columns="region", values="signal"
)

df_fmri = df_fmri.reset_index()
df_fmri.head()

region	subject	timepoint	event	frontal	parietal
0	s0	0	cue	0.007766	-0.006899
1	s0	0	stim	-0.021452	-0.039327
2	s0	1	cue	0.016440	0.000300
3	s0	1	stim	-0.021054	-0.035735
4	s0	2	cue	0.024296	0.033220
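The reshaping step can be pictured without pandas. The following is a minimal sketch, not the pandas implementation: the helper `pivot_long_to_wide` is hypothetical and the signal values are made up for illustration. It groups long-format rows by the index columns and spreads the `region` values into separate columns.

```python
# Toy long-format records mirroring the columns of the fmri dataset.
# The signal values are invented for illustration only.
long_rows = [
    {"subject": "s0", "timepoint": 0, "event": "cue",
     "region": "frontal", "signal": 0.1},
    {"subject": "s0", "timepoint": 0, "event": "cue",
     "region": "parietal", "signal": 0.2},
    {"subject": "s0", "timepoint": 0, "event": "stim",
     "region": "frontal", "signal": 0.3},
    {"subject": "s0", "timepoint": 0, "event": "stim",
     "region": "parietal", "signal": 0.4},
]


def pivot_long_to_wide(rows, index, columns, values):
    """Hypothetical helper: one output row per unique index combination."""
    wide = {}
    for row in rows:
        key = tuple(row[name] for name in index)
        # Create the wide row the first time this index combination appears.
        wide_row = wide.setdefault(key, {name: row[name] for name in index})
        # The value of the `columns` field becomes a new column name.
        wide_row[row[columns]] = row[values]
    return list(wide.values())


wide_rows = pivot_long_to_wide(
    long_rows,
    index=["subject", "timepoint", "event"],
    columns="region",
    values="signal",
)
for row in wide_rows:
    print(row)
```

Each resulting row holds both region signals for one subject, timepoint and event, which is the shape needed to use `frontal` and `parietal` as features.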

As a first attempt, let’s use a linear SVM with the default parameters.

X = ["frontal", "parietal"]
y = "event"

creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="linear")

scores = run_cross_validation(X=X, y=y, data=df_fmri, model=creator)

print(scores["test_score"].mean())

2024-05-03 15:22:04,244 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:04,244 - julearn - INFO - Step added
2024-05-03 15:22:04,245 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:04,245 - julearn - INFO - Setting hyperparameter kernel = linear
2024-05-03 15:22:04,245 - julearn - INFO - Step added
2024-05-03 15:22:04,245 - julearn - INFO - ==== Input Data ====
2024-05-03 15:22:04,245 - julearn - INFO - Using dataframe as input
2024-05-03 15:22:04,245 - julearn - INFO -      Features: ['frontal', 'parietal']
2024-05-03 15:22:04,245 - julearn - INFO -      Target: event
2024-05-03 15:22:04,245 - julearn - INFO -      Expanded features: ['frontal', 'parietal']
2024-05-03 15:22:04,245 - julearn - INFO -      X_types:{}
2024-05-03 15:22:04,245 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:505: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
  warn_with_log(
2024-05-03 15:22:04,246 - julearn - INFO - ====================
2024-05-03 15:22:04,246 - julearn - INFO -
2024-05-03 15:22:04,246 - julearn - INFO - = Model Parameters =
2024-05-03 15:22:04,246 - julearn - INFO - ====================
2024-05-03 15:22:04,246 - julearn - INFO -
2024-05-03 15:22:04,246 - julearn - INFO - = Data Information =
2024-05-03 15:22:04,247 - julearn - INFO -      Problem type: classification
2024-05-03 15:22:04,247 - julearn - INFO -      Number of samples: 532
2024-05-03 15:22:04,247 - julearn - INFO -      Number of features: 2
2024-05-03 15:22:04,247 - julearn - INFO - ====================
2024-05-03 15:22:04,247 - julearn - INFO -
2024-05-03 15:22:04,247 - julearn - INFO -      Number of classes: 2
2024-05-03 15:22:04,247 - julearn - INFO -      Target type: object
2024-05-03 15:22:04,247 - julearn - INFO -      Class distributions: event
cue     266
stim    266
Name: count, dtype: int64
2024-05-03 15:22:04,248 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-05-03 15:22:04,248 - julearn - INFO - Binary classification problem detected.
0.5939164168576971
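The log reports the outer CV scheme as KFold(n_splits=5, shuffle=False). As a rough sketch of what that implies, the hypothetical helper below reproduces how unshuffled k-fold splitting assigns contiguous blocks of rows to each test fold, following the fold-size rule described in the scikit-learn documentation (the first `n_samples % n_splits` folds get one extra sample). Because the pivoted dataframe appears to be ordered by subject, each test fold would then be a contiguous block of rows rather than a random sample.

```python
def kfold_test_ranges(n_samples, n_splits):
    """Hypothetical helper: (start, stop) test ranges for unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    ranges = []
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds receive one additional sample.
        size = base + 1 if fold < extra else base
        ranges.append((start, start + size))
        start += size
    return ranges


ranges = kfold_test_ranges(n_samples=532, n_splits=5)
sizes = [stop - start for start, stop in ranges]
print(ranges)
print(sizes)
```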

The score is only slightly above the 0.5 chance level for two balanced classes. Let’s check whether there is a better regularization parameter (C) for the linear SVM, using a grid search to find the best C.

creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="linear", C=[0.01, 0.1])

search_params = {
    "kind": "grid",
    "cv": 2,  # to speed up the example
}

scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    search_params=search_params,
    return_estimator="final",
)

print(scores["test_score"].mean())

2024-05-03 15:22:04,303 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:04,303 - julearn - INFO - Step added
2024-05-03 15:22:04,303 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:04,303 - julearn - INFO - Setting hyperparameter kernel = linear
2024-05-03 15:22:04,303 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-05-03 15:22:04,303 - julearn - INFO - Step added
2024-05-03 15:22:04,303 - julearn - INFO - ==== Input Data ====
2024-05-03 15:22:04,303 - julearn - INFO - Using dataframe as input
2024-05-03 15:22:04,303 - julearn - INFO -      Features: ['frontal', 'parietal']
2024-05-03 15:22:04,304 - julearn - INFO -      Target: event
2024-05-03 15:22:04,304 - julearn - INFO -      Expanded features: ['frontal', 'parietal']
2024-05-03 15:22:04,304 - julearn - INFO -      X_types:{}
2024-05-03 15:22:04,304 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:505: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
  warn_with_log(
2024-05-03 15:22:04,304 - julearn - INFO - ====================
2024-05-03 15:22:04,304 - julearn - INFO -
2024-05-03 15:22:04,305 - julearn - INFO - = Model Parameters =
2024-05-03 15:22:04,305 - julearn - INFO - Tuning hyperparameters using grid
2024-05-03 15:22:04,305 - julearn - INFO - Hyperparameters:
2024-05-03 15:22:04,305 - julearn - INFO -      svm__C: [0.01, 0.1]
2024-05-03 15:22:04,305 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:04,305 - julearn - INFO - Search Parameters:
2024-05-03 15:22:04,305 - julearn - INFO -      cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:04,305 - julearn - INFO - ====================
2024-05-03 15:22:04,305 - julearn - INFO -
2024-05-03 15:22:04,305 - julearn - INFO - = Data Information =
2024-05-03 15:22:04,306 - julearn - INFO -      Problem type: classification
2024-05-03 15:22:04,306 - julearn - INFO -      Number of samples: 532
2024-05-03 15:22:04,306 - julearn - INFO -      Number of features: 2
2024-05-03 15:22:04,306 - julearn - INFO - ====================
2024-05-03 15:22:04,306 - julearn - INFO -
2024-05-03 15:22:04,306 - julearn - INFO -      Number of classes: 2
2024-05-03 15:22:04,306 - julearn - INFO -      Target type: object
2024-05-03 15:22:04,306 - julearn - INFO -      Class distributions: event
cue     266
stim    266
Name: count, dtype: int64
2024-05-03 15:22:04,307 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-05-03 15:22:04,307 - julearn - INFO - Binary classification problem detected.
2024-05-03 15:22:04,540 - julearn - INFO - Fitting final model
0.588308940222183

This did not change much. Let’s explore other kernels too.

creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel=["linear", "rbf", "poly"], C=[0.01, 0.1])

scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    search_params=search_params,
    return_estimator="final",
)

print(scores["test_score"].mean())

2024-05-03 15:22:04,586 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:04,587 - julearn - INFO - Step added
2024-05-03 15:22:04,587 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:04,587 - julearn - INFO - Tuning hyperparameter kernel = ['linear', 'rbf', 'poly']
2024-05-03 15:22:04,587 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-05-03 15:22:04,587 - julearn - INFO - Step added
2024-05-03 15:22:04,587 - julearn - INFO - ==== Input Data ====
2024-05-03 15:22:04,587 - julearn - INFO - Using dataframe as input
2024-05-03 15:22:04,587 - julearn - INFO -      Features: ['frontal', 'parietal']
2024-05-03 15:22:04,587 - julearn - INFO -      Target: event
2024-05-03 15:22:04,587 - julearn - INFO -      Expanded features: ['frontal', 'parietal']
2024-05-03 15:22:04,587 - julearn - INFO -      X_types:{}
2024-05-03 15:22:04,587 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:505: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
  warn_with_log(
2024-05-03 15:22:04,588 - julearn - INFO - ====================
2024-05-03 15:22:04,588 - julearn - INFO -
2024-05-03 15:22:04,588 - julearn - INFO - = Model Parameters =
2024-05-03 15:22:04,589 - julearn - INFO - Tuning hyperparameters using grid
2024-05-03 15:22:04,589 - julearn - INFO - Hyperparameters:
2024-05-03 15:22:04,589 - julearn - INFO -      svm__kernel: ['linear', 'rbf', 'poly']
2024-05-03 15:22:04,589 - julearn - INFO -      svm__C: [0.01, 0.1]
2024-05-03 15:22:04,589 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:04,589 - julearn - INFO - Search Parameters:
2024-05-03 15:22:04,589 - julearn - INFO -      cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:04,589 - julearn - INFO - ====================
2024-05-03 15:22:04,589 - julearn - INFO -
2024-05-03 15:22:04,589 - julearn - INFO - = Data Information =
2024-05-03 15:22:04,589 - julearn - INFO -      Problem type: classification
2024-05-03 15:22:04,589 - julearn - INFO -      Number of samples: 532
2024-05-03 15:22:04,589 - julearn - INFO -      Number of features: 2
2024-05-03 15:22:04,589 - julearn - INFO - ====================
2024-05-03 15:22:04,589 - julearn - INFO -
2024-05-03 15:22:04,590 - julearn - INFO -      Number of classes: 2
2024-05-03 15:22:04,590 - julearn - INFO -      Target type: object
2024-05-03 15:22:04,590 - julearn - INFO -      Class distributions: event
cue     266
stim    266
Name: count, dtype: int64
2024-05-03 15:22:04,590 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-05-03 15:22:04,590 - julearn - INFO - Binary classification problem detected.
2024-05-03 15:22:05,187 - julearn - INFO - Fitting final model
0.7087109857168048
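To make the size of this search concrete, the sketch below enumerates the candidates of the grid shown in the log (`svm__kernel` and `svm__C`) with `itertools.product`. The fit count is an estimate under a stated assumption: each search fits every candidate on every inner fold and then refits the best candidate once, and one such search runs per outer fold plus once more for the final model.

```python
from itertools import product

# The grid as reported in the log above.
param_grid = {
    "svm__kernel": ["linear", "rbf", "poly"],
    "svm__C": [0.01, 0.1],
}

# Every combination of one value per hyperparameter is a candidate.
names = list(param_grid)
candidates = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]
for candidate in candidates:
    print(candidate)

inner_folds = 2  # "cv": 2 in search_params
outer_folds = 5  # outer KFold(n_splits=5)

# Assumption: all candidates on all inner folds, plus one refit of the best.
fits_per_search = len(candidates) * inner_folds + 1
# Assumption: one search per outer fold and one for the final model.
total_fits = fits_per_search * (outer_folds + 1)
print(len(candidates), fits_per_search, total_fits)
```

The number of candidates grows multiplicatively with every hyperparameter added to the grid, which is why the example keeps the inner CV at two folds.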

It seems that we might have found a better model, but which one is it?

print(estimator.best_params_)

{'svm__C': 0.1, 'svm__kernel': 'rbf'}

Now that we know that an RBF kernel is better, let’s test different gamma parameters.

creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="rbf", C=[0.01, 0.1], gamma=[1e-2, 1e-3])

scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    search_params=search_params,
    return_estimator="final",
)

print(scores["test_score"].mean())
print(estimator.best_params_)

2024-05-03 15:22:05,314 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:05,314 - julearn - INFO - Step added
2024-05-03 15:22:05,314 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:05,314 - julearn - INFO - Setting hyperparameter kernel = rbf
2024-05-03 15:22:05,314 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-05-03 15:22:05,315 - julearn - INFO - Tuning hyperparameter gamma = [0.01, 0.001]
2024-05-03 15:22:05,315 - julearn - INFO - Step added
2024-05-03 15:22:05,315 - julearn - INFO - ==== Input Data ====
2024-05-03 15:22:05,315 - julearn - INFO - Using dataframe as input
2024-05-03 15:22:05,315 - julearn - INFO -      Features: ['frontal', 'parietal']
2024-05-03 15:22:05,315 - julearn - INFO -      Target: event
2024-05-03 15:22:05,315 - julearn - INFO -      Expanded features: ['frontal', 'parietal']
2024-05-03 15:22:05,315 - julearn - INFO -      X_types:{}
2024-05-03 15:22:05,315 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:505: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
  warn_with_log(
2024-05-03 15:22:05,316 - julearn - INFO - ====================
2024-05-03 15:22:05,316 - julearn - INFO -
2024-05-03 15:22:05,316 - julearn - INFO - = Model Parameters =
2024-05-03 15:22:05,316 - julearn - INFO - Tuning hyperparameters using grid
2024-05-03 15:22:05,316 - julearn - INFO - Hyperparameters:
2024-05-03 15:22:05,316 - julearn - INFO -      svm__C: [0.01, 0.1]
2024-05-03 15:22:05,316 - julearn - INFO -      svm__gamma: [0.01, 0.001]
2024-05-03 15:22:05,317 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:05,317 - julearn - INFO - Search Parameters:
2024-05-03 15:22:05,317 - julearn - INFO -      cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:05,317 - julearn - INFO - ====================
2024-05-03 15:22:05,317 - julearn - INFO -
2024-05-03 15:22:05,317 - julearn - INFO - = Data Information =
2024-05-03 15:22:05,317 - julearn - INFO -      Problem type: classification
2024-05-03 15:22:05,317 - julearn - INFO -      Number of samples: 532
2024-05-03 15:22:05,317 - julearn - INFO -      Number of features: 2
2024-05-03 15:22:05,317 - julearn - INFO - ====================
2024-05-03 15:22:05,317 - julearn - INFO -
2024-05-03 15:22:05,317 - julearn - INFO -      Number of classes: 2
2024-05-03 15:22:05,317 - julearn - INFO -      Target type: object
2024-05-03 15:22:05,318 - julearn - INFO -      Class distributions: event
cue     266
stim    266
Name: count, dtype: int64
2024-05-03 15:22:05,318 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-05-03 15:22:05,318 - julearn - INFO - Binary classification problem detected.
2024-05-03 15:22:05,772 - julearn - INFO - Fitting final model
0.5188855581026275
{'svm__C': 0.01, 'svm__gamma': 0.001}
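One plausible reading of this drop, not verified against this dataset, is that the tested gamma values are too small for z-scored features. The RBF kernel is exp(-gamma * ||x - x'||^2), so with a very small gamma almost every pair of points looks alike. The sketch below evaluates that formula for two illustrative points; the points and the value 0.5 used for comparison are chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two illustrative points, one standard deviation either side of the mean
# in both features (squared distance = 8).
a = (-1.0, -1.0)
b = (1.0, 1.0)

for gamma in (0.001, 0.01, 0.5):
    print(gamma, round(rbf_kernel(a, b, gamma), 4))
```

For the two small gamma values the kernel stays close to 1 even for points that are far apart, which would leave the classifier little to separate the classes with.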

It seems that we had better accuracy without tuning the gamma parameter. Let’s add the default value ("scale") to the grid and see what happens.

creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="rbf", C=[0.01, 0.1], gamma=[1e-2, 1e-3, "scale"])
X = ["frontal", "parietal"]
y = "event"

search_params = {"cv": 2}

scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    return_estimator="final",
    search_params=search_params,
)

print(scores["test_score"].mean())
print(estimator.best_params_)

2024-05-03 15:22:05,870 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:05,870 - julearn - INFO - Step added
2024-05-03 15:22:05,870 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-03 15:22:05,870 - julearn - INFO - Setting hyperparameter kernel = rbf
2024-05-03 15:22:05,870 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-05-03 15:22:05,871 - julearn - INFO - Tuning hyperparameter gamma = [0.01, 0.001, 'scale']
2024-05-03 15:22:05,871 - julearn - INFO - Step added
2024-05-03 15:22:05,871 - julearn - INFO - ==== Input Data ====
2024-05-03 15:22:05,871 - julearn - INFO - Using dataframe as input
2024-05-03 15:22:05,871 - julearn - INFO -      Features: ['frontal', 'parietal']
2024-05-03 15:22:05,871 - julearn - INFO -      Target: event
2024-05-03 15:22:05,871 - julearn - INFO -      Expanded features: ['frontal', 'parietal']
2024-05-03 15:22:05,871 - julearn - INFO -      X_types:{}
2024-05-03 15:22:05,871 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:505: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
  warn_with_log(
2024-05-03 15:22:05,872 - julearn - INFO - ====================
2024-05-03 15:22:05,872 - julearn - INFO -
2024-05-03 15:22:05,872 - julearn - INFO - = Model Parameters =
2024-05-03 15:22:05,872 - julearn - INFO - Tuning hyperparameters using grid
2024-05-03 15:22:05,872 - julearn - INFO - Hyperparameters:
2024-05-03 15:22:05,872 - julearn - INFO -      svm__C: [0.01, 0.1]
2024-05-03 15:22:05,872 - julearn - INFO -      svm__gamma: [0.01, 0.001, 'scale']
2024-05-03 15:22:05,873 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:05,873 - julearn - INFO - Search Parameters:
2024-05-03 15:22:05,873 - julearn - INFO -      cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-03 15:22:05,873 - julearn - INFO - ====================
2024-05-03 15:22:05,873 - julearn - INFO -
2024-05-03 15:22:05,873 - julearn - INFO - = Data Information =
2024-05-03 15:22:05,873 - julearn - INFO -      Problem type: classification
2024-05-03 15:22:05,873 - julearn - INFO -      Number of samples: 532
2024-05-03 15:22:05,873 - julearn - INFO -      Number of features: 2
2024-05-03 15:22:05,873 - julearn - INFO - ====================
2024-05-03 15:22:05,873 - julearn - INFO -
2024-05-03 15:22:05,873 - julearn - INFO -      Number of classes: 2
2024-05-03 15:22:05,873 - julearn - INFO -      Target type: object
2024-05-03 15:22:05,874 - julearn - INFO -      Class distributions: event
cue     266
stim    266
Name: count, dtype: int64
2024-05-03 15:22:05,874 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-05-03 15:22:05,874 - julearn - INFO - Binary classification problem detected.
2024-05-03 15:22:06,507 - julearn - INFO - Fitting final model
0.7087109857168048
{'svm__C': 0.1, 'svm__gamma': 'scale'}

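With gamma="scale", the effective value is derived from the data. We can inspect it on the fitted SVM.

print(estimator.best_estimator_["svm"]._gamma)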

0.5
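The value 0.5 is consistent with the scikit-learn documentation, which defines gamma="scale" as 1 / (n_features * X.var()). The sketch below reproduces the arithmetic in plain Python, assuming the zscore step standardizes each feature with the population standard deviation; the raw numbers are made up. After z-scoring, the overall variance is 1, so with two features the result is 1 / (2 * 1).

```python
import math
from statistics import fmean, pstdev


def zscore(column):
    """Standardize a column to mean 0 and population std 1."""
    mean = fmean(column)
    std = pstdev(column)
    return [(value - mean) / std for value in column]


# Made-up raw values standing in for the two features.
frontal = zscore([0.3, -1.2, 0.8, 2.1, -0.5])
parietal = zscore([10.0, 12.5, 9.0, 11.0, 14.0])

# Variance over all entries of the feature matrix (population variance).
values = frontal + parietal
grand_mean = fmean(values)
variance = fmean([(value - grand_mean) ** 2 for value in values])

n_features = 2
gamma_scale = 1.0 / (n_features * variance)
print(gamma_scale)
```

This also explains why the earlier grid with gamma values of 0.01 and 0.001 was far from the default: both are much smaller than 0.5.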

Total running time of the script: (0 minutes 2.416 seconds)
