Tuning Hyperparameters using Bayesian Search

This example uses the fmri dataset to perform a simple binary classification with a Support Vector Machine (SVM) classifier, tunes its hyperparameters with a Bayesian search, and inspects the selected model.

References

Waskom, M.L., Frank, M.C., Wagner, A.D. (2016). Adaptive engagement of cognitive control in context-dependent decision-making. Cerebral Cortex.

# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL

import numpy as np
from seaborn import load_dataset
import sklearn

from julearn import run_cross_validation
from julearn.utils import configure_logging, logger
from julearn.pipeline import PipelineCreator

Set the logging level to info to see extra information.

configure_logging(level="INFO")

2026-02-10 14:37:59,631 - julearn - INFO - ===== Lib Versions =====
2026-02-10 14:37:59,631 - julearn - INFO - numpy: 1.26.4
2026-02-10 14:37:59,631 - julearn - INFO - scipy: 1.17.0
2026-02-10 14:37:59,631 - julearn - INFO - sklearn: 1.7.2
2026-02-10 14:37:59,631 - julearn - INFO - pandas: 2.3.3
2026-02-10 14:37:59,631 - julearn - INFO - julearn: 0.3.5.dev126
2026-02-10 14:37:59,631 - julearn - INFO - ========================

Disable scikit-learn's metadata routing, which otherwise causes errors when BayesSearchCV is used.

sklearn.set_config(enable_metadata_routing=False)

Set the random seed so that the example is reproducible.

np.random.seed(42)

Load the dataset.

df_fmri = load_dataset("fmri")
df_fmri.head()

	subject	timepoint	event	region	signal
0	s13	18	stim	parietal	-0.017552
1	s5	14	stim	parietal	-0.080883
2	s12	18	stim	parietal	-0.081033
3	s11	18	stim	parietal	-0.046134
4	s10	18	stim	parietal	-0.037970

Reshape the dataframe from long to wide format, so that each region becomes its own feature column.

df_fmri = df_fmri.pivot(
    index=["subject", "timepoint", "event"], columns="region", values="signal"
)

df_fmri = df_fmri.reset_index()
df_fmri.head()

region	subject	timepoint	event	frontal	parietal
0	s0	0	cue	0.007766	-0.006899
1	s0	0	stim	-0.021452	-0.039327
2	s0	1	cue	0.016440	0.000300
3	s0	1	stim	-0.021054	-0.035735
4	s0	2	cue	0.024296	0.033220
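To make the reshaping step concrete, here is a minimal sketch of the same pivot on a toy dataframe. The values are made up for illustration and only the column names mirror the fmri dataset.

```python
import pandas as pd

# Toy long-format data (made-up values): one row per region measurement.
long_df = pd.DataFrame(
    {
        "subject": ["s0", "s0", "s0", "s0"],
        "timepoint": [0, 0, 0, 0],
        "event": ["cue", "cue", "stim", "stim"],
        "region": ["frontal", "parietal", "frontal", "parietal"],
        "signal": [0.1, 0.2, 0.3, 0.4],
    }
)

# Each distinct value of "region" becomes a column holding its "signal".
wide_df = long_df.pivot(
    index=["subject", "timepoint", "event"], columns="region", values="signal"
).reset_index()

print(wide_df)
```

The four long-format rows collapse into two wide-format rows, one per (subject, timepoint, event) combination, with "frontal" and "parietal" available as separate features.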

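Following the hyperparameter tuning example, we will now use a Bayesian search to find the best hyperparameters for the SVM model. We define two pipelines, one with a linear kernel and one with an RBF kernel, and specify each tunable hyperparameter as a (low, high, prior) tuple.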

X = ["frontal", "parietal"]
y = "event"

creator1 = PipelineCreator(problem_type="classification")
creator1.add("zscore")
creator1.add(
    "svm",
    kernel=["linear"],
    C=(1e-6, 1e3, "log-uniform"),
)

creator2 = PipelineCreator(problem_type="classification")
creator2.add("zscore")
creator2.add(
    "svm",
    kernel=["rbf"],
    C=(1e-6, 1e3, "log-uniform"),
    gamma=(1e-6, 1e1, "log-uniform"),
)

search_params = {
    "kind": "bayes",
    "cv": 2,  # to speed up the example
    "n_iter": 10,  # 10 iterations of bayesian search to speed up example
}


scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=[creator1, creator2],
    cv=2,  # to speed up the example
    search_params=search_params,
    return_estimator="final",
)

print(scores["test_score"].mean())

2026-02-10 14:37:59,640 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2026-02-10 14:37:59,640 - julearn - INFO - Step added
2026-02-10 14:37:59,640 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2026-02-10 14:37:59,641 - julearn - INFO - Setting hyperparameter kernel = linear
2026-02-10 14:37:59,641 - julearn - INFO - Tuning hyperparameter C = (1e-06, 1000.0, 'log-uniform')
2026-02-10 14:37:59,641 - julearn - INFO - Step added
2026-02-10 14:37:59,641 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2026-02-10 14:37:59,641 - julearn - INFO - Step added
2026-02-10 14:37:59,641 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2026-02-10 14:37:59,641 - julearn - INFO - Setting hyperparameter kernel = rbf
2026-02-10 14:37:59,641 - julearn - INFO - Tuning hyperparameter C = (1e-06, 1000.0, 'log-uniform')
2026-02-10 14:37:59,642 - julearn - INFO - Tuning hyperparameter gamma = (1e-06, 10.0, 'log-uniform')
2026-02-10 14:37:59,642 - julearn - INFO - Step added
2026-02-10 14:37:59,642 - julearn - INFO - ==== Input Data ====
2026-02-10 14:37:59,642 - julearn - INFO - Using dataframe as input
2026-02-10 14:37:59,642 - julearn - INFO -      Features: ['frontal', 'parietal']
2026-02-10 14:37:59,642 - julearn - INFO -      Target: event
2026-02-10 14:37:59,642 - julearn - INFO -      Expanded features: ['frontal', 'parietal']
2026-02-10 14:37:59,642 - julearn - INFO -      X_types:{}
2026-02-10 14:37:59,642 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:581: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
  warn_with_log(
2026-02-10 14:37:59,643 - julearn - INFO - ====================
2026-02-10 14:37:59,643 - julearn - INFO -
2026-02-10 14:37:59,644 - julearn - INFO - = Model Parameters =
2026-02-10 14:37:59,644 - julearn - INFO - Tuning hyperparameters using bayes
2026-02-10 14:37:59,644 - julearn - INFO - Hyperparameters:
2026-02-10 14:37:59,644 - julearn - INFO -      svm__C: (1e-06, 1000.0, 'log-uniform')
2026-02-10 14:37:59,644 - julearn - INFO - Hyperparameter svm__C is log-uniform float [1e-06, 1000.0]
2026-02-10 14:37:59,645 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2026-02-10 14:37:59,645 - julearn - INFO - Search Parameters:
2026-02-10 14:37:59,645 - julearn - INFO -      cv: KFold(n_splits=2, random_state=None, shuffle=False)
2026-02-10 14:37:59,645 - julearn - INFO -      n_iter: 10
2026-02-10 14:37:59,646 - julearn - INFO - ====================
2026-02-10 14:37:59,646 - julearn - INFO -
2026-02-10 14:37:59,646 - julearn - INFO - = Model Parameters =
2026-02-10 14:37:59,646 - julearn - INFO - Tuning hyperparameters using bayes
2026-02-10 14:37:59,646 - julearn - INFO - Hyperparameters:
2026-02-10 14:37:59,647 - julearn - INFO -      svm__C: (1e-06, 1000.0, 'log-uniform')
2026-02-10 14:37:59,647 - julearn - INFO -      svm__gamma: (1e-06, 10.0, 'log-uniform')
2026-02-10 14:37:59,647 - julearn - INFO - Hyperparameter svm__C is log-uniform float [1e-06, 1000.0]
2026-02-10 14:37:59,647 - julearn - INFO - Hyperparameter svm__gamma is log-uniform float [1e-06, 10.0]
2026-02-10 14:37:59,648 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2026-02-10 14:37:59,648 - julearn - INFO - Search Parameters:
2026-02-10 14:37:59,648 - julearn - INFO -      cv: KFold(n_splits=2, random_state=None, shuffle=False)
2026-02-10 14:37:59,649 - julearn - INFO -      n_iter: 10
2026-02-10 14:37:59,649 - julearn - INFO - ====================
2026-02-10 14:37:59,649 - julearn - INFO -
2026-02-10 14:37:59,649 - julearn - INFO - = Model Parameters =
2026-02-10 14:37:59,649 - julearn - INFO - Tuning hyperparameters using bayes
2026-02-10 14:37:59,649 - julearn - INFO - Hyperparameters list:
2026-02-10 14:37:59,649 - julearn - INFO -      Set 0
2026-02-10 14:37:59,650 - julearn - INFO -              svm__C: Real(low=1e-06, high=1000.0, prior='log-uniform', transform='identity')
2026-02-10 14:37:59,650 - julearn - INFO -              set_column_types: [SetColumnTypes(X_types={})]
2026-02-10 14:37:59,650 - julearn - INFO -              zscore: [StandardScaler()]
2026-02-10 14:37:59,650 - julearn - INFO -              svm: [SVC(kernel='linear')]
2026-02-10 14:37:59,650 - julearn - INFO -      Set 1
2026-02-10 14:37:59,650 - julearn - INFO -              svm__C: Real(low=1e-06, high=1000.0, prior='log-uniform', transform='identity')
2026-02-10 14:37:59,650 - julearn - INFO -              svm__gamma: Real(low=1e-06, high=10.0, prior='log-uniform', transform='identity')
2026-02-10 14:37:59,651 - julearn - INFO -              set_column_types: [SetColumnTypes(X_types={})]
2026-02-10 14:37:59,651 - julearn - INFO -              zscore: [StandardScaler()]
2026-02-10 14:37:59,651 - julearn - INFO -              svm: [SVC()]
2026-02-10 14:37:59,651 - julearn - INFO - Hyperparameter svm__C as is Real(low=1e-06, high=1000.0, prior='log-uniform', transform='identity')
2026-02-10 14:37:59,651 - julearn - INFO - Hyperparameter set_column_types as is [SetColumnTypes(X_types={})]
2026-02-10 14:37:59,652 - julearn - INFO - Hyperparameter zscore as is [StandardScaler()]
2026-02-10 14:37:59,652 - julearn - INFO - Hyperparameter svm as is [SVC(kernel='linear')]
2026-02-10 14:37:59,652 - julearn - INFO - Hyperparameter svm__C as is Real(low=1e-06, high=1000.0, prior='log-uniform', transform='identity')
2026-02-10 14:37:59,652 - julearn - INFO - Hyperparameter svm__gamma as is Real(low=1e-06, high=10.0, prior='log-uniform', transform='identity')
2026-02-10 14:37:59,652 - julearn - INFO - Hyperparameter set_column_types as is [SetColumnTypes(X_types={})]
2026-02-10 14:37:59,652 - julearn - INFO - Hyperparameter zscore as is [StandardScaler()]
2026-02-10 14:37:59,653 - julearn - INFO - Hyperparameter svm as is [SVC()]
2026-02-10 14:37:59,653 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2026-02-10 14:37:59,653 - julearn - INFO - Search Parameters:
2026-02-10 14:37:59,653 - julearn - INFO -      cv: KFold(n_splits=2, random_state=None, shuffle=False)
2026-02-10 14:37:59,653 - julearn - INFO -      n_iter: 10
2026-02-10 14:37:59,663 - julearn - INFO - ====================
2026-02-10 14:37:59,663 - julearn - INFO -
2026-02-10 14:37:59,664 - julearn - INFO - = Data Information =
2026-02-10 14:37:59,664 - julearn - INFO -      Problem type: classification
2026-02-10 14:37:59,664 - julearn - INFO -      Number of samples: 532
2026-02-10 14:37:59,664 - julearn - INFO -      Number of features: 2
2026-02-10 14:37:59,664 - julearn - INFO - ====================
2026-02-10 14:37:59,664 - julearn - INFO -
2026-02-10 14:37:59,664 - julearn - INFO -      Number of classes: 2
2026-02-10 14:37:59,664 - julearn - INFO -      Target type: object
2026-02-10 14:37:59,665 - julearn - INFO -      Class distributions: event
cue     266
stim    266
Name: count, dtype: int64
2026-02-10 14:37:59,665 - julearn - INFO - Using outer CV scheme KFold(n_splits=2, random_state=None, shuffle=False) (incl. final model)
2026-02-10 14:37:59,666 - julearn - INFO - Binary classification problem detected.
0.656015037593985
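The (low, high, "log-uniform") tuples passed for C and gamma describe a prior that is uniform on a logarithmic scale, so every decade between the bounds is equally likely to be explored. The sketch below illustrates that idea with plain NumPy. It is not how the search library samples internally, and the seed and sample size are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
low, high = 1e-6, 1e3  # same bounds as used for C above

# Draw uniformly in log10 space, then map back to the original scale.
log_samples = rng.uniform(np.log10(low), np.log10(high), size=1000)
samples = 10.0 ** log_samples

# Six of the nine decades lie below 1.0, so roughly two thirds of the
# draws should fall there; a plain uniform prior on [low, high] would
# put almost no draws below 1.0.
fraction_below_one = np.mean(samples < 1.0)
print(fraction_below_one)
```

This is why a log-uniform prior suits parameters such as C and gamma, whose plausible values span many orders of magnitude.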

It seems that we might have found a better model, but which one is it?

print(estimator.best_params_)

OrderedDict({'set_column_types': SetColumnTypes(X_types={}), 'svm': SVC(), 'svm__C': 0.0018082604408073564, 'svm__gamma': 1.6437581151471767, 'zscore': StandardScaler()})
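The printed SVC() carries no kernel argument because scikit-learn's default repr only shows parameters that differ from their defaults, and "rbf" is the default kernel. Together with the presence of svm__gamma, this indicates that the search selected the second pipeline (the RBF one). A minimal sketch of checking this programmatically follows; the dictionary is copied by hand from the output above, whereas in practice one would read estimator.best_params_ directly.

```python
from sklearn.svm import SVC

# Hand-copied subset of the printed best parameters (illustration only).
best_params = {
    "svm": SVC(),
    "svm__C": 0.0018082604408073564,
    "svm__gamma": 1.6437581151471767,
}

# get_params() exposes every parameter, including those left at defaults.
kernel = best_params["svm"].get_params()["kernel"]
print(kernel)
```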

Total running time of the script: (0 minutes 4.028 seconds)
