Tuning Multiple Hyperparameters Grids
This example uses the fmri dataset to perform a simple binary classification with a Support Vector Machine classifier while tuning multiple hyperparameter grids at the same time.
References
Waskom, M.L., Frank, M.C., Wagner, A.D. (2016). Adaptive engagement of cognitive control in context-dependent decision-making. Cerebral Cortex.
# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL
import numpy as np
from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.utils import configure_logging
from julearn.pipeline import PipelineCreator
Set the logging level to info to see extra information.
configure_logging(level="INFO")
2024-05-16 08:52:38,314 - julearn - INFO - ===== Lib Versions =====
2024-05-16 08:52:38,315 - julearn - INFO - numpy: 1.26.4
2024-05-16 08:52:38,315 - julearn - INFO - scipy: 1.13.0
2024-05-16 08:52:38,315 - julearn - INFO - sklearn: 1.4.2
2024-05-16 08:52:38,315 - julearn - INFO - pandas: 2.1.4
2024-05-16 08:52:38,315 - julearn - INFO - julearn: 0.3.3
2024-05-16 08:52:38,315 - julearn - INFO - ========================
Set the random seed so that the example is reproducible.
np.random.seed(42)
Load the dataset.
df_fmri = load_dataset("fmri")
df_fmri.head()
Reshape the dataframe from long to wide format, so that each region becomes its own feature column.
df_fmri = df_fmri.pivot(
    index=["subject", "timepoint", "event"], columns="region", values="signal"
)
df_fmri = df_fmri.reset_index()
df_fmri.head()
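To make the reshaping step concrete, here is a minimal sketch on a toy long-format frame. The column names match those used in the pivot call above, but the rows and signal values are made up for illustration and are not taken from the fmri dataset.

```python
import pandas as pd

# Hypothetical long-format data: one row per (subject, timepoint, event, region).
long_df = pd.DataFrame(
    {
        "subject": ["s0", "s0", "s0", "s0"],
        "timepoint": [0, 0, 1, 1],
        "event": ["stim", "stim", "stim", "stim"],
        "region": ["frontal", "parietal", "frontal", "parietal"],
        "signal": [0.1, 0.2, 0.3, 0.4],
    }
)

# Each distinct value of "region" becomes its own column holding "signal".
wide_df = long_df.pivot(
    index=["subject", "timepoint", "event"], columns="region", values="signal"
)

# Move the index levels back into regular columns.
wide_df = wide_df.reset_index()

print(wide_df)
```

After the pivot, each row describes one (subject, timepoint, event) combination, with `frontal` and `parietal` available as separate columns that can be used as features.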
Let's make a first attempt and use a linear SVM with the default parameters.
2024-05-16 08:52:38,324 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-16 08:52:38,324 - julearn - INFO - Step added
2024-05-16 08:52:38,324 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-16 08:52:38,324 - julearn - INFO - Setting hyperparameter kernel = linear
2024-05-16 08:52:38,324 - julearn - INFO - Step added
2024-05-16 08:52:38,324 - julearn - INFO - ==== Input Data ====
2024-05-16 08:52:38,324 - julearn - INFO - Using dataframe as input
2024-05-16 08:52:38,324 - julearn - INFO - Features: ['frontal', 'parietal']
2024-05-16 08:52:38,324 - julearn - INFO - Target: event
2024-05-16 08:52:38,324 - julearn - INFO - Expanded features: ['frontal', 'parietal']
2024-05-16 08:52:38,324 - julearn - INFO - X_types:{}
2024-05-16 08:52:38,324 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:505: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
warn_with_log(
2024-05-16 08:52:38,325 - julearn - INFO - ====================
2024-05-16 08:52:38,325 - julearn - INFO -
2024-05-16 08:52:38,326 - julearn - INFO - = Model Parameters =
2024-05-16 08:52:38,326 - julearn - INFO - ====================
2024-05-16 08:52:38,326 - julearn - INFO -
2024-05-16 08:52:38,326 - julearn - INFO - = Data Information =
2024-05-16 08:52:38,326 - julearn - INFO - Problem type: classification
2024-05-16 08:52:38,326 - julearn - INFO - Number of samples: 532
2024-05-16 08:52:38,326 - julearn - INFO - Number of features: 2
2024-05-16 08:52:38,326 - julearn - INFO - ====================
2024-05-16 08:52:38,326 - julearn - INFO -
2024-05-16 08:52:38,326 - julearn - INFO - Number of classes: 2
2024-05-16 08:52:38,326 - julearn - INFO - Target type: object
2024-05-16 08:52:38,327 - julearn - INFO - Class distributions: event
cue 266
stim 266
Name: count, dtype: int64
2024-05-16 08:52:38,327 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-05-16 08:52:38,327 - julearn - INFO - Binary classification problem detected.
0.5939164168576971
Now let's tune this SVM a bit. We will use a grid search to tune the regularization parameter C and the kernel. We will also tune gamma. But since gamma has no effect on the linear kernel and is only used here by the "rbf" kernel, we will use a different grid for the "rbf" kernel.
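To see why separate grids help, the sketch below counts the candidate settings produced by two per-kernel grids versus a single combined grid. It uses only the standard library; the helper `expand` is a hypothetical name introduced for illustration, and the values mirror the C and gamma values used in this example. It is not how julearn builds its search internally.

```python
from itertools import product


def expand(grid):
    """Return every combination of values in a {parameter: values} grid."""
    keys = list(grid)
    return [
        dict(zip(keys, values))
        for values in product(*(grid[key] for key in keys))
    ]


# One grid per kernel: gamma only appears where it matters.
linear_grid = {"kernel": ["linear"], "C": [0.01, 0.1]}
rbf_grid = {
    "kernel": ["rbf"],
    "C": [0.01, 0.1],
    "gamma": ["scale", "auto", 1e-2, 1e-3],
}
separate = expand(linear_grid) + expand(rbf_grid)

# A single combined grid also crosses gamma with the linear kernel,
# producing linear candidates that differ only in an unused parameter.
combined_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.01, 0.1],
    "gamma": ["scale", "auto", 1e-2, 1e-3],
}
combined = expand(combined_grid)

print(len(separate), len(combined))  # prints: 10 16
```

With separate grids, the search evaluates 10 candidates instead of 16; the 6 extra candidates in the combined grid are linear SVMs that differ only in a gamma value they never use.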
To specify two different sets of parameters for the same step, we add the step twice and explicitly give both entries the same step name. This is done by passing the name parameter to the add method.
# Features and target, matching the ones reported in the log output above.
X = ["frontal", "parietal"]
y = "event"

creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
# Both SVM entries share the name "svm", so each one defines its own grid.
creator.add("svm", kernel="linear", C=[0.01, 0.1], name="svm")
creator.add(
    "svm",
    kernel="rbf",
    C=[0.01, 0.1],
    gamma=["scale", "auto", 1e-2, 1e-3],
    name="svm",
)

search_params = {
    "kind": "grid",
    "cv": 2,  # to speed up the example
}

scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    search_params=search_params,
    return_estimator="final",
)

print(scores["test_score"].mean())
2024-05-16 08:52:38,382 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-16 08:52:38,383 - julearn - INFO - Step added
2024-05-16 08:52:38,383 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-16 08:52:38,383 - julearn - INFO - Setting hyperparameter kernel = linear
2024-05-16 08:52:38,383 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-05-16 08:52:38,383 - julearn - INFO - Step added
2024-05-16 08:52:38,383 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-05-16 08:52:38,383 - julearn - INFO - Setting hyperparameter kernel = rbf
2024-05-16 08:52:38,383 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-05-16 08:52:38,383 - julearn - INFO - Tuning hyperparameter gamma = ['scale', 'auto', 0.01, 0.001]
2024-05-16 08:52:38,383 - julearn - INFO - Step added
2024-05-16 08:52:38,383 - julearn - INFO - ==== Input Data ====
2024-05-16 08:52:38,383 - julearn - INFO - Using dataframe as input
2024-05-16 08:52:38,383 - julearn - INFO - Features: ['frontal', 'parietal']
2024-05-16 08:52:38,383 - julearn - INFO - Target: event
2024-05-16 08:52:38,383 - julearn - INFO - Expanded features: ['frontal', 'parietal']
2024-05-16 08:52:38,383 - julearn - INFO - X_types:{}
2024-05-16 08:52:38,384 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:505: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
warn_with_log(
2024-05-16 08:52:38,384 - julearn - INFO - ====================
2024-05-16 08:52:38,384 - julearn - INFO -
2024-05-16 08:52:38,385 - julearn - INFO - = Model Parameters =
2024-05-16 08:52:38,385 - julearn - INFO - Tuning hyperparameters using grid
2024-05-16 08:52:38,385 - julearn - INFO - Hyperparameters:
2024-05-16 08:52:38,385 - julearn - INFO - svm__C: [0.01, 0.1]
2024-05-16 08:52:38,385 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-16 08:52:38,385 - julearn - INFO - Search Parameters:
2024-05-16 08:52:38,385 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-16 08:52:38,385 - julearn - INFO - ====================
2024-05-16 08:52:38,385 - julearn - INFO -
2024-05-16 08:52:38,386 - julearn - INFO - = Model Parameters =
2024-05-16 08:52:38,386 - julearn - INFO - Tuning hyperparameters using grid
2024-05-16 08:52:38,386 - julearn - INFO - Hyperparameters:
2024-05-16 08:52:38,386 - julearn - INFO - svm__C: [0.01, 0.1]
2024-05-16 08:52:38,386 - julearn - INFO - svm__gamma: ['scale', 'auto', 0.01, 0.001]
2024-05-16 08:52:38,386 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-16 08:52:38,386 - julearn - INFO - Search Parameters:
2024-05-16 08:52:38,386 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-16 08:52:38,386 - julearn - INFO - ====================
2024-05-16 08:52:38,386 - julearn - INFO -
2024-05-16 08:52:38,386 - julearn - INFO - = Model Parameters =
2024-05-16 08:52:38,386 - julearn - INFO - Tuning hyperparameters using grid
2024-05-16 08:52:38,386 - julearn - INFO - Hyperparameters list:
2024-05-16 08:52:38,387 - julearn - INFO - Set 0
2024-05-16 08:52:38,387 - julearn - INFO - svm__C: [0.01, 0.1]
2024-05-16 08:52:38,387 - julearn - INFO - set_column_types: [SetColumnTypes(X_types={})]
2024-05-16 08:52:38,387 - julearn - INFO - svm: [SVC(kernel='linear')]
2024-05-16 08:52:38,387 - julearn - INFO - Set 1
2024-05-16 08:52:38,387 - julearn - INFO - svm__C: [0.01, 0.1]
2024-05-16 08:52:38,387 - julearn - INFO - svm__gamma: ['scale', 'auto', 0.01, 0.001]
2024-05-16 08:52:38,387 - julearn - INFO - set_column_types: [SetColumnTypes(X_types={})]
2024-05-16 08:52:38,387 - julearn - INFO - svm: [SVC()]
2024-05-16 08:52:38,388 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-16 08:52:38,388 - julearn - INFO - Search Parameters:
2024-05-16 08:52:38,388 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-05-16 08:52:38,388 - julearn - INFO - ====================
2024-05-16 08:52:38,388 - julearn - INFO -
2024-05-16 08:52:38,388 - julearn - INFO - = Data Information =
2024-05-16 08:52:38,388 - julearn - INFO - Problem type: classification
2024-05-16 08:52:38,388 - julearn - INFO - Number of samples: 532
2024-05-16 08:52:38,388 - julearn - INFO - Number of features: 2
2024-05-16 08:52:38,388 - julearn - INFO - ====================
2024-05-16 08:52:38,388 - julearn - INFO -
2024-05-16 08:52:38,388 - julearn - INFO - Number of classes: 2
2024-05-16 08:52:38,388 - julearn - INFO - Target type: object
2024-05-16 08:52:38,389 - julearn - INFO - Class distributions: event
cue 266
stim 266
Name: count, dtype: int64
2024-05-16 08:52:38,389 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-05-16 08:52:38,389 - julearn - INFO - Binary classification problem detected.
2024-05-16 08:52:39,425 - julearn - INFO - Fitting final model
0.7087109857168048
It seems that we might have found a better model, but which one is it?
print(estimator.best_params_)
print(estimator.best_estimator_["svm"]._gamma)
{'set_column_types': SetColumnTypes(X_types={}), 'svm': SVC(), 'svm__C': 0.1, 'svm__gamma': 'scale'}
0.5
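The selected model is the one from the second grid: `SVC()` uses the "rbf" kernel by default, here with C=0.1 and gamma="scale". The printed value 0.5 is consistent with how scikit-learn documents gamma="scale", namely 1 / (n_features * X.var()). The sketch below reproduces that arithmetic with NumPy on synthetic data (hypothetical values, not the fmri data), assuming the z-scoring step standardizes each feature to zero mean and unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic features with different means and spreads.
X = rng.normal(loc=[3.0, -1.0], scale=[2.0, 0.5], size=(200, 2))

# Standardize each feature: zero mean and unit variance per column.
X_z = (X - X.mean(axis=0)) / X.std(axis=0)

# gamma="scale" as documented by scikit-learn: 1 / (n_features * X.var()).
n_features = X_z.shape[1]
gamma_scale = 1.0 / (n_features * X_z.var())

print(gamma_scale)  # approximately 0.5 for two standardized features
```

Because standardized features have an overall variance of 1, gamma="scale" reduces to 1 / n_features, which is 0.5 for the two features used here.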