Tuning Hyperparameters
This example uses the fmri dataset, performs simple binary classification using a Support Vector Machine (SVM) classifier, and tunes the classifier's hyperparameters.
References
Waskom, M.L., Frank, M.C., Wagner, A.D. (2016). Adaptive engagement of cognitive control in context-dependent decision-making. Cerebral Cortex.
# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL
import numpy as np
from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.utils import configure_logging
from julearn.pipeline import PipelineCreator
Set the logging level to info to see extra information.
configure_logging(level="INFO")
2024-04-04 14:44:03,901 - julearn - INFO - ===== Lib Versions =====
2024-04-04 14:44:03,901 - julearn - INFO - numpy: 1.26.4
2024-04-04 14:44:03,901 - julearn - INFO - scipy: 1.13.0
2024-04-04 14:44:03,901 - julearn - INFO - sklearn: 1.4.1.post1
2024-04-04 14:44:03,901 - julearn - INFO - pandas: 2.1.4
2024-04-04 14:44:03,901 - julearn - INFO - julearn: 0.3.2.dev24
2024-04-04 14:44:03,901 - julearn - INFO - ========================
Set the random seed so that the example is reproducible.
np.random.seed(42)
Load the dataset.
df_fmri = load_dataset("fmri")
df_fmri.head()
Reshape the dataframe into wide format: one row per subject, timepoint and event, with one column per brain region.
df_fmri = df_fmri.pivot(
    index=["subject", "timepoint", "event"], columns="region", values="signal"
)
df_fmri = df_fmri.reset_index()
df_fmri.head()
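To make the reshaping concrete, here is a minimal sketch on a hypothetical four-row table in the same long format as the fmri dataset (the signal values are made up, not taken from the data). Each region becomes its own column, and reset_index turns the subject, timepoint and event index levels back into ordinary columns.

```python
import pandas as pd

# Hypothetical toy data in the long format of seaborn's "fmri" dataset:
# one row per (subject, timepoint, event, region) with a single "signal".
df_long = pd.DataFrame(
    {
        "subject": ["s0", "s0", "s0", "s0"],
        "timepoint": [0, 0, 0, 0],
        "event": ["stim", "stim", "cue", "cue"],
        "region": ["parietal", "frontal", "parietal", "frontal"],
        "signal": [0.1, -0.2, 0.3, 0.05],
    }
)

# Pivot: one column per region, indexed by subject/timepoint/event.
df_wide = df_long.pivot(
    index=["subject", "timepoint", "event"], columns="region", values="signal"
)
# Move the index levels back into regular columns.
df_wide = df_wide.reset_index()
print(df_wide)
```

The four long-format rows collapse into two wide rows (one per event), each carrying a frontal and a parietal value.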
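Let's make a first attempt using a linear SVM with the default parameters. The features are the signals of the two regions and the target is the event type.
X = ["frontal", "parietal"]
y = "event"
creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="linear")
scores = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
)
print(scores["test_score"].mean())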
2024-04-04 14:44:03,910 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:03,910 - julearn - INFO - Step added
2024-04-04 14:44:03,910 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:03,910 - julearn - INFO - Setting hyperparameter kernel = linear
2024-04-04 14:44:03,910 - julearn - INFO - Step added
2024-04-04 14:44:03,910 - julearn - INFO - ==== Input Data ====
2024-04-04 14:44:03,910 - julearn - INFO - Using dataframe as input
2024-04-04 14:44:03,911 - julearn - INFO - Features: ['frontal', 'parietal']
2024-04-04 14:44:03,911 - julearn - INFO - Target: event
2024-04-04 14:44:03,911 - julearn - INFO - Expanded features: ['frontal', 'parietal']
2024-04-04 14:44:03,911 - julearn - INFO - X_types:{}
2024-04-04 14:44:03,911 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:507: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
warn_with_log(
2024-04-04 14:44:03,911 - julearn - INFO - ====================
2024-04-04 14:44:03,912 - julearn - INFO -
2024-04-04 14:44:03,912 - julearn - INFO - = Model Parameters =
2024-04-04 14:44:03,912 - julearn - INFO - ====================
2024-04-04 14:44:03,912 - julearn - INFO -
2024-04-04 14:44:03,912 - julearn - INFO - = Data Information =
2024-04-04 14:44:03,912 - julearn - INFO - Problem type: classification
2024-04-04 14:44:03,912 - julearn - INFO - Number of samples: 532
2024-04-04 14:44:03,912 - julearn - INFO - Number of features: 2
2024-04-04 14:44:03,912 - julearn - INFO - ====================
2024-04-04 14:44:03,912 - julearn - INFO -
2024-04-04 14:44:03,913 - julearn - INFO - Number of classes: 2
2024-04-04 14:44:03,913 - julearn - INFO - Target type: object
2024-04-04 14:44:03,913 - julearn - INFO - Class distributions: event
cue 266
stim 266
Name: count, dtype: int64
2024-04-04 14:44:03,913 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-04-04 14:44:03,914 - julearn - INFO - Binary classification problem detected.
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:73: FutureWarning: `fit_params` is deprecated and will be removed in version 1.6. Pass parameters via `params` instead.
warnings.warn(
0.5939164168576971
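The log reports that the outer loop uses KFold(n_splits=5, shuffle=False). Without shuffling, each test fold is a contiguous block of rows in the dataframe's current order, and the random seed does not affect the splits. The sketch below reproduces the fold sizes for the 532 samples reported in the log, using placeholder features since only the number of rows matters for the split.

```python
import numpy as np
from sklearn.model_selection import KFold

# 532 samples, matching "Number of samples" in the log above.
n_samples = 532
X_dummy = np.zeros((n_samples, 1))  # placeholder features; only the row count matters

# Same outer scheme as in the log: 5 folds, no shuffling.
cv = KFold(n_splits=5, shuffle=False)

fold_sizes = []
for train_idx, test_idx in cv.split(X_dummy):
    # Without shuffling, each test fold is a contiguous range of row indices.
    print(test_idx.min(), test_idx.max(), len(test_idx))
    fold_sizes.append(len(test_idx))
```

Since 532 is not divisible by 5, scikit-learn gives the first two folds one extra sample.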
The score is not very good. Let's see whether there is a better value for the regularization parameter (C) of the linear SVM. We will use a grid search to find the best C.
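Conceptually, giving a list of values for C together with search_params makes julearn tune C in an inner cross-validation loop (2-fold here, according to the log) that runs inside each fold of the outer 5-fold CV. The sketch below is a rough plain scikit-learn analogue of that nested cross-validation, not julearn's internal code. It assumes synthetic data from make_classification in place of the fMRI table and uses scikit-learn's GridSearchCV as the search.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the two fMRI features (assumption: not the real data).
X_syn, y_syn = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# z-score, then a linear SVM, mirroring the julearn pipeline steps.
pipe = Pipeline([("zscore", StandardScaler()), ("svm", SVC(kernel="linear"))])

# Inner loop: grid search over C with 2-fold CV.
search = GridSearchCV(pipe, param_grid={"svm__C": [0.01, 0.1]}, cv=KFold(n_splits=2))

# Outer loop: 5-fold CV of the whole search (nested cross-validation).
outer_scores = cross_val_score(search, X_syn, y_syn, cv=KFold(n_splits=5))
print(outer_scores.mean())

# Refit the search on all data to inspect the chosen C.
search.fit(X_syn, y_syn)
print(search.best_params_)
```

The key design point is that the scaler and the search are both refit inside every outer training fold, so the outer test folds never influence the choice of C.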
creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="linear", C=[0.01, 0.1])
search_params = {
    "kind": "grid",
    "cv": 2,  # to speed up the example
}
scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    search_params=search_params,
    return_estimator="final",
)
print(scores["test_score"].mean())
2024-04-04 14:44:03,970 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:03,970 - julearn - INFO - Step added
2024-04-04 14:44:03,970 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:03,970 - julearn - INFO - Setting hyperparameter kernel = linear
2024-04-04 14:44:03,970 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-04-04 14:44:03,970 - julearn - INFO - Step added
2024-04-04 14:44:03,970 - julearn - INFO - ==== Input Data ====
2024-04-04 14:44:03,970 - julearn - INFO - Using dataframe as input
2024-04-04 14:44:03,970 - julearn - INFO - Features: ['frontal', 'parietal']
2024-04-04 14:44:03,970 - julearn - INFO - Target: event
2024-04-04 14:44:03,970 - julearn - INFO - Expanded features: ['frontal', 'parietal']
2024-04-04 14:44:03,970 - julearn - INFO - X_types:{}
2024-04-04 14:44:03,970 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:507: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
warn_with_log(
2024-04-04 14:44:03,971 - julearn - INFO - ====================
2024-04-04 14:44:03,971 - julearn - INFO -
2024-04-04 14:44:03,972 - julearn - INFO - = Model Parameters =
2024-04-04 14:44:03,972 - julearn - INFO - Tuning hyperparameters using grid
2024-04-04 14:44:03,972 - julearn - INFO - Hyperparameters:
2024-04-04 14:44:03,972 - julearn - INFO - svm__C: [0.01, 0.1]
2024-04-04 14:44:03,972 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:03,972 - julearn - INFO - Search Parameters:
2024-04-04 14:44:03,972 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:03,972 - julearn - INFO - ====================
2024-04-04 14:44:03,972 - julearn - INFO -
2024-04-04 14:44:03,972 - julearn - INFO - = Data Information =
2024-04-04 14:44:03,972 - julearn - INFO - Problem type: classification
2024-04-04 14:44:03,972 - julearn - INFO - Number of samples: 532
2024-04-04 14:44:03,972 - julearn - INFO - Number of features: 2
2024-04-04 14:44:03,973 - julearn - INFO - ====================
2024-04-04 14:44:03,973 - julearn - INFO -
2024-04-04 14:44:03,973 - julearn - INFO - Number of classes: 2
2024-04-04 14:44:03,973 - julearn - INFO - Target type: object
2024-04-04 14:44:03,973 - julearn - INFO - Class distributions: event
cue 266
stim 266
Name: count, dtype: int64
2024-04-04 14:44:03,974 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-04-04 14:44:03,974 - julearn - INFO - Binary classification problem detected.
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:73: FutureWarning: `fit_params` is deprecated and will be removed in version 1.6. Pass parameters via `params` instead.
warnings.warn(
0.588308940222183
This did not change much, so let's explore other kernels too.
creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel=["linear", "rbf", "poly"], C=[0.01, 0.1])
scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    search_params=search_params,
    return_estimator="final",
)
print(scores["test_score"].mean())
2024-04-04 14:44:04,255 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:04,255 - julearn - INFO - Step added
2024-04-04 14:44:04,256 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:04,256 - julearn - INFO - Tuning hyperparameter kernel = ['linear', 'rbf', 'poly']
2024-04-04 14:44:04,256 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-04-04 14:44:04,256 - julearn - INFO - Step added
2024-04-04 14:44:04,256 - julearn - INFO - ==== Input Data ====
2024-04-04 14:44:04,256 - julearn - INFO - Using dataframe as input
2024-04-04 14:44:04,256 - julearn - INFO - Features: ['frontal', 'parietal']
2024-04-04 14:44:04,256 - julearn - INFO - Target: event
2024-04-04 14:44:04,256 - julearn - INFO - Expanded features: ['frontal', 'parietal']
2024-04-04 14:44:04,256 - julearn - INFO - X_types:{}
2024-04-04 14:44:04,256 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:507: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
warn_with_log(
2024-04-04 14:44:04,257 - julearn - INFO - ====================
2024-04-04 14:44:04,257 - julearn - INFO -
2024-04-04 14:44:04,257 - julearn - INFO - = Model Parameters =
2024-04-04 14:44:04,257 - julearn - INFO - Tuning hyperparameters using grid
2024-04-04 14:44:04,257 - julearn - INFO - Hyperparameters:
2024-04-04 14:44:04,257 - julearn - INFO - svm__kernel: ['linear', 'rbf', 'poly']
2024-04-04 14:44:04,257 - julearn - INFO - svm__C: [0.01, 0.1]
2024-04-04 14:44:04,258 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:04,258 - julearn - INFO - Search Parameters:
2024-04-04 14:44:04,258 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:04,258 - julearn - INFO - ====================
2024-04-04 14:44:04,258 - julearn - INFO -
2024-04-04 14:44:04,258 - julearn - INFO - = Data Information =
2024-04-04 14:44:04,258 - julearn - INFO - Problem type: classification
2024-04-04 14:44:04,258 - julearn - INFO - Number of samples: 532
2024-04-04 14:44:04,258 - julearn - INFO - Number of features: 2
2024-04-04 14:44:04,258 - julearn - INFO - ====================
2024-04-04 14:44:04,258 - julearn - INFO -
2024-04-04 14:44:04,258 - julearn - INFO - Number of classes: 2
2024-04-04 14:44:04,258 - julearn - INFO - Target type: object
2024-04-04 14:44:04,259 - julearn - INFO - Class distributions: event
cue 266
stim 266
Name: count, dtype: int64
2024-04-04 14:44:04,259 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-04-04 14:44:04,259 - julearn - INFO - Binary classification problem detected.
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:73: FutureWarning: `fit_params` is deprecated and will be removed in version 1.6. Pass parameters via `params` instead.
warnings.warn(
0.7087109857168048
It seems that we might have found a better model, but which one is it?
print(estimator.best_params_)
{'svm__C': 0.1, 'svm__kernel': 'rbf'}
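A grid search evaluates every combination of the listed values, and best_params_ is the combination with the best mean inner-CV score. The sketch below enumerates the candidates of the previous search with scikit-learn's ParameterGrid, assuming that julearn's "grid" kind follows the same Cartesian-product semantics as GridSearchCV (the logged svm__ parameter names suggest it does).

```python
from sklearn.model_selection import ParameterGrid

# The same values passed to creator.add("svm", ...) in the previous step.
grid = ParameterGrid(
    {"svm__kernel": ["linear", "rbf", "poly"], "svm__C": [0.01, 0.1]}
)

candidates = list(grid)
for params in candidates:
    print(params)
print(len(candidates))  # 3 kernels x 2 values of C
```

Each candidate is fit once per inner fold, in every outer fold, so the cost of the search grows multiplicatively with the number of values per hyperparameter.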
Now that an RBF kernel seems to work better, let's test different values of the gamma parameter.
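For the RBF kernel, scikit-learn's SVC uses k(x, x') = exp(-gamma * ||x - x'||^2), so gamma controls how quickly the similarity between two samples decays with their distance. The sketch below implements this formula by hand and checks it against sklearn.metrics.pairwise.rbf_kernel on three made-up 2-D points (illustrative only, not taken from the dataset).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Three illustrative 2-D points (hypothetical, roughly on a z-scored scale).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])


def rbf_by_hand(A, gamma):
    # Squared Euclidean distance between every pair of rows.
    sq_dists = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


for gamma in [1e-3, 1e-2, 1.0]:
    K = rbf_by_hand(points, gamma)
    # The hand-written kernel matches scikit-learn's implementation.
    assert np.allclose(K, rbf_kernel(points, gamma=gamma))
    print(gamma, K.round(3))
```

With gamma=1e-3 every kernel value is above 0.99, so all points look almost identical to the classifier; with gamma=1.0 the two points at distance 2 have similarity exp(-4), about 0.018.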
creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="rbf", C=[0.01, 0.1], gamma=[1e-2, 1e-3])
scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    search_params=search_params,
    return_estimator="final",
)
print(scores["test_score"].mean())
print(estimator.best_params_)
2024-04-04 14:44:05,004 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:05,004 - julearn - INFO - Step added
2024-04-04 14:44:05,004 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:05,004 - julearn - INFO - Setting hyperparameter kernel = rbf
2024-04-04 14:44:05,004 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-04-04 14:44:05,004 - julearn - INFO - Tuning hyperparameter gamma = [0.01, 0.001]
2024-04-04 14:44:05,004 - julearn - INFO - Step added
2024-04-04 14:44:05,004 - julearn - INFO - ==== Input Data ====
2024-04-04 14:44:05,004 - julearn - INFO - Using dataframe as input
2024-04-04 14:44:05,004 - julearn - INFO - Features: ['frontal', 'parietal']
2024-04-04 14:44:05,004 - julearn - INFO - Target: event
2024-04-04 14:44:05,004 - julearn - INFO - Expanded features: ['frontal', 'parietal']
2024-04-04 14:44:05,005 - julearn - INFO - X_types:{}
2024-04-04 14:44:05,005 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:507: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
warn_with_log(
2024-04-04 14:44:05,005 - julearn - INFO - ====================
2024-04-04 14:44:05,005 - julearn - INFO -
2024-04-04 14:44:05,006 - julearn - INFO - = Model Parameters =
2024-04-04 14:44:05,006 - julearn - INFO - Tuning hyperparameters using grid
2024-04-04 14:44:05,006 - julearn - INFO - Hyperparameters:
2024-04-04 14:44:05,006 - julearn - INFO - svm__C: [0.01, 0.1]
2024-04-04 14:44:05,006 - julearn - INFO - svm__gamma: [0.01, 0.001]
2024-04-04 14:44:05,006 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:05,006 - julearn - INFO - Search Parameters:
2024-04-04 14:44:05,006 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:05,006 - julearn - INFO - ====================
2024-04-04 14:44:05,007 - julearn - INFO -
2024-04-04 14:44:05,007 - julearn - INFO - = Data Information =
2024-04-04 14:44:05,007 - julearn - INFO - Problem type: classification
2024-04-04 14:44:05,007 - julearn - INFO - Number of samples: 532
2024-04-04 14:44:05,007 - julearn - INFO - Number of features: 2
2024-04-04 14:44:05,007 - julearn - INFO - ====================
2024-04-04 14:44:05,007 - julearn - INFO -
2024-04-04 14:44:05,007 - julearn - INFO - Number of classes: 2
2024-04-04 14:44:05,007 - julearn - INFO - Target type: object
2024-04-04 14:44:05,008 - julearn - INFO - Class distributions: event
cue 266
stim 266
Name: count, dtype: int64
2024-04-04 14:44:05,008 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-04-04 14:44:05,008 - julearn - INFO - Binary classification problem detected.
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:73: FutureWarning: `fit_params` is deprecated and will be removed in version 1.6. Pass parameters via `params` instead.
warnings.warn(
0.5188855581026275
{'svm__C': 0.01, 'svm__gamma': 0.001}
It seems that we had better accuracy without tuning the gamma parameter, that is, when its default value was used. Let's add the default value ("scale") to the grid and see what happens.
creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm", kernel="rbf", C=[0.01, 0.1], gamma=[1e-2, 1e-3, "scale"])
X = ["frontal", "parietal"]
y = "event"
search_params = {"cv": 2}
scores, estimator = run_cross_validation(
    X=X,
    y=y,
    data=df_fmri,
    model=creator,
    return_estimator="final",
    search_params=search_params,
)
print(scores["test_score"].mean())
print(estimator.best_params_)
2024-04-04 14:44:05,564 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:05,564 - julearn - INFO - Step added
2024-04-04 14:44:05,564 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2024-04-04 14:44:05,564 - julearn - INFO - Setting hyperparameter kernel = rbf
2024-04-04 14:44:05,564 - julearn - INFO - Tuning hyperparameter C = [0.01, 0.1]
2024-04-04 14:44:05,564 - julearn - INFO - Tuning hyperparameter gamma = [0.01, 0.001, 'scale']
2024-04-04 14:44:05,565 - julearn - INFO - Step added
2024-04-04 14:44:05,565 - julearn - INFO - ==== Input Data ====
2024-04-04 14:44:05,565 - julearn - INFO - Using dataframe as input
2024-04-04 14:44:05,565 - julearn - INFO - Features: ['frontal', 'parietal']
2024-04-04 14:44:05,565 - julearn - INFO - Target: event
2024-04-04 14:44:05,565 - julearn - INFO - Expanded features: ['frontal', 'parietal']
2024-04-04 14:44:05,565 - julearn - INFO - X_types:{}
2024-04-04 14:44:05,565 - julearn - WARNING - The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:507: RuntimeWarning: The following columns are not defined in X_types: ['frontal', 'parietal']. They will be treated as continuous.
warn_with_log(
2024-04-04 14:44:05,566 - julearn - INFO - ====================
2024-04-04 14:44:05,566 - julearn - INFO -
2024-04-04 14:44:05,566 - julearn - INFO - = Model Parameters =
2024-04-04 14:44:05,566 - julearn - INFO - Tuning hyperparameters using grid
2024-04-04 14:44:05,566 - julearn - INFO - Hyperparameters:
2024-04-04 14:44:05,566 - julearn - INFO - svm__C: [0.01, 0.1]
2024-04-04 14:44:05,566 - julearn - INFO - svm__gamma: [0.01, 0.001, 'scale']
2024-04-04 14:44:05,567 - julearn - INFO - Using inner CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:05,567 - julearn - INFO - Search Parameters:
2024-04-04 14:44:05,567 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2024-04-04 14:44:05,567 - julearn - INFO - ====================
2024-04-04 14:44:05,567 - julearn - INFO -
2024-04-04 14:44:05,567 - julearn - INFO - = Data Information =
2024-04-04 14:44:05,567 - julearn - INFO - Problem type: classification
2024-04-04 14:44:05,567 - julearn - INFO - Number of samples: 532
2024-04-04 14:44:05,567 - julearn - INFO - Number of features: 2
2024-04-04 14:44:05,567 - julearn - INFO - ====================
2024-04-04 14:44:05,567 - julearn - INFO -
2024-04-04 14:44:05,567 - julearn - INFO - Number of classes: 2
2024-04-04 14:44:05,567 - julearn - INFO - Target type: object
2024-04-04 14:44:05,568 - julearn - INFO - Class distributions: event
cue 266
stim 266
Name: count, dtype: int64
2024-04-04 14:44:05,568 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-04-04 14:44:05,568 - julearn - INFO - Binary classification problem detected.
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:73: FutureWarning: `fit_params` is deprecated and will be removed in version 1.6. Pass parameters via `params` instead.
warnings.warn(
0.7087109857168048
{'svm__C': 0.1, 'svm__gamma': 'scale'}
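To see which numeric value "scale" resolved to, inspect the fitted SVM (note that _gamma is a private scikit-learn attribute):
print(estimator.best_estimator_["svm"]._gamma)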
0.5
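The value 0.5 follows from how scikit-learn's SVC resolves gamma="scale": it uses 1 / (n_features * X.var()), where X.var() is the variance over all entries of the training matrix. After z-scoring, that variance is 1, and with two features (frontal and parietal) the result is 1 / (2 * 1) = 0.5. The quick check below assumes random data as a stand-in for the features; any data gives the same value once z-scored.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Random stand-in for the two features (assumption: not the real data).
X_raw = rng.normal(loc=3.0, scale=5.0, size=(100, 2))
# Arbitrary labels, only needed so that SVC can be fit.
y_raw = (X_raw[:, 0] > 3.0).astype(int)

# z-score: each column gets mean 0 and (population) variance 1.
X_z = StandardScaler().fit_transform(X_raw)
scale_gamma = 1.0 / (X_z.shape[1] * X_z.var())
print(scale_gamma)

# After fitting, SVC stores the resolved gamma in the _gamma attribute.
clf = SVC(kernel="rbf", gamma="scale").fit(X_z, y_raw)
print(clf._gamma)
```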
Total running time of the script: (0 minutes 2.457 seconds)