Tuning Hyperparameters
This example uses the 'fmri' dataset, performs simple binary classification using a Support Vector Machine classifier, and analyses the model.
References
Waskom, M.L., Frank, M.C., Wagner, A.D. (2016). Adaptive engagement of cognitive control in context-dependent decision-making. Cerebral Cortex.
# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
#
# License: AGPL
import numpy as np
from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.utils import configure_logging
Set the logging level to info to see extra information
configure_logging(level='INFO')
Out:
2022-07-21 09:55:05,702 - julearn - INFO - ===== Lib Versions =====
2022-07-21 09:55:05,702 - julearn - INFO - numpy: 1.23.1
2022-07-21 09:55:05,702 - julearn - INFO - scipy: 1.8.1
2022-07-21 09:55:05,702 - julearn - INFO - sklearn: 1.0.2
2022-07-21 09:55:05,702 - julearn - INFO - pandas: 1.4.3
2022-07-21 09:55:05,702 - julearn - INFO - julearn: 0.2.5
2022-07-21 09:55:05,702 - julearn - INFO - ========================
Set the random seed so the example always produces the same results
np.random.seed(42)
Load the dataset
df_fmri = load_dataset('fmri')
print(df_fmri.head())
Out:
subject timepoint event region signal
0 s13 18 stim parietal -0.017552
1 s5 14 stim parietal -0.080883
2 s12 18 stim parietal -0.081033
3 s11 18 stim parietal -0.046134
4 s10 18 stim parietal -0.037970
Convert the dataframe to wide format, with one column per region
df_fmri = df_fmri.pivot(
index=['subject', 'timepoint', 'event'],
columns='region',
values='signal')
df_fmri = df_fmri.reset_index()
print(df_fmri.head())
Out:
region subject timepoint event frontal parietal
0 s0 0 cue 0.007766 -0.006899
1 s0 0 stim -0.021452 -0.039327
2 s0 1 cue 0.016440 0.000300
3 s0 1 stim -0.021054 -0.035735
4 s0 2 cue 0.024296 0.033220
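The pivot above turns each (subject, timepoint, event) triple into one row, with the two regions as columns. The same reshaping can be sketched with only the standard library, using signal values from the printed head:

```python
# Long-format rows: (subject, timepoint, event, region, signal)
rows = [
    ('s0', 0, 'cue', 'frontal', 0.007766),
    ('s0', 0, 'cue', 'parietal', -0.006899),
    ('s0', 0, 'stim', 'frontal', -0.021452),
    ('s0', 0, 'stim', 'parietal', -0.039327),
]

# One wide row per (subject, timepoint, event), one column per region
wide = {}
for subject, timepoint, event, region, signal in rows:
    wide.setdefault((subject, timepoint, event), {})[region] = signal

print(wide[('s0', 0, 'cue')])
# -> {'frontal': 0.007766, 'parietal': -0.006899}
```

This is what `df_fmri.pivot` does for the full dataset, with the index reset afterwards so `subject`, `timepoint` and `event` become regular columns again.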
Let's do a first attempt and use a linear SVM with the default parameters.
model_params = {'svm__kernel': 'linear'}
X = ['frontal', 'parietal']
y = 'event'
scores = run_cross_validation(
X=X, y=y, data=df_fmri, model='svm', preprocess_X='zscore',
model_params=model_params)
print(scores['test_score'].mean())
Out:
2022-07-21 09:55:05,719 - julearn - INFO - Using default CV
2022-07-21 09:55:05,719 - julearn - INFO - ==== Input Data ====
2022-07-21 09:55:05,719 - julearn - INFO - Using dataframe as input
2022-07-21 09:55:05,719 - julearn - INFO - Features: ['frontal', 'parietal']
2022-07-21 09:55:05,719 - julearn - INFO - Target: event
2022-07-21 09:55:05,719 - julearn - INFO - Expanded X: ['frontal', 'parietal']
2022-07-21 09:55:05,719 - julearn - INFO - Expanded Confounds: []
2022-07-21 09:55:05,720 - julearn - INFO - ====================
2022-07-21 09:55:05,720 - julearn - INFO -
2022-07-21 09:55:05,720 - julearn - INFO - ====== Model ======
2022-07-21 09:55:05,721 - julearn - INFO - Obtaining model by name: svm
2022-07-21 09:55:05,721 - julearn - INFO - ===================
2022-07-21 09:55:05,721 - julearn - INFO -
2022-07-21 09:55:05,721 - julearn - INFO - = Model Parameters =
2022-07-21 09:55:05,721 - julearn - INFO - Setting hyperparameter svm__kernel = linear
2022-07-21 09:55:05,722 - julearn - INFO - ====================
2022-07-21 09:55:05,722 - julearn - INFO -
2022-07-21 09:55:05,722 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds
0.5765508728619291
The score is not so good. Let's try to see whether there is an optimal regularization parameter (C) for the linear SVM.
model_params = {
'svm__kernel': 'linear',
'svm__C': [0.01, 0.1],
    'cv': 2}  # cv=2 to speed up the example
X = ['frontal', 'parietal']
y = 'event'
scores, estimator = run_cross_validation(
X=X, y=y, data=df_fmri, model='svm', preprocess_X='zscore',
model_params=model_params, return_estimator='final')
print(scores['test_score'].mean())
Out:
2022-07-21 09:55:06,204 - julearn - INFO - Using default CV
2022-07-21 09:55:06,204 - julearn - INFO - ==== Input Data ====
2022-07-21 09:55:06,204 - julearn - INFO - Using dataframe as input
2022-07-21 09:55:06,204 - julearn - INFO - Features: ['frontal', 'parietal']
2022-07-21 09:55:06,204 - julearn - INFO - Target: event
2022-07-21 09:55:06,204 - julearn - INFO - Expanded X: ['frontal', 'parietal']
2022-07-21 09:55:06,204 - julearn - INFO - Expanded Confounds: []
2022-07-21 09:55:06,205 - julearn - INFO - ====================
2022-07-21 09:55:06,205 - julearn - INFO -
2022-07-21 09:55:06,205 - julearn - INFO - ====== Model ======
2022-07-21 09:55:06,205 - julearn - INFO - Obtaining model by name: svm
2022-07-21 09:55:06,205 - julearn - INFO - ===================
2022-07-21 09:55:06,205 - julearn - INFO -
2022-07-21 09:55:06,205 - julearn - INFO - = Model Parameters =
2022-07-21 09:55:06,205 - julearn - INFO - Setting hyperparameter svm__kernel = linear
2022-07-21 09:55:06,206 - julearn - WARNING - `cv` should not be directly provided in the`model_params` anymore. This functionality willbe removed in the next version of julearn.Please use `cv` inside of `search_params` instead
2022-07-21 09:55:06,206 - julearn - INFO - Tunning hyperparameters using grid
2022-07-21 09:55:06,206 - julearn - INFO - Hyperparameters:
2022-07-21 09:55:06,206 - julearn - INFO - svm__C: [0.01, 0.1]
2022-07-21 09:55:06,207 - julearn - INFO - Using scikit-learn CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:06,207 - julearn - INFO - Search Parameters:
2022-07-21 09:55:06,207 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:06,207 - julearn - INFO - scoring: None
2022-07-21 09:55:06,207 - julearn - INFO - ====================
2022-07-21 09:55:06,207 - julearn - INFO -
2022-07-21 09:55:06,207 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds
0.575591606418621
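The deprecation warning above indicates that the inner cross-validation of the search should be passed via `search_params` instead of `model_params`. A minimal sketch of how the two dictionaries would be split (the keyword is taken from the warning itself; check the julearn documentation for your version):

```python
# Hyperparameters (and fixed parameters) stay in model_params ...
model_params = {
    'svm__kernel': 'linear',
    'svm__C': [0.01, 0.1]}

# ... while the inner CV of the grid search moves to search_params
search_params = {'cv': 2}
```

Both dictionaries would then be passed to run_cross_validation as model_params=model_params, search_params=search_params.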
This did not change much. Let's explore other kernels too.
model_params = {
'svm__kernel': ['linear', 'rbf', 'poly'],
'svm__C': [0.01, 0.1],
    'cv': 2}  # cv=2 to speed up the example
X = ['frontal', 'parietal']
y = 'event'
scores, estimator = run_cross_validation(
X=X, y=y, data=df_fmri, model='svm', preprocess_X='zscore',
model_params=model_params, return_estimator='final')
print(scores['test_score'].mean())
Out:
2022-07-21 09:55:08,503 - julearn - INFO - Using default CV
2022-07-21 09:55:08,503 - julearn - INFO - ==== Input Data ====
2022-07-21 09:55:08,503 - julearn - INFO - Using dataframe as input
2022-07-21 09:55:08,503 - julearn - INFO - Features: ['frontal', 'parietal']
2022-07-21 09:55:08,503 - julearn - INFO - Target: event
2022-07-21 09:55:08,503 - julearn - INFO - Expanded X: ['frontal', 'parietal']
2022-07-21 09:55:08,503 - julearn - INFO - Expanded Confounds: []
2022-07-21 09:55:08,504 - julearn - INFO - ====================
2022-07-21 09:55:08,504 - julearn - INFO -
2022-07-21 09:55:08,504 - julearn - INFO - ====== Model ======
2022-07-21 09:55:08,504 - julearn - INFO - Obtaining model by name: svm
2022-07-21 09:55:08,504 - julearn - INFO - ===================
2022-07-21 09:55:08,504 - julearn - INFO -
2022-07-21 09:55:08,505 - julearn - INFO - = Model Parameters =
2022-07-21 09:55:08,505 - julearn - WARNING - `cv` should not be directly provided in the`model_params` anymore. This functionality willbe removed in the next version of julearn.Please use `cv` inside of `search_params` instead
2022-07-21 09:55:08,505 - julearn - INFO - Tunning hyperparameters using grid
2022-07-21 09:55:08,505 - julearn - INFO - Hyperparameters:
2022-07-21 09:55:08,505 - julearn - INFO - svm__kernel: ['linear', 'rbf', 'poly']
2022-07-21 09:55:08,505 - julearn - INFO - svm__C: [0.01, 0.1]
2022-07-21 09:55:08,505 - julearn - INFO - Using scikit-learn CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:08,505 - julearn - INFO - Search Parameters:
2022-07-21 09:55:08,505 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:08,505 - julearn - INFO - scoring: None
2022-07-21 09:55:08,505 - julearn - INFO - ====================
2022-07-21 09:55:08,505 - julearn - INFO -
2022-07-21 09:55:08,505 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds
0.7116487391994357
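Grid search evaluates every combination of the listed values: 3 kernels times 2 values of C gives 6 candidate models, each fitted once per inner CV fold. The candidate grid can be sketched with only the standard library:

```python
from itertools import product

kernels = ['linear', 'rbf', 'poly']
Cs = [0.01, 0.1]

# Every (kernel, C) pair is one candidate evaluated by the grid search
candidates = list(product(kernels, Cs))
print(len(candidates))  # -> 6
```

Keep this multiplicative growth in mind: adding a third hyperparameter with k values multiplies the number of fits by k again.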
It seems that we might have found a better model, but which one is it?
print(estimator.best_params_)
Out:
{'svm__C': 0.1, 'svm__kernel': 'rbf'}
Now that we know that an RBF kernel is better, let's test different gamma parameters.
model_params = {
'svm__kernel': 'rbf',
'svm__C': [0.01, 0.1],
'svm__gamma': [1e-2, 1e-3],
    'cv': 2}  # cv=2 to speed up the example
X = ['frontal', 'parietal']
y = 'event'
scores, estimator = run_cross_validation(
X=X, y=y, data=df_fmri, model='svm', preprocess_X='zscore',
model_params=model_params, return_estimator='final')
print(scores['test_score'].mean())
print(estimator.best_params_)
Out:
2022-07-21 09:55:14,417 - julearn - INFO - Using default CV
2022-07-21 09:55:14,417 - julearn - INFO - ==== Input Data ====
2022-07-21 09:55:14,417 - julearn - INFO - Using dataframe as input
2022-07-21 09:55:14,417 - julearn - INFO - Features: ['frontal', 'parietal']
2022-07-21 09:55:14,417 - julearn - INFO - Target: event
2022-07-21 09:55:14,417 - julearn - INFO - Expanded X: ['frontal', 'parietal']
2022-07-21 09:55:14,417 - julearn - INFO - Expanded Confounds: []
2022-07-21 09:55:14,418 - julearn - INFO - ====================
2022-07-21 09:55:14,418 - julearn - INFO -
2022-07-21 09:55:14,418 - julearn - INFO - ====== Model ======
2022-07-21 09:55:14,418 - julearn - INFO - Obtaining model by name: svm
2022-07-21 09:55:14,418 - julearn - INFO - ===================
2022-07-21 09:55:14,418 - julearn - INFO -
2022-07-21 09:55:14,418 - julearn - INFO - = Model Parameters =
2022-07-21 09:55:14,418 - julearn - INFO - Setting hyperparameter svm__kernel = rbf
2022-07-21 09:55:14,419 - julearn - WARNING - `cv` should not be directly provided in the`model_params` anymore. This functionality willbe removed in the next version of julearn.Please use `cv` inside of `search_params` instead
2022-07-21 09:55:14,419 - julearn - INFO - Tunning hyperparameters using grid
2022-07-21 09:55:14,419 - julearn - INFO - Hyperparameters:
2022-07-21 09:55:14,419 - julearn - INFO - svm__C: [0.01, 0.1]
2022-07-21 09:55:14,419 - julearn - INFO - svm__gamma: [0.01, 0.001]
2022-07-21 09:55:14,419 - julearn - INFO - Using scikit-learn CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:14,420 - julearn - INFO - Search Parameters:
2022-07-21 09:55:14,420 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:14,420 - julearn - INFO - scoring: None
2022-07-21 09:55:14,420 - julearn - INFO - ====================
2022-07-21 09:55:14,420 - julearn - INFO -
2022-07-21 09:55:14,420 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds
0.47479104214424267
{'svm__C': 0.01, 'svm__gamma': 0.001}
It seems that without tuning the gamma parameter we had better accuracy. Let's add the default value and see what happens.
model_params = {
'svm__kernel': 'rbf',
'svm__C': [0.01, 0.1],
'svm__gamma': [1e-2, 1e-3, 'scale'],
    'cv': 2}  # cv=2 to speed up the example
X = ['frontal', 'parietal']
y = 'event'
scores, estimator = run_cross_validation(
X=X, y=y, data=df_fmri, model='svm', preprocess_X='zscore',
model_params=model_params, return_estimator='final')
print(scores['test_score'].mean())
print(estimator.best_params_)
Out:
2022-07-21 09:55:18,750 - julearn - INFO - Using default CV
2022-07-21 09:55:18,750 - julearn - INFO - ==== Input Data ====
2022-07-21 09:55:18,750 - julearn - INFO - Using dataframe as input
2022-07-21 09:55:18,750 - julearn - INFO - Features: ['frontal', 'parietal']
2022-07-21 09:55:18,750 - julearn - INFO - Target: event
2022-07-21 09:55:18,750 - julearn - INFO - Expanded X: ['frontal', 'parietal']
2022-07-21 09:55:18,750 - julearn - INFO - Expanded Confounds: []
2022-07-21 09:55:18,751 - julearn - INFO - ====================
2022-07-21 09:55:18,751 - julearn - INFO -
2022-07-21 09:55:18,751 - julearn - INFO - ====== Model ======
2022-07-21 09:55:18,751 - julearn - INFO - Obtaining model by name: svm
2022-07-21 09:55:18,751 - julearn - INFO - ===================
2022-07-21 09:55:18,751 - julearn - INFO -
2022-07-21 09:55:18,751 - julearn - INFO - = Model Parameters =
2022-07-21 09:55:18,751 - julearn - INFO - Setting hyperparameter svm__kernel = rbf
2022-07-21 09:55:18,752 - julearn - WARNING - `cv` should not be directly provided in the`model_params` anymore. This functionality willbe removed in the next version of julearn.Please use `cv` inside of `search_params` instead
2022-07-21 09:55:18,752 - julearn - INFO - Tunning hyperparameters using grid
2022-07-21 09:55:18,752 - julearn - INFO - Hyperparameters:
2022-07-21 09:55:18,752 - julearn - INFO - svm__C: [0.01, 0.1]
2022-07-21 09:55:18,752 - julearn - INFO - svm__gamma: [0.01, 0.001, 'scale']
2022-07-21 09:55:18,753 - julearn - INFO - Using scikit-learn CV scheme KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:18,753 - julearn - INFO - Search Parameters:
2022-07-21 09:55:18,753 - julearn - INFO - cv: KFold(n_splits=2, random_state=None, shuffle=False)
2022-07-21 09:55:18,753 - julearn - INFO - scoring: None
2022-07-21 09:55:18,753 - julearn - INFO - ====================
2022-07-21 09:55:18,753 - julearn - INFO -
2022-07-21 09:55:18,753 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds
0.7074977958032092
{'svm__C': 0.1, 'svm__gamma': 'scale'}
So what was the best gamma in the end?
print(estimator.best_estimator_['svm']._gamma)
Out:
0.5
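The value 0.5 follows from scikit-learn's definition of gamma='scale': gamma = 1 / (n_features * X.var()). Because the two features are z-scored during preprocessing, the pooled variance is (approximately) 1, so gamma = 1 / (2 * 1) = 0.5. A stdlib-only sketch of that formula, with illustrative unit-variance data:

```python
from statistics import pvariance

# Hypothetical z-scored feature values (population variance exactly 1),
# both columns flattened, as sklearn's X.var() pools all entries
X = [-1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
n_features = 2

gamma = 1.0 / (n_features * pvariance(X))
print(gamma)  # -> 0.5
```

On real z-scored data the variance is only approximately 1, so the fitted gamma may differ slightly from 0.5.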
Total running time of the script: ( 0 minutes 19.205 seconds)