Parallelize Julearn
In this example we will parallelize outer cross-validation and/or inner cross-validation for hyperparameter search.
# Authors: Sami Hamdan <s.hamdan@fz-juelich.de>
#
# License: AGPL
from seaborn import load_dataset
from julearn import run_cross_validation
Prepare some simple standard input.
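A minimal sketch of this step, assuming the iris dataset from seaborn (suggested by the `load_dataset` import and the `df_iris` name used in the calls that follow). The choice of feature columns and the restriction to two species are assumptions made for illustration, not something the text specifies.

```python
from seaborn import load_dataset

# Load iris as a pandas DataFrame (seaborn fetches and caches it on first use).
df_iris = load_dataset('iris')

# Assumption: keep two of the three species so the task is binary classification.
df_iris = df_iris[df_iris['species'].isin(['versicolor', 'virginica'])]

# Assumption: julearn receives column names, not arrays, for features and target.
X = ['sepal_length', 'sepal_width', 'petal_length']
y = 'species'
```

Any other DataFrame works the same way, as long as `X` lists feature columns and `y` names the target column in `data`.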
First, run without any parallelization.
scores = run_cross_validation(
X=X, y=y, data=df_iris, model='svm',
model_params=dict(svm__C=[1, 2, 3]))
To parallelize the outer cross-validation, pass the n_jobs argument to run_cross_validation. Setting verbose > 0 prints more information about the parallelization being done. Here, the number of parallel jobs is set to 2.
scores = run_cross_validation(
X=X, y=y, data=df_iris, model='svm',
model_params=dict(svm__C=[1, 2, 3]),
n_jobs=2, verbose=3
)
We can also parallelize the hyperparameter search, i.e. the inner cross-validation. This uses the n_jobs argument of the searcher itself, which by default is sklearn.model_selection.GridSearchCV. To set parameters of the searcher, pass them as search_params inside model_params, like this:
model_params = dict(
svm__C=[1, 2, 3],
search_params=dict(n_jobs=2, verbose=3)
)
scores = run_cross_validation(
X=X, y=y, data=df_iris, model='svm',
model_params=model_params
)
Out:
Fitting 5 folds for each of 3 candidates, totalling 15 fits
(the line above is printed 25 times in total)
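For intuition about what the two levels of n_jobs control, here is a rough plain scikit-learn analogue of nested cross-validation. It is only a sketch and not julearn's internal code: the data loading, the 5-fold splits and the variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.svm import SVC

features, target = load_iris(return_X_y=True)

# Inner CV: the grid search over C runs its candidate fits on 2 workers.
search = GridSearchCV(SVC(), param_grid={'C': [1, 2, 3]}, cv=5, n_jobs=2)

# Outer CV: each split refits the whole search. It is left sequential here
# (n_jobs=None); setting n_jobs on this call would parallelize the outer loop.
results = cross_validate(search, features, target, cv=5, n_jobs=None)
print(results['test_score'])
```

Setting n_jobs at both levels at once can oversubscribe the available cores, so it is usually enough to parallelize one of them.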
Depending on your resources, you can set n_jobs for the outer cv, for the inner cv, or even as a model parameter for models that support it, such as rf (random forest). Additionally, you can use scikit-learn's parallel_backend for parallelization.
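The last two options can be sketched with plain scikit-learn and joblib. This is an illustration under assumptions rather than julearn-specific code: it uses `joblib.parallel_backend` (the joblib context manager that scikit-learn relies on), the `'threading'` backend, and a random forest with 50 trees, none of which are prescribed by the text.

```python
from joblib import parallel_backend
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

features, target = load_iris(return_X_y=True)

# Model-level parallelism: the forest itself builds its trees on 2 workers.
rf = RandomForestClassifier(n_estimators=50, n_jobs=2, random_state=0)
scores_model = cross_val_score(rf, features, target, cv=5)

# Backend-level parallelism: inside the context manager, joblib-based code
# that leaves n_jobs unset picks up the backend's n_jobs (here 2).
with parallel_backend('threading', n_jobs=2):
    scores_backend = cross_val_score(
        RandomForestClassifier(n_estimators=50, random_state=0),
        features, target, cv=5)

print(scores_model.mean(), scores_backend.mean())
```

Whether a given julearn call picks up the surrounding backend depends on how it forwards n_jobs internally, so it is worth checking with verbose output before relying on it.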
Total running time of the script: ( 0 minutes 8.829 seconds)