Parallelize Julearn

In this example we will parallelize the outer cross-validation, the inner cross-validation used for hyperparameter search, or both.

# Authors: Sami Hamdan <s.hamdan@fz-juelich.de>
#
# License: AGPL
from seaborn import load_dataset
from julearn import run_cross_validation

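Prepare some simple standard input: the iris dataset, restricted to the versicolor and virginica species so that the target has two classes.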

df_iris = load_dataset("iris")
df_iris = df_iris[df_iris['species'].isin(['versicolor', 'virginica'])]

X = ['sepal_length', 'sepal_width', 'petal_length']
y = 'species'

Run without any parallelization:

scores = run_cross_validation(
    X=X, y=y, data=df_iris, model='svm',
    model_params=dict(svm__C=[1, 2, 3]))

To parallelize the outer cross-validation, pass the n_jobs argument to run_cross_validation. Setting verbose > 0 prints more information about the parallel execution. Here, we use 2 parallel jobs.

scores = run_cross_validation(
    X=X, y=y, data=df_iris, model='svm',
    model_params=dict(svm__C=[1, 2, 3]),
    n_jobs=2, verbose=3
)
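For intuition, the outer parallelization can be sketched in plain scikit-learn. This sketch assumes that the n_jobs argument of run_cross_validation behaves like the n_jobs argument of sklearn.model_selection.cross_validate, which distributes the outer folds over worker processes. The z-scoring pipeline, the 5 x 5 RepeatedKFold splitter and random_state=42 are illustrative assumptions, not confirmed julearn defaults; the data come from sklearn.datasets.load_iris so that no download is needed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Binary subset of iris (versicolor vs. virginica) with the same three
# features as the julearn example: sepal length, sepal width, petal length.
iris = load_iris()
mask = np.isin(iris.target, [1, 2])
X = iris.data[mask][:, :3]
y = iris.target[mask]

# Illustrative model (assumption): z-scoring + SVM, tuned over C.
pipe = Pipeline([("zscore", StandardScaler()), ("svm", SVC())])
search = GridSearchCV(pipe, param_grid={"svm__C": [1, 2, 3]})

# Outer CV: 5 folds repeated 5 times (assumed stand-in for julearn's default).
outer_cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=42)

# n_jobs=2 spreads the 25 outer folds over 2 workers; verbose=3 reports progress.
scores = cross_validate(search, X, y, cv=outer_cv, n_jobs=2, verbose=3)
print(scores["test_score"].shape)
```

Each worker fits a complete grid search on its outer training fold, so the outer folds are the coarsest unit of parallel work.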

We can also parallelize the hyperparameter search (the inner cv). This uses the n_jobs argument of the searcher itself, which by default is sklearn.model_selection.GridSearchCV. To set parameters of the search, pass search_params inside model_params like this:

model_params = dict(
    svm__C=[1, 2, 3],
    search_params=dict(n_jobs=2, verbose=3)
)

scores = run_cross_validation(
    X=X, y=y, data=df_iris, model='svm',
    model_params=model_params
)

Output (this line is printed 25 times in total, once for each grid search fitted in an outer cross-validation fold):

Fitting 5 folds for each of 3 candidates, totalling 15 fits
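The inner parallelization can be sketched the same way in plain scikit-learn, assuming search_params is forwarded to the GridSearchCV constructor. Here n_jobs=2 is set on the searcher, so the 3 candidates x 5 folds = 15 fits of each grid search run in parallel while the outer loop stays sequential. Because GridSearchCV is fitted once per outer fold and verbose=3 makes it announce each fit, the "Fitting 5 folds ..." line repeats in this sketch's output too. The pipeline and the 5 x 5 RepeatedKFold splitter are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Binary iris subset with three features, as in the julearn example.
iris = load_iris()
mask = np.isin(iris.target, [1, 2])
X = iris.data[mask][:, :3]
y = iris.target[mask]

pipe = Pipeline([("zscore", StandardScaler()), ("svm", SVC())])

# Inner CV: n_jobs=2 parallelizes the 15 fits of each grid search;
# verbose=3 prints "Fitting 5 folds for each of 3 candidates ..." per search.
search = GridSearchCV(
    pipe, param_grid={"svm__C": [1, 2, 3]}, cv=5, n_jobs=2, verbose=3
)

# Outer CV stays sequential (n_jobs left at its default of None).
outer_cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=42)
scores = cross_validate(search, X, y, cv=outer_cv, return_estimator=True)

# Every outer fold has its own fitted search and its own best C.
best_Cs = [est.best_params_["svm__C"] for est in scores["estimator"]]
print(len(best_Cs))
```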

Depending on your resources, you can set n_jobs for the outer cv, the inner cv, or even as a model parameter for models that support it, such as rf (random forest). Additionally, you can use scikit-learn's parallel_backend to control how the parallel work is executed.
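The last two options can be sketched in plain scikit-learn as well. This sketch assumes julearn's rf corresponds to sklearn.ensemble.RandomForestClassifier for classification; n_estimators=50 and the "threading" backend are arbitrary illustrative choices. parallel_backend is imported here from joblib, the library scikit-learn uses for its parallelism.

```python
import numpy as np
from joblib import parallel_backend
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Binary iris subset with three features, as in the julearn example.
iris = load_iris()
mask = np.isin(iris.target, [1, 2])
X = iris.data[mask][:, :3]
y = iris.target[mask]

# Model-level parallelism: the forest builds its trees on 2 jobs.
rf = RandomForestClassifier(n_estimators=50, n_jobs=2, random_state=0)

# Backend selection: inside this block, joblib-based code that does not pin
# its own backend (here the cross-validation loop) runs on threads instead
# of the default process-based "loky" backend.
with parallel_backend("threading", n_jobs=2):
    scores = cross_val_score(rf, X, y, cv=5, n_jobs=2)

print(scores.mean())
```

Which backend is faster depends on the workload: threads avoid copying data to worker processes, whereas processes sidestep Python's GIL for CPU-bound Python code.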

