Note

This page is a reference documentation. It only explains the function signature, and not how to use it. Please refer to the What you really need to know section for the big picture.

julearn.run_cross_validation#

julearn.run_cross_validation(X, y, model, X_types=None, data=None, problem_type=None, preprocess=None, return_estimator=None, return_inspector=False, return_train_score=False, cv=None, groups=None, scoring=None, pos_labels=None, model_params=None, search_params=None, seed=None, n_jobs=None, verbose=0)#

Run cross validation and score.

Parameters:

Xstr, list(str) or numpy.array

The features to use. See https://juaml.github.io/julearn/input.html for details.

ystr or numpy.array

The targets to predict. See https://juaml.github.io/julearn/input.html for details.

modelstr or scikit-learn compatible model.

If string, it will use one of the available models.

X_typesdict[str, list of str]

A dictionary containing keys with column type as a str and the columns of this column type as a list of str.

datapandas.DataFrame | None

DataFrame with the data (optional). See https://juaml.github.io/julearn/input.html for details.

problem_typestr

The kind of problem to model.

Options are:

“classification”: Perform a classification in which the target (y) has categorical classes (default). The parameter pos_labels can be used to convert a target with multiple_classes into binary.
“regression”. Perform a regression. The target (y) has to be ordinal at least.

preprocessstr, TransformerLike or list or PipelineCreator | None

Transformer to apply to the features. If string, use one of the available transformers. If list, each element can be a string or scikit-learn compatible transformer. If None (default), no transformation is applied.

See documentation for details.

return_estimatorstr | None

Return the fitted estimator(s). Options are:

‘final’: Return the estimator fitted on all the data.
‘cv’: Return the all the estimator from each CV split, fitted on the training data.
‘all’: Return all the estimators (final and cv).

return_inspectorbool

Whether to return the inspector object (default is False)

return_train_scorebool

Whether to return the training score with the test scores (default is False).

cvint, str or cross-validation generator | None

Cross-validation splitting strategy to use for model evaluation.

Options are:

None: defaults to 5-fold
int: the number of folds in a (Stratified)KFold
CV Splitter (see scikit-learn documentation on CV)
An iterable yielding (train, test) splits as arrays of indices.

groupsstr or numpy.array | None

The grouping labels in case a Group CV is used. See https://juaml.github.io/julearn/input.html for details.

scoringScorerLike, optional

The scoring metric to use. See https://scikit-learn.org/stable/modules/model_evaluation.html for a comprehensive list of options. If None, use the model’s default scorer.

pos_labelsstr, int, float or list | None

The labels to interpret as positive. If not None, every element from y will be converted to 1 if is equal or in pos_labels and to 0 if not.

model_paramsdict | None

If not None, this dictionary specifies the model parameters to use

The dictionary can define the following keys:

‘STEP__PARAMETER’: A value (or several) to be used as PARAMETER for STEP in the pipeline. Example: ‘svm__probability’: True will set the parameter ‘probability’ of the ‘svm’ model. If more than option is provided for at least one hyperparameter, a search will be performed.

search_paramsdict | None

Additional parameters in case Hyperparameter Tuning is performed, with the following keys:

‘kind’: The kind of search algorithm to use, e.g.:
‘grid’ or ‘random’. Can be any valid julearn searcher name or scikit-learn compatible searcher.
‘cv’: If a searcher is going to be used, the cross-validation
splitting strategy to use. Defaults to same CV as for the model evaluation.
‘scoring’: If a searcher is going to be used, the scoring metric to
evaluate the performance.

See https://juaml.github.io/julearn/hyperparameters.html for details.

seedint | None

If not None, set the random seed before any operation. Useful for reproducibility.

verbose: int

Verbosity level of outer cross-validation. Follows scikit-learn/joblib converntions. 0 means no additional information is printed. Larger number genereally mean more information is printed. Note: verbosity up to 50 will print into standard error, while larger than 50 will print in standrad output.

Returns:

scorespd.DataFrame: The resulting scores (one column for each score specified). Additionally, a ‘fit_time’ column will be added. And, if return_estimator='all' or return_estimator='cv', an ‘estimator’ columns with the corresponding estimators fitted for each CV split.
final_estimatorobject: The final estimator, fitted on all the data (only if return_estimator='all' or return_estimator='final')
inspectorInspector | None: The inspector object (only if return_inspector=True)

Examples using `julearn.run_cross_validation`#

Simple Binary Classification

Grouped CV.

Multiclass Classification.

Stratified K-fold CV for regression analysis

Regression Analysis

Simple Model Comparison

Inspecting the fold-wise predictions

Inspecting Random Forest models

Inspecting SVM models.

Preprocessing with variance threshold, zscore and PCA

Transforming target variable with z-score.

Tuning Multiple Hyperparameters Grids

Stacking Classification

Tuning Hyperparameters

Regression Analysis

Return Confounds in Confound Removal

Confound Removal (model comparison)

Custom Scoring Function for Regression

Applying preprocessing to the target

Connectome-based Predictive Modeling (CBPM)

Cross-validation consistent Confound Removal

Stacking Models

Hyperparameter Tuning

julearn.run_cross_validation#

Examples using julearn.run_cross_validation#

Examples using `julearn.run_cross_validation`#