Inspecting the fold-wise predictions
This example uses the iris dataset and performs a simple binary
classification using a Support Vector Machine classifier.
We later inspect the predictions of the model for each fold.
# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL
from seaborn import load_dataset
from sklearn.model_selection import RepeatedStratifiedKFold
from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator
from julearn.utils import configure_logging
Set the logging level to info to see extra information.
configure_logging(level="INFO")
2026-01-16 10:54:00,676 - julearn - INFO - ===== Lib Versions =====
2026-01-16 10:54:00,676 - julearn - INFO - numpy: 1.26.4
2026-01-16 10:54:00,676 - julearn - INFO - scipy: 1.17.0
2026-01-16 10:54:00,677 - julearn - INFO - sklearn: 1.7.2
2026-01-16 10:54:00,677 - julearn - INFO - pandas: 2.3.3
2026-01-16 10:54:00,677 - julearn - INFO - julearn: 0.3.5.dev123
2026-01-16 10:54:00,677 - julearn - INFO - ========================
df_iris = load_dataset("iris")
The dataset has three species. We keep two of them (versicolor and virginica) to perform a binary classification.
df_iris = df_iris[df_iris["species"].isin(["versicolor", "virginica"])]
As features, we use the sepal length, sepal width and petal length, and we try to predict the species.
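If seaborn's `load_dataset` cannot download the data, a roughly equivalent two-species subset can be built from scikit-learn's bundled copy of iris. This is a sketch that assumes `sklearn.datasets.load_iris(as_frame=True)` is an acceptable substitute; its column names differ from seaborn's (for example `sepal length (cm)`), so they are renamed here to match the ones used in this example.

```python
from sklearn.datasets import load_iris

# Load the bundled iris data as a pandas DataFrame (no download needed).
iris = load_iris(as_frame=True)
df = iris.frame.rename(
    columns={
        "sepal length (cm)": "sepal_length",
        "sepal width (cm)": "sepal_width",
        "petal length (cm)": "petal_length",
        "petal width (cm)": "petal_width",
    }
)
# Replace the integer target codes (0, 1, 2) with the species names.
df["species"] = iris.target_names[iris.target.to_numpy()]
df = df.drop(columns="target")

# Keep only two species to obtain a binary problem (rows 50..149).
df_binary = df[df["species"].isin(["versicolor", "virginica"])]
print(df_binary["species"].value_counts())
```

The row labels of `df_binary` start at 50, just like the `index` column in the fold-wise predictions further down, because the dropped setosa samples occupy the first 50 rows.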
X = ["sepal_length", "sepal_width", "petal_length"]
y = "species"
X_types = {"continuous": X}
creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
creator.add("svm")
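For readers more familiar with plain scikit-learn, the pipeline built by the creator above corresponds roughly to a standard scaler followed by a support vector classifier. This sketch assumes that julearn maps the step name "zscore" to `StandardScaler` and "svm" to `SVC` (default hyperparameters) for classification problems. Unlike julearn, which applies each step only to the columns of the matching type (continuous here, see the log below), the scikit-learn pipeline applies to every column, which is equivalent in this example because all three features are continuous.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed plain scikit-learn analogue of the julearn pipeline:
# "zscore" -> StandardScaler, "svm" -> SVC.
pipe = make_pipeline(StandardScaler(), SVC())

# Same binary problem: drop setosa (class 0), keep the first three features
# (sepal length, sepal width, petal length).
X_all, y_all = load_iris(return_X_y=True)
mask = y_all != 0
X_bin, y_bin = X_all[mask][:, :3], y_all[mask]

pipe.fit(X_bin, y_bin)
print(pipe.score(X_bin, y_bin))  # training accuracy, not a CV estimate
```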
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=4, random_state=200)
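With 5 splits repeated 4 times, the model is fitted 20 times, and every sample falls into a test fold exactly once per repeat. That is why the scores table below has 20 rows and why each sample later receives 4 out-of-fold predictions. The sketch below checks this on toy labels (50 samples per class, chosen only to mirror the size of the binary iris subset).

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Toy labels: 50 samples of each of two classes.
y_toy = np.array([0] * 50 + [1] * 50)
X_toy = np.zeros((len(y_toy), 1))  # feature values do not affect the splits

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=4, random_state=200)
print(cv.get_n_splits())  # 5 folds x 4 repeats = 20 splits

# Count how often each sample ends up in a test set across all splits.
test_counts = np.zeros(len(y_toy), dtype=int)
for train_idx, test_idx in cv.split(X_toy, y_toy):
    test_counts[test_idx] += 1

print(np.unique(test_counts))  # every sample is tested once per repeat
```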
scores, model, inspector = run_cross_validation(
    X=X,
    y=y,
    data=df_iris,
    model=creator,
    return_inspector=True,
    cv=cv,
)
print(scores)
2026-01-16 10:54:00,679 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2026-01-16 10:54:00,680 - julearn - INFO - Step added
2026-01-16 10:54:00,680 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
2026-01-16 10:54:00,680 - julearn - INFO - Step added
2026-01-16 10:54:00,680 - julearn - INFO - ==== Input Data ====
2026-01-16 10:54:00,680 - julearn - INFO - Using dataframe as input
2026-01-16 10:54:00,680 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length']
2026-01-16 10:54:00,680 - julearn - INFO - Target: species
2026-01-16 10:54:00,681 - julearn - INFO - Expanded features: ['sepal_length', 'sepal_width', 'petal_length']
2026-01-16 10:54:00,681 - julearn - INFO - X_types:{}
2026-01-16 10:54:00,681 - julearn - WARNING - The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
/home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
warn_with_log(
2026-01-16 10:54:00,681 - julearn - INFO - ====================
2026-01-16 10:54:00,682 - julearn - INFO -
2026-01-16 10:54:00,682 - julearn - INFO - = Model Parameters =
2026-01-16 10:54:00,682 - julearn - INFO - ====================
2026-01-16 10:54:00,682 - julearn - INFO -
2026-01-16 10:54:00,682 - julearn - INFO - = Data Information =
2026-01-16 10:54:00,683 - julearn - INFO - Problem type: classification
2026-01-16 10:54:00,683 - julearn - INFO - Number of samples: 100
2026-01-16 10:54:00,683 - julearn - INFO - Number of features: 3
2026-01-16 10:54:00,683 - julearn - INFO - ====================
2026-01-16 10:54:00,683 - julearn - INFO -
2026-01-16 10:54:00,683 - julearn - INFO - Number of classes: 2
2026-01-16 10:54:00,683 - julearn - INFO - Target type: object
2026-01-16 10:54:00,684 - julearn - INFO - Class distributions: species
versicolor 50
virginica 50
Name: count, dtype: int64
2026-01-16 10:54:00,684 - julearn - INFO - Using outer CV scheme RepeatedStratifiedKFold(n_repeats=4, n_splits=5, random_state=200) (incl. final model)
2026-01-16 10:54:00,684 - julearn - INFO - Binary classification problem detected.
fit_time score_time ... fold cv_mdsum
0 0.004657 0.003870 ... 0 42489ff0163b2f12752440a6b7ef74c7
1 0.004688 0.004009 ... 1 42489ff0163b2f12752440a6b7ef74c7
2 0.004682 0.003941 ... 2 42489ff0163b2f12752440a6b7ef74c7
3 0.004733 0.003969 ... 3 42489ff0163b2f12752440a6b7ef74c7
4 0.004649 0.003953 ... 4 42489ff0163b2f12752440a6b7ef74c7
5 0.004696 0.003990 ... 0 42489ff0163b2f12752440a6b7ef74c7
6 0.004677 0.004021 ... 1 42489ff0163b2f12752440a6b7ef74c7
7 0.004701 0.003932 ... 2 42489ff0163b2f12752440a6b7ef74c7
8 0.004676 0.003990 ... 3 42489ff0163b2f12752440a6b7ef74c7
9 0.004700 0.003932 ... 4 42489ff0163b2f12752440a6b7ef74c7
10 0.004657 0.003940 ... 0 42489ff0163b2f12752440a6b7ef74c7
11 0.004702 0.003920 ... 1 42489ff0163b2f12752440a6b7ef74c7
12 0.004696 0.003996 ... 2 42489ff0163b2f12752440a6b7ef74c7
13 0.004629 0.003918 ... 3 42489ff0163b2f12752440a6b7ef74c7
14 0.004639 0.003937 ... 4 42489ff0163b2f12752440a6b7ef74c7
15 0.004685 0.003982 ... 0 42489ff0163b2f12752440a6b7ef74c7
16 0.004632 0.003963 ... 1 42489ff0163b2f12752440a6b7ef74c7
17 0.004720 0.004026 ... 2 42489ff0163b2f12752440a6b7ef74c7
18 0.004678 0.003946 ... 3 42489ff0163b2f12752440a6b7ef74c7
19 0.004702 0.004186 ... 4 42489ff0163b2f12752440a6b7ef74c7
[20 rows x 9 columns]
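In the table above, the fold column cycles through 0 to 4 once per repeat, because RepeatedStratifiedKFold generates its splits repeat by repeat. A common next step is to average the test scores within each repeat. As a sketch, assuming the plain scikit-learn pipeline from the earlier analogue stands in for the julearn one, the same grouping can be done with `cross_validate`; the numbers are not guaranteed to match julearn's output exactly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)
mask = y_all != 0  # keep versicolor vs virginica
X_bin, y_bin = X_all[mask][:, :3], y_all[mask]

n_splits, n_repeats = 5, 4
cv = RepeatedStratifiedKFold(
    n_splits=n_splits, n_repeats=n_repeats, random_state=200
)
res = cross_validate(make_pipeline(StandardScaler(), SVC()), X_bin, y_bin, cv=cv)

# Split i belongs to repeat i // n_splits and fold i % n_splits,
# so reshaping gives one row per repeat and one column per fold.
test_scores = res["test_score"].reshape(n_repeats, n_splits)
per_repeat_mean = test_scores.mean(axis=1)
print(per_repeat_mean, test_scores.mean())
```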
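We can now inspect the predictions of the model for each fold. The result has one row per sample, with its original index in df_iris, the true target, and one column per repeat holding the prediction made for that sample when it was in the test fold.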
cv_predictions = inspector.folds.predict()
print(cv_predictions)
index target repeat0_p0 repeat1_p0 repeat2_p0 repeat3_p0
0 50 versicolor versicolor versicolor versicolor versicolor
1 51 versicolor versicolor versicolor versicolor versicolor
2 52 versicolor versicolor versicolor versicolor versicolor
3 53 versicolor versicolor versicolor versicolor versicolor
4 54 versicolor versicolor versicolor versicolor versicolor
.. ... ... ... ... ... ...
95 145 virginica virginica virginica virginica virginica
96 146 virginica virginica virginica virginica virginica
97 147 virginica virginica virginica virginica virginica
98 148 virginica virginica virginica virginica virginica
99 149 virginica virginica virginica virginica virginica
[100 rows x 6 columns]
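To make clear what such a table contains, the sketch below rebuilds an analogous one with plain scikit-learn: for every split it fits a fresh copy of the pipeline and writes the test-fold predictions into the column of the corresponding repeat. This is an illustration, not julearn's implementation; the columns are simply named `repeat0` to `repeat3` (the meaning of julearn's `_p0` suffix is not interpreted here), and the per-repeat accuracy and the list of samples whose prediction changes between repeats are added as examples of what the table can be used for.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
mask = iris.target != 0  # keep versicolor and virginica
X_bin = iris.data[mask][:, :3]
y_bin = iris.target_names[iris.target[mask]]  # string labels
index = np.flatnonzero(mask)  # original row numbers (50..149)

n_splits, n_repeats = 5, 4
cv = RepeatedStratifiedKFold(
    n_splits=n_splits, n_repeats=n_repeats, random_state=200
)
pipe = make_pipeline(StandardScaler(), SVC())

# One row per sample, one column of out-of-fold predictions per repeat.
repeat_cols = [f"repeat{r}" for r in range(n_repeats)]
preds = pd.DataFrame({"index": index, "target": y_bin})
for col in repeat_cols:
    preds[col] = None

for i, (train_idx, test_idx) in enumerate(cv.split(X_bin, y_bin)):
    model = clone(pipe).fit(X_bin[train_idx], y_bin[train_idx])
    preds.loc[test_idx, repeat_cols[i // n_splits]] = model.predict(X_bin[test_idx])

# Accuracy of the out-of-fold predictions in each repeat.
per_repeat_acc = {c: (preds[c] == preds["target"]).mean() for c in repeat_cols}
# Samples whose predicted label differs between repeats.
unstable = preds[preds[repeat_cols].nunique(axis=1) > 1]
print(per_repeat_acc)
print(unstable)
```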
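Each fold also gives access to the model fitted on that fold's training data, wrapped in a PipelineInspector object:
inspector.folds[0].model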
<julearn.inspect._pipeline.PipelineInspector object at 0x7f2d825cae40>
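In plain scikit-learn, the closest equivalent is to ask `cross_validate` for the fitted estimators with `return_estimator=True` and then look inside the pipeline of a given split. This sketch again assumes the StandardScaler plus SVC analogue of the julearn pipeline; it does not describe the methods of julearn's PipelineInspector.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)
mask = y_all != 0  # keep versicolor vs virginica
X_bin, y_bin = X_all[mask][:, :3], y_all[mask]

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=4, random_state=200)
res = cross_validate(
    make_pipeline(StandardScaler(), SVC()),
    X_bin,
    y_bin,
    cv=cv,
    return_estimator=True,  # keep the pipeline fitted in each split
)

# Pipeline fitted on the training part of the first split.
first = res["estimator"][0]
scaler = first.named_steps["standardscaler"]
svc = first.named_steps["svc"]
print(scaler.mean_)    # per-feature means learned on that training set
print(svc.n_support_)  # number of support vectors per class
```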