6.1. Applying preprocessing to the target

So far, we have covered how to apply preprocessing to the features and train a model in a CV-consistent manner by building a pipeline. Sometimes, however, the target itself needs preprocessing. For example, in a regression task (continuous target variable), one might want to predict the z-scored target. This can be achieved by using a TargetPipelineCreator as a step in the general pipeline.

Let’s start by importing the required modules:

import pandas as pd
from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator, TargetPipelineCreator
from sklearn.datasets import load_diabetes

Next, load the diabetes dataset from scikit-learn as a pandas.DataFrame and define the feature names, the target name, and the feature types:

features, target = load_diabetes(return_X_y=True, as_frame=True)

print("Features: \n", features.head())
print("Target: \n", target.describe())

data_diabetes = pd.concat([features, target], axis=1)

X = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]
y = "target"

X_types = {
    "continuous": ["age", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"],
    "categorical": ["sex"],
}

Features:
         age       sex       bmi        bp        s1        s2        s3        s4        s5        s6
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401 -0.002592  0.019907 -0.017646
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412 -0.039493 -0.068332 -0.092204
2  0.085299  0.050680  0.044451 -0.005670 -0.045599 -0.034194 -0.032356 -0.002592  0.002861 -0.025930
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038  0.034309  0.022688 -0.009362
4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142 -0.002592 -0.031988 -0.046641
Target:
 count    442.000000
mean     152.133484
std       77.093005
min       25.000000
25%       87.000000
50%      140.500000
75%      211.500000
max      346.000000
Name: target, dtype: float64

We first create a TargetPipelineCreator:

target_creator = TargetPipelineCreator()
target_creator.add("zscore")

print(target_creator)

TargetPipelineCreator:
  Step 0: zscore
    estimator:     StandardScaler()
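
To make the effect of the zscore step on the target concrete, here is a minimal sketch in plain Python. It is not julearn's implementation: the helper names fit_zscore, zscore and inverse_zscore are hypothetical, and the toy numbers are made up for illustration. It assumes that the mean and standard deviation are learned on the training fold only, which is what CV-consistent preprocessing implies, and it uses the population standard deviation, as scikit-learn's StandardScaler does.

```python
from statistics import mean, pstdev


def fit_zscore(y_train):
    """Learn the mean and population standard deviation from the training target."""
    return mean(y_train), pstdev(y_train)


def zscore(values, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(v - mu) / sigma for v in values]


def inverse_zscore(values, mu, sigma):
    """Map standardized values back to the original scale."""
    return [v * sigma + mu for v in values]


# Toy target values (illustrative only), split by hand into two folds.
y_train = [100.0, 150.0, 200.0, 250.0]
y_test = [125.0, 225.0]

# The statistics come from the training fold only ...
mu, sigma = fit_zscore(y_train)
z_train = zscore(y_train, mu, sigma)

# ... and are reused unchanged on the test fold.
z_test = zscore(y_test, mu, sigma)
restored = inverse_zscore(z_test, mu, sigma)

print("train mean/std:", mu, sigma)
print("z-scored test target:", z_test)
print("restored test target:", restored)
```

Because the test fold never contributes to mu and sigma, no information about the held-out targets leaks into the preprocessing.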

Next, we create the general pipeline using a PipelineCreator. We pass the target_creator as a step of the pipeline and set apply_to="target", which tells julearn to apply it only to y:

creator = PipelineCreator(
    problem_type="regression", apply_to=["categorical", "continuous"]
)
creator.add(target_creator, apply_to="target")
creator.add("svm")
print(creator)

2026-05-29 20:46:50,302 - julearn - INFO - Adding step jutargetpipeline that applies to ColumnTypes<types={'target'}; pattern=(?:target)>
2026-05-29 20:46:50,302 - julearn - INFO - Step added
2026-05-29 20:46:50,303 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous', 'categorical'}; pattern=(?:__:type:__continuous|__:type:__categorical)>
2026-05-29 20:46:50,303 - julearn - INFO - Step added
PipelineCreator:
  Step 0: target_jutargetpipeline
    estimator:     <julearn.pipeline.target_pipeline.JuTargetPipeline object at 0x11fb04a10>
    apply to:      ColumnTypes<types={'target'}; pattern=(?:target)>
    needed types:  ColumnTypes<types={'target'}; pattern=(?:target)>
    tuning params: {}
  Step 1: svm
    estimator:     SVR()
    apply to:      ColumnTypes<types={'continuous', 'categorical'}; pattern=(?:__:type:__continuous|__:type:__categorical)>
    needed types:  ColumnTypes<types={'continuous', 'categorical'}; pattern=(?:__:type:__continuous|__:type:__categorical)>
    tuning params: {}

This creator can then be passed to run_cross_validation():

scores = run_cross_validation(
    X=X, y=y, data=data_diabetes, X_types=X_types, model=creator
)

print(scores)

2026-05-29 20:46:50,305 - julearn - INFO - ==== Input Data ====
2026-05-29 20:46:50,305 - julearn - INFO - Using dataframe as input
2026-05-29 20:46:50,305 - julearn - INFO -      Features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
2026-05-29 20:46:50,305 - julearn - INFO -      Target: target
2026-05-29 20:46:50,306 - julearn - INFO -      Expanded features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
2026-05-29 20:46:50,306 - julearn - INFO -      X_types:{'continuous': ['age', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'], 'categorical': ['sex']}
2026-05-29 20:46:50,307 - julearn - INFO - ====================
2026-05-29 20:46:50,307 - julearn - INFO -
2026-05-29 20:46:50,308 - julearn - INFO - = Model Parameters =
2026-05-29 20:46:50,308 - julearn - INFO - ====================
2026-05-29 20:46:50,308 - julearn - INFO -
2026-05-29 20:46:50,308 - julearn - INFO - = Data Information =
2026-05-29 20:46:50,308 - julearn - INFO -      Problem type: regression
2026-05-29 20:46:50,309 - julearn - INFO -      Number of samples: 442
2026-05-29 20:46:50,309 - julearn - INFO -      Number of features: 10
2026-05-29 20:46:50,309 - julearn - INFO - ====================
2026-05-29 20:46:50,309 - julearn - INFO -
2026-05-29 20:46:50,309 - julearn - INFO -      Target type: float64
2026-05-29 20:46:50,309 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
   fit_time  score_time  test_score  n_train  n_test  repeat  fold                          cv_mdsum
0  0.015204    0.006190    0.340037      353      89       0     0  b10eef89b4192178d482d7a1587a248a
1  0.014747    0.007253    0.571525      353      89       0     1  b10eef89b4192178d482d7a1587a248a
2  0.017257    0.006601    0.444764      354      88       0     2  b10eef89b4192178d482d7a1587a248a
3  0.017414    0.006306    0.388669      354      88       0     3  b10eef89b4192178d482d7a1587a248a
4  0.016389    0.006972    0.520210      354      88       0     4  b10eef89b4192178d482d7a1587a248a
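
For readers who know plain scikit-learn, a comparable setup can be sketched with TransformedTargetRegressor, which fits a transformer on the training target and inverse-transforms the predictions. This is offered only as a point of comparison: it is not a description of how julearn implements the target pipeline, and the variable names X_arr, y_arr, model and results are chosen here for illustration.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_validate
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Load the same dataset as plain arrays.
X_arr, y_arr = load_diabetes(return_X_y=True)

# Wrap the regressor so that the target is z-scored before fitting;
# the scaler is refit on the training target of every fold.
model = TransformedTargetRegressor(
    regressor=SVR(), transformer=StandardScaler()
)

# 5-fold CV without shuffling, mirroring the outer CV scheme in the log above.
cv = KFold(n_splits=5)
results = cross_validate(model, X_arr, y_arr, cv=cv, scoring="r2")

print(results["test_score"])
```

The scores from this sketch need not match the julearn output above, since default settings and scoring details may differ between the two setups.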

All transformers listed under Transformers can be used for both feature and target transformations. However, feature transformations can be specified directly as a step in the PipelineCreator, while target transformations have to be specified using a TargetPipelineCreator, which is then passed to the overall PipelineCreator as an extra step.
