Target Generation

This example shows how to evaluate a regression model whose target variable is generated from some of the features within the cross-validation procedure. Using the iris dataset, we generate a target variable by applying PCA to the petal features, and then evaluate whether a regression model can predict this generated target from the sepal features.

# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL

from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator
from julearn.utils import configure_logging

Set the logging level to DEBUG to see extra information, including how julearn builds the pipelines.

configure_logging(level="DEBUG")

2026-02-10 14:37:57,330 - julearn - INFO - ===== Lib Versions =====
2026-02-10 14:37:57,331 - julearn - INFO - numpy: 1.26.4
2026-02-10 14:37:57,331 - julearn - INFO - scipy: 1.17.0
2026-02-10 14:37:57,331 - julearn - INFO - sklearn: 1.7.2
2026-02-10 14:37:57,331 - julearn - INFO - pandas: 2.3.3
2026-02-10 14:37:57,331 - julearn - INFO - julearn: 0.3.5.dev126
2026-02-10 14:37:57,331 - julearn - INFO - ========================

df_iris = load_dataset("iris")

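As features, we use all four measurements: sepal length and width, and petal length and width. Instead of predicting the species, we set the target to “__generated__”, which tells julearn that the target will be generated within the cross-validation. We also define two feature types, “sepal” and “petal”, so that each pipeline step can be applied to the right columns.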

X = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = "__generated__"  # to indicate to julearn that the target will be generated


# Define our feature types
X_types = {
    "sepal": ["sepal_length", "sepal_width"],
    "petal": ["petal_length", "petal_width"],
}
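The logs further down show how julearn tracks feature types internally: each column is renamed with a __:type:__<type> suffix (for example petal_length__:type:__petal), and the apply_to of each step is matched against these names with a regular expression such as (?:__:type:__petal). The sketch below imitates that mechanism in plain Python, assuming only the naming scheme visible in the log output; the helpers tag_columns and select are hypothetical and not part of julearn's API.

```python
import re

# Feature types as defined in the example above
X = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X_types = {
    "sepal": ["sepal_length", "sepal_width"],
    "petal": ["petal_length", "petal_width"],
}


def tag_columns(columns, types):
    """Hypothetical helper: append a '__:type:__<type>' suffix to each column."""
    col_to_type = {col: t for t, cols in types.items() for col in cols}
    return {col: f"{col}__:type:__{col_to_type[col]}" for col in columns}


def select(tagged_columns, apply_to):
    """Hypothetical helper: keep the tagged columns that match a feature type."""
    # "*" matches every column, mirroring the pattern=.* shown in the logs
    if apply_to == "*":
        pattern = re.compile(".*")
    else:
        pattern = re.compile(f"(?:__:type:__{re.escape(apply_to)})")
    return [c for c in tagged_columns if pattern.search(c)]


mapping = tag_columns(X, X_types)
print(mapping)
# ['petal_length__:type:__petal', 'petal_width__:type:__petal']
print(select(mapping.values(), "petal"))
```

Encoding the type in the column name lets a single DataFrame carry both the data and the type information through every step, which is why the same renamed columns reappear in the cross-validation logs.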

We now use a PipelineCreator to build the pipeline that will generate the target. This special pipeline must be configured as a “transformer” and applied to the “petal” feature type. It computes a two-component PCA on the petal features and keeps only the first component as the target.

target_creator = PipelineCreator(problem_type="transformer", apply_to="petal")
target_creator.add("pca", n_components=2)
# Select only the first component
target_creator.add("pick_columns", keep="pca__pca0")

2026-02-10 14:37:57,333 - julearn - INFO - Adding step pca that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-02-10 14:37:57,333 - julearn - INFO - Setting hyperparameter n_components = 2
2026-02-10 14:37:57,333 - julearn - DEBUG - Getting estimator from string: pca
2026-02-10 14:37:57,334 - julearn - INFO - Step added
2026-02-10 14:37:57,334 - julearn - INFO - Adding step pick_columns that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-02-10 14:37:57,334 - julearn - INFO - Setting hyperparameter keep = pca__pca0
2026-02-10 14:37:57,334 - julearn - DEBUG - Getting estimator from string: pick_columns
2026-02-10 14:37:57,334 - julearn - INFO - Step added

<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7f8fac3b00d0>
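Conceptually, this target creator fits a PCA on the petal columns and keeps only the first component (the pca__pca0 column). The NumPy sketch below reproduces that computation on random stand-in data (not the iris measurements) using an SVD of the centered matrix, which yields both components and then keeps the first; the function name first_pc_target is hypothetical, and the sketch ignores the details of julearn's PickColumns step. As with any PCA, the sign of the component is arbitrary.

```python
import numpy as np


def first_pc_target(petal_train, petal_apply=None):
    """Hypothetical helper: fit PCA on petal_train and return the
    first-component scores for petal_apply (defaults to petal_train)."""
    if petal_apply is None:
        petal_apply = petal_train
    mean = petal_train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(petal_train - mean, full_matrices=False)
    # Project onto the first axis only (the equivalent of "pca0")
    return (petal_apply - mean) @ vt[0]


rng = np.random.default_rng(0)
# Stand-in for two correlated petal measurements (random, not iris data)
length = rng.normal(size=100)
petal = np.column_stack([length, 0.5 * length + 0.1 * rng.normal(size=100)])
target = first_pc_target(petal)
print(target.shape)  # (100,)
```

The optional petal_apply argument separates fitting from applying, which matters once the target has to be generated for test samples using parameters learned on training samples only.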

We now create the pipeline that will be used to predict the target; this is a regression pipeline. The step immediately before the model must be generate_target, applied to the “petal” features and using target_creator as the transformer. In this pipeline, all features are first z-scored, and the linear regression model is then fitted on the “sepal” features only.

creator = PipelineCreator(problem_type="regression")
creator.add("zscore", apply_to="*")
creator.add("generate_target", apply_to="petal", transformer=target_creator)
creator.add("linreg", apply_to="sepal")

2026-02-10 14:37:57,335 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'*'}; pattern=.*>
2026-02-10 14:37:57,335 - julearn - DEBUG - Getting estimator from string: zscore
2026-02-10 14:37:57,335 - julearn - INFO - Step added
2026-02-10 14:37:57,335 - julearn - INFO - Adding step generate_target that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-02-10 14:37:57,336 - julearn - INFO - Setting hyperparameter transformer = PipelineCreator:
  Step 0: pca
    estimator:     PCA(n_components=2)
    apply to:      ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    needed types:  ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    tuning params: {}
  Step 1: pick_columns
    estimator:     PickColumns(keep='pca__pca0')
    apply to:      ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    needed types:  ColumnTypes<types={'*'}; pattern=.*>
    tuning params: {}

2026-02-10 14:37:57,336 - julearn - DEBUG - Special step is generate_target
2026-02-10 14:37:57,336 - julearn - INFO - Step added
2026-02-10 14:37:57,336 - julearn - INFO - Adding step linreg that applies to ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>
2026-02-10 14:37:57,336 - julearn - DEBUG - Getting estimator from string: linreg
2026-02-10 14:37:57,336 - julearn - INFO - Step added

<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7f8fac860ad0>
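Put together, fitting this pipeline on one set of samples amounts to: z-score all four columns, derive the target from the scaled petal columns, then fit a linear regression of that target on the scaled sepal columns. The NumPy sketch below spells out that order on random stand-in data. It assumes each step behaves like the scikit-learn estimator named in the logs (StandardScaler, PCA, LinearRegression); it is an illustration, not julearn's implementation, and fit_pipeline is a hypothetical name.

```python
import numpy as np


def fit_pipeline(data):
    """Hypothetical sketch: columns 0-1 are sepal, 2-3 are petal features."""
    # Step 1: zscore applied to all columns ("*")
    mean, std = data.mean(axis=0), data.std(axis=0)
    scaled = (data - mean) / std
    sepal, petal = scaled[:, :2], scaled[:, 2:]

    # Step 2: generate_target on "petal": first principal component
    petal_mean = petal.mean(axis=0)
    _, _, vt = np.linalg.svd(petal - petal_mean, full_matrices=False)
    target = (petal - petal_mean) @ vt[0]

    # Step 3: linreg on "sepal": least squares with an intercept column
    design = np.column_stack([np.ones(len(sepal)), sepal])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return {"mean": mean, "std": std, "petal_mean": petal_mean,
            "axis": vt[0], "coef": coef}


rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))  # random stand-in, not the iris data
params = fit_pipeline(data)
print(params["coef"])  # intercept followed by one weight per sepal column
```

Returning the fitted parameters, rather than only predictions, is what makes it possible to apply the same scaling and PCA to new samples later.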

Finally, we evaluate the model using cross-validation.

scores, model = run_cross_validation(
    X=X,
    y=y,
    X_types=X_types,
    data=df_iris,
    model=creator,
    return_estimator="final",
    cv=2,
)

print(scores["test_score"])  # type: ignore

2026-02-10 14:37:57,337 - julearn - INFO - ==== Input Data ====
2026-02-10 14:37:57,337 - julearn - INFO - Using dataframe as input
2026-02-10 14:37:57,337 - julearn - INFO -      Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-02-10 14:37:57,337 - julearn - INFO -      Target: __generated__
2026-02-10 14:37:57,338 - julearn - INFO -      Expanded features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-02-10 14:37:57,338 - julearn - INFO -      X_types:{'sepal': ['sepal_length', 'sepal_width'], 'petal': ['petal_length', 'petal_width']}
2026-02-10 14:37:57,338 - julearn - INFO - Target will be generated
2026-02-10 14:37:57,338 - julearn - INFO - ====================
2026-02-10 14:37:57,339 - julearn - INFO -
2026-02-10 14:37:57,339 - julearn - DEBUG - Generating pipeline from PipelineCreator or list of them
2026-02-10 14:37:57,339 - julearn - DEBUG - Creating pipeline
2026-02-10 14:37:57,339 - julearn - DEBUG - Ensuring target generator pipeline
2026-02-10 14:37:57,339 - julearn - DEBUG - Creating pipeline
2026-02-10 14:37:57,339 - julearn - DEBUG - Creating a pipeline with no model added
2026-02-10 14:37:57,339 - julearn - DEBUG - Adding transformer pca
2026-02-10 14:37:57,339 - julearn - DEBUG -      Estimator: PCA(n_components=2)
2026-02-10 14:37:57,340 - julearn - DEBUG -      Params to tune: {}
2026-02-10 14:37:57,340 - julearn - DEBUG - Adding transformer pick_columns
2026-02-10 14:37:57,340 - julearn - DEBUG -      Estimator: PickColumns(keep='pca__pca0')
2026-02-10 14:37:57,340 - julearn - DEBUG -      Params to tune: {}
2026-02-10 14:37:57,340 - julearn - INFO - = Model Parameters =
2026-02-10 14:37:57,340 - julearn - INFO - ====================
2026-02-10 14:37:57,340 - julearn - INFO -
2026-02-10 14:37:57,340 - julearn - DEBUG - Pipeline created
2026-02-10 14:37:57,341 - julearn - DEBUG - Target generator pipeline created
2026-02-10 14:37:57,341 - julearn - DEBUG - Adding transformer zscore
2026-02-10 14:37:57,341 - julearn - DEBUG -      Estimator: StandardScaler()
2026-02-10 14:37:57,341 - julearn - DEBUG -      Params to tune: {}
2026-02-10 14:37:57,341 - julearn - DEBUG - Adding model linreg
2026-02-10 14:37:57,341 - julearn - DEBUG - Wrapping linreg
2026-02-10 14:37:57,342 - julearn - DEBUG -      Estimator: WrapModel(apply_to=ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>,
          copy_X=True, fit_intercept=True, model=LinearRegression(),
          n_jobs=None, positive=False, tol=1e-06)
2026-02-10 14:37:57,342 - julearn - DEBUG -      Looking for nested pipeline creators
2026-02-10 14:37:57,342 - julearn - DEBUG -      Params to tune: {}
2026-02-10 14:37:57,342 - julearn - DEBUG - Wrapping target model linreg as target_generate
2026-02-10 14:37:57,342 - julearn - INFO - = Model Parameters =
2026-02-10 14:37:57,342 - julearn - INFO - ====================
2026-02-10 14:37:57,342 - julearn - INFO -
2026-02-10 14:37:57,343 - julearn - DEBUG - Pipeline created
2026-02-10 14:37:57,343 - julearn - DEBUG - Pipeline has target generator
2026-02-10 14:37:57,343 - julearn - INFO - = Data Information =
2026-02-10 14:37:57,343 - julearn - INFO -      Problem type: regression
2026-02-10 14:37:57,343 - julearn - INFO -      Number of samples: 150
2026-02-10 14:37:57,343 - julearn - INFO -      Number of features: 4
2026-02-10 14:37:57,343 - julearn - INFO - ====================
2026-02-10 14:37:57,343 - julearn - INFO -
2026-02-10 14:37:57,343 - julearn - INFO -      Target type: float64
2026-02-10 14:37:57,344 - julearn - INFO - Using outer CV scheme KFold(n_splits=2, random_state=None, shuffle=False) (incl. final model)
2026-02-10 14:37:57,348 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')
2026-02-10 14:37:57,349 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-02-10 14:37:57,352 - julearn - DEBUG - Fitting the target generator
2026-02-10 14:37:57,352 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
       'petal_length__:type:__petal', 'petal_width__:type:__petal'],
      dtype='object')
2026-02-10 14:37:57,352 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-02-10 14:37:57,360 - julearn - DEBUG - Generating target
2026-02-10 14:37:57,363 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-02-10 14:37:57,363 - julearn - DEBUG - Target generated: pca__pca0
2026-02-10 14:37:57,364 - julearn - DEBUG - Fitting model from generated target
/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:953: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 942, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 492, in __call__
    return estimator.score(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1192, in score
    routed_params = process_routing(
        self, "score", sample_weight=sample_weight, **params
    )
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1625, in process_routing
    request_routing.validate_metadata(params=kwargs, method=_method)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1109, in validate_metadata
    raise TypeError(
    ...<2 lines>...
    )
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

  warnings.warn(
2026-02-10 14:37:57,372 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')
2026-02-10 14:37:57,372 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-02-10 14:37:57,375 - julearn - DEBUG - Fitting the target generator
2026-02-10 14:37:57,376 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
       'petal_length__:type:__petal', 'petal_width__:type:__petal'],
      dtype='object')
2026-02-10 14:37:57,376 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-02-10 14:37:57,383 - julearn - DEBUG - Generating target
2026-02-10 14:37:57,385 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-02-10 14:37:57,386 - julearn - DEBUG - Target generated: pca__pca0
2026-02-10 14:37:57,386 - julearn - DEBUG - Fitting model from generated target
/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:953: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 942, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 492, in __call__
    return estimator.score(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1192, in score
    routed_params = process_routing(
        self, "score", sample_weight=sample_weight, **params
    )
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1625, in process_routing
    request_routing.validate_metadata(params=kwargs, method=_method)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1109, in validate_metadata
    raise TypeError(
    ...<2 lines>...
    )
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

  warnings.warn(
2026-02-10 14:37:57,393 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')
2026-02-10 14:37:57,393 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-02-10 14:37:57,396 - julearn - DEBUG - Fitting the target generator
2026-02-10 14:37:57,396 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
       'petal_length__:type:__petal', 'petal_width__:type:__petal'],
      dtype='object')
2026-02-10 14:37:57,397 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-02-10 14:37:57,404 - julearn - DEBUG - Generating target
2026-02-10 14:37:57,406 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-02-10 14:37:57,407 - julearn - DEBUG - Target generated: pca__pca0
2026-02-10 14:37:57,407 - julearn - DEBUG - Fitting model from generated target
/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:953: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 942, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 492, in __call__
    return estimator.score(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1192, in score
    routed_params = process_routing(
        self, "score", sample_weight=sample_weight, **params
    )
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1625, in process_routing
    request_routing.validate_metadata(params=kwargs, method=_method)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1109, in validate_metadata
    raise TypeError(
    ...<2 lines>...
    )
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

  warnings.warn(
0   NaN
1   NaN
Name: test_score, dtype: float64
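The log shows that the target generator is fitted separately within each fold (“Fitting the target generator”, then “Generating target”), so test-fold targets are computed with parameters learned on the training fold only. The NumPy sketch below imitates that scheme with an unshuffled 2-fold split, matching the KFold(n_splits=2, shuffle=False) line in the log, and scores each fold with R², chosen here for illustration. It uses random stand-in data and hypothetical helper names and is not how julearn computes its scores; in the run above those scores came out as NaN because scoring raised the TypeError shown in the warnings.

```python
import numpy as np


def zscore_fit(train):
    return train.mean(axis=0), train.std(axis=0)


def pc1_fit(petal):
    center = petal.mean(axis=0)
    _, _, vt = np.linalg.svd(petal - center, full_matrices=False)
    return center, vt[0]


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def two_fold_scores(data):
    """Hypothetical sketch: columns 0-1 are sepal, 2-3 are petal features."""
    n = len(data)
    # Unshuffled 2-fold split: first half, then second half as the test set
    folds = np.array_split(np.arange(n), 2)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        train, test = data[train_idx], data[test_idx]

        # Fit every step (scaler, target generator, model) on the training fold
        mean, std = zscore_fit(train)
        train_s, test_s = (train - mean) / std, (test - mean) / std
        center, axis = pc1_fit(train_s[:, 2:])
        y_train = (train_s[:, 2:] - center) @ axis
        design = np.column_stack([np.ones(len(train_s)), train_s[:, :2]])
        coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)

        # Generate the test target with the training-fold PCA, then score
        y_test = (test_s[:, 2:] - center) @ axis
        y_pred = np.column_stack([np.ones(len(test_s)), test_s[:, :2]]) @ coef
        scores.append(r2(y_test, y_pred))
    return scores


rng = np.random.default_rng(42)
sepal = rng.normal(size=(150, 2))
# Random stand-in petal features that depend partly on the sepal features
petal = sepal @ np.array([[1.0, 0.8], [0.5, 0.3]]) + 0.3 * rng.normal(size=(150, 2))
scores = two_fold_scores(np.column_stack([sepal, petal]))
print(scores)
```

Fitting the PCA on the full dataset before splitting would let information from the test samples shape the training targets, which is the leakage this within-fold scheme avoids.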

