Target Generation

This example tests a regression model whose target variable is generated from some of the features inside the cross-validation procedure. We will use the Iris dataset and generate the target by applying PCA to the petal features. Then, we will evaluate whether a regression model can predict the generated target from the sepal features.

# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL

from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator
from julearn.utils import configure_logging

Set the logging level to debug to see extra information.

configure_logging(level="DEBUG")

2026-05-29 20:46:12,581 - julearn - INFO - ===== Lib Versions =====
2026-05-29 20:46:12,581 - julearn - INFO - numpy: 2.4.6
2026-05-29 20:46:12,581 - julearn - INFO - scipy: 1.17.1
2026-05-29 20:46:12,581 - julearn - INFO - sklearn: 1.8.0
2026-05-29 20:46:12,582 - julearn - INFO - pandas: 3.0.3
2026-05-29 20:46:12,582 - julearn - INFO - julearn: 0.3.5
2026-05-29 20:46:12,582 - julearn - INFO - ========================
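
The output of this example mixes INFO and DEBUG lines. The following is a minimal sketch of why, assuming julearn's logger follows the level ordering of Python's standard logging module; the logger name "demo" is hypothetical and not part of julearn.

```python
import logging

# Hypothetical logger, used only to illustrate level ordering.
logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)

# DEBUG is a lower threshold than INFO, so a logger set to DEBUG
# emits both DEBUG and INFO messages.
print(logging.DEBUG < logging.INFO)
print(logger.isEnabledFor(logging.INFO))
```

With the level set to "INFO" instead, the DEBUG lines would be filtered out and only the INFO lines would remain.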

df_iris = load_dataset("iris")

As features, we will use the sepal length and width and the petal length and width. The target is not a column of the dataset: it will be generated from the petal features.

X = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = "__generated__"  # to indicate to julearn that the target will be generated


# Define our feature types
X_types = {
    "sepal": ["sepal_length", "sepal_width"],
    "petal": ["petal_length", "petal_width"],
}
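
The debug log of this example shows each column being renamed with a type suffix (for instance, sepal_length becomes sepal_length__:type:__sepal) and steps being matched to columns with a pattern such as (?:__:type:__petal). The sketch below imitates that naming in plain Python. The helper tag_columns is hypothetical, and using a regular-expression search to select columns is an assumption, not julearn's actual implementation.

```python
import re

X_types = {
    "sepal": ["sepal_length", "sepal_width"],
    "petal": ["petal_length", "petal_width"],
}


def tag_columns(x_types):
    """Hypothetical helper: append the type tag seen in the log to each column."""
    return {
        column: f"{column}__:type:__{type_name}"
        for type_name, columns in x_types.items()
        for column in columns
    }


mapping = tag_columns(X_types)

# Pattern copied from the log output; matching by search is an assumption.
petal_pattern = re.compile(r"(?:__:type:__petal)")
petal_columns = [name for name in mapping.values() if petal_pattern.search(name)]
print(petal_columns)
```

Encoding the type in the column name lets a step that applies to "petal" find its columns without carrying a separate lookup table through the pipeline.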

We now use a PipelineCreator to create the pipeline that will generate the target. This special pipeline should be configured to be a “transformer” and to apply to the “petal” feature types.

target_creator = PipelineCreator(problem_type="transformer", apply_to="petal")
target_creator.add("pca", n_components=2)
# Select only the first component
target_creator.add("pick_columns", keep="pca__pca0")

2026-05-29 20:46:12,585 - julearn - INFO - Adding step pca that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-05-29 20:46:12,585 - julearn - INFO - Setting hyperparameter n_components = 2
2026-05-29 20:46:12,586 - julearn - DEBUG - Getting estimator from string: pca
2026-05-29 20:46:12,586 - julearn - INFO - Step added
2026-05-29 20:46:12,586 - julearn - INFO - Adding step pick_columns that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-05-29 20:46:12,586 - julearn - INFO - Setting hyperparameter keep = pca__pca0
2026-05-29 20:46:12,586 - julearn - DEBUG - Getting estimator from string: pick_columns
2026-05-29 20:46:12,587 - julearn - INFO - Step added
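
To make the role of the target creator concrete, here is a rough scikit-learn sketch of the same two steps: a PCA with two components on the petal columns, followed by keeping only the first component. It is an illustration under assumptions, not what julearn runs internally: it loads Iris from sklearn.datasets instead of seaborn, and it works on the raw petal values, whereas the julearn pipeline in this example z-scores the features first.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Columns of load_iris().data: sepal length, sepal width,
# petal length, petal width (all in cm).
data = load_iris().data
petal = data[:, 2:4]

# Step "pca": project the two petal features onto two components.
pca = PCA(n_components=2)
components = pca.fit_transform(petal)

# Step "pick_columns": keep only the first component as the target.
generated_target = components[:, 0]

print(generated_target.shape)
print(pca.explained_variance_ratio_)
```

The explained variance ratio indicates how much of the petal variability the first component, and therefore the generated target, captures.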

<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x11ef45850>

We now create the pipeline that will be used to predict the target. This pipeline will be a regression pipeline. The step just before the model should be generate_target, applying to the “petal” features and using the target_creator pipeline as the transformer.

creator = PipelineCreator(problem_type="regression")
creator.add("zscore", apply_to="*")
creator.add("generate_target", apply_to="petal", transformer=target_creator)
creator.add("linreg", apply_to="sepal")

2026-05-29 20:46:12,587 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'*'}; pattern=.*>
2026-05-29 20:46:12,587 - julearn - DEBUG - Getting estimator from string: zscore
2026-05-29 20:46:12,587 - julearn - INFO - Step added
2026-05-29 20:46:12,588 - julearn - INFO - Adding step generate_target that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-05-29 20:46:12,588 - julearn - INFO - Setting hyperparameter transformer = PipelineCreator:
  Step 0: pca
    estimator:     PCA(n_components=2)
    apply to:      ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    needed types:  ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    tuning params: {}
  Step 1: pick_columns
    estimator:     PickColumns(keep='pca__pca0')
    apply to:      ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    needed types:  ColumnTypes<types={'*'}; pattern=.*>
    tuning params: {}

2026-05-29 20:46:12,589 - julearn - DEBUG - Special step is generate_target
2026-05-29 20:46:12,589 - julearn - INFO - Step added
2026-05-29 20:46:12,589 - julearn - INFO - Adding step linreg that applies to ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>
2026-05-29 20:46:12,589 - julearn - DEBUG - Getting estimator from string: linreg
2026-05-29 20:46:12,590 - julearn - INFO - Step added
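
The sketch below spells out, with plain scikit-learn, what generating the target within the cross-validation could look like: in every fold the scaler and the PCA are fitted on the training samples only, the target is derived from the petal features, and a linear regression is fitted on the sepal features. How julearn produces the test-fold target is an assumption here (the PCA fitted on the training fold is reused), and the data comes from sklearn.datasets rather than seaborn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

data = load_iris().data
sepal_idx, petal_idx = [0, 1], [2, 3]

fold_scores = []
# Two unshuffled folds, matching the KFold reported in the log.
for train, test in KFold(n_splits=2).split(data):
    # z-score all features using training statistics only.
    scaler = StandardScaler().fit(data[train])
    X_train = scaler.transform(data[train])
    X_test = scaler.transform(data[test])

    # Generate the target: first principal component of the petal features.
    pca = PCA(n_components=2).fit(X_train[:, petal_idx])
    y_train = pca.transform(X_train[:, petal_idx])[:, 0]
    y_test = pca.transform(X_test[:, petal_idx])[:, 0]

    # Predict the generated target from the sepal features.
    model = LinearRegression().fit(X_train[:, sepal_idx], y_train)
    predictions = model.predict(X_test[:, sepal_idx])
    fold_scores.append(r2_score(y_test, predictions))

print(fold_scores)
```

Fitting the PCA inside the loop is the point of the exercise: the test samples never influence how the target is defined, which avoids leaking information across folds.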

<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x11d84f750>

Finally, we evaluate the model using cross-validation.

scores, model = run_cross_validation(
    X=X,
    y=y,
    X_types=X_types,
    data=df_iris,
    model=creator,
    return_estimator="final",
    cv=2,
)

print(scores["test_score"])  # type: ignore

2026-05-29 20:46:12,590 - julearn - INFO - ==== Input Data ====
2026-05-29 20:46:12,591 - julearn - INFO - Using dataframe as input
2026-05-29 20:46:12,591 - julearn - INFO -      Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-05-29 20:46:12,591 - julearn - INFO -      Target: __generated__
2026-05-29 20:46:12,591 - julearn - INFO -      Expanded features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-05-29 20:46:12,591 - julearn - INFO -      X_types:{'sepal': ['sepal_length', 'sepal_width'], 'petal': ['petal_length', 'petal_width']}
2026-05-29 20:46:12,593 - julearn - INFO - Target will be generated
2026-05-29 20:46:12,593 - julearn - INFO - ====================
2026-05-29 20:46:12,593 - julearn - INFO -
2026-05-29 20:46:12,593 - julearn - DEBUG - Generating pipeline from PipelineCreator or list of them
2026-05-29 20:46:12,593 - julearn - DEBUG - Creating pipeline
2026-05-29 20:46:12,593 - julearn - DEBUG - Ensuring target generator pipeline
2026-05-29 20:46:12,593 - julearn - DEBUG - Creating pipeline
2026-05-29 20:46:12,594 - julearn - DEBUG - Creating a pipeline with no model added
2026-05-29 20:46:12,594 - julearn - DEBUG - Adding transformer pca
2026-05-29 20:46:12,594 - julearn - DEBUG -      Estimator: PCA(n_components=2)
2026-05-29 20:46:12,594 - julearn - DEBUG -      Params to tune: {}
2026-05-29 20:46:12,594 - julearn - DEBUG - Adding transformer pick_columns
2026-05-29 20:46:12,595 - julearn - DEBUG -      Estimator: PickColumns(keep='pca__pca0')
2026-05-29 20:46:12,595 - julearn - DEBUG -      Params to tune: {}
2026-05-29 20:46:12,595 - julearn - INFO - = Model Parameters =
2026-05-29 20:46:12,595 - julearn - INFO - ====================
2026-05-29 20:46:12,595 - julearn - INFO -
2026-05-29 20:46:12,595 - julearn - DEBUG - Pipeline created
2026-05-29 20:46:12,596 - julearn - DEBUG - Target generator pipeline created
2026-05-29 20:46:12,596 - julearn - DEBUG - Adding transformer zscore
2026-05-29 20:46:12,596 - julearn - DEBUG -      Estimator: StandardScaler()
2026-05-29 20:46:12,596 - julearn - DEBUG -      Params to tune: {}
2026-05-29 20:46:12,596 - julearn - DEBUG - Adding model linreg
2026-05-29 20:46:12,597 - julearn - DEBUG - Wrapping linreg
2026-05-29 20:46:12,597 - julearn - DEBUG -      Estimator: WrapModel(apply_to=ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>,
          copy_X=True, fit_intercept=True, model=LinearRegression(),
          n_jobs=None, positive=False, tol=1e-06)
2026-05-29 20:46:12,598 - julearn - DEBUG -      Looking for nested pipeline creators
2026-05-29 20:46:12,598 - julearn - DEBUG -      Params to tune: {}
2026-05-29 20:46:12,598 - julearn - DEBUG - Wrapping target model linreg as target_generate
2026-05-29 20:46:12,598 - julearn - INFO - = Model Parameters =
2026-05-29 20:46:12,598 - julearn - INFO - ====================
2026-05-29 20:46:12,598 - julearn - INFO -
2026-05-29 20:46:12,598 - julearn - DEBUG - Pipeline created
2026-05-29 20:46:12,599 - julearn - DEBUG - Pipeline has target generator
2026-05-29 20:46:12,599 - julearn - INFO - = Data Information =
2026-05-29 20:46:12,599 - julearn - INFO -      Problem type: regression
2026-05-29 20:46:12,599 - julearn - INFO -      Number of samples: 150
2026-05-29 20:46:12,599 - julearn - INFO -      Number of features: 4
2026-05-29 20:46:12,599 - julearn - INFO - ====================
2026-05-29 20:46:12,599 - julearn - INFO -
2026-05-29 20:46:12,600 - julearn - INFO -      Target type: float64
2026-05-29 20:46:12,600 - julearn - INFO - Using outer CV scheme KFold(n_splits=2, random_state=None, shuffle=False) (incl. final model)
2026-05-29 20:46:12,611 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
2026-05-29 20:46:12,612 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,617 - julearn - DEBUG - Fitting the target generator
2026-05-29 20:46:12,618 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
       'petal_length__:type:__petal', 'petal_width__:type:__petal'],
      dtype='str')
2026-05-29 20:46:12,618 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,632 - julearn - DEBUG - Generating target
2026-05-29 20:46:12,641 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-05-29 20:46:12,642 - julearn - DEBUG - Target generated: pca__pca0
2026-05-29 20:46:12,643 - julearn - DEBUG - Fitting model from generated target
/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
    return estimator.score(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1138, in score
    routed_params = process_routing(
                    ^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
    request_routing.validate_metadata(params=kwargs, method=_method)
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
    raise TypeError(
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

  warnings.warn(
2026-05-29 20:46:12,658 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
2026-05-29 20:46:12,658 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,664 - julearn - DEBUG - Fitting the target generator
2026-05-29 20:46:12,665 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
       'petal_length__:type:__petal', 'petal_width__:type:__petal'],
      dtype='str')
2026-05-29 20:46:12,665 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,679 - julearn - DEBUG - Generating target
2026-05-29 20:46:12,684 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-05-29 20:46:12,685 - julearn - DEBUG - Target generated: pca__pca0
2026-05-29 20:46:12,686 - julearn - DEBUG - Fitting model from generated target
/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
    return estimator.score(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1138, in score
    routed_params = process_routing(
                    ^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
    request_routing.validate_metadata(params=kwargs, method=_method)
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
    raise TypeError(
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

  warnings.warn(
2026-05-29 20:46:12,701 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
2026-05-29 20:46:12,701 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,707 - julearn - DEBUG - Fitting the target generator
2026-05-29 20:46:12,708 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
       'petal_length__:type:__petal', 'petal_width__:type:__petal'],
      dtype='str')
2026-05-29 20:46:12,708 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,722 - julearn - DEBUG - Generating target
2026-05-29 20:46:12,728 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-05-29 20:46:12,729 - julearn - DEBUG - Target generated: pca__pca0
2026-05-29 20:46:12,729 - julearn - DEBUG - Fitting model from generated target
/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
    scores = scorer(estimator, X_test, y_test, **score_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
    return estimator.score(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1138, in score
    routed_params = process_routing(
                    ^^^^^^^^^^^^^^^^
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
    request_routing.validate_metadata(params=kwargs, method=_method)
  File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
    raise TypeError(
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

  warnings.warn(
0   NaN
1   NaN
Name: test_score, dtype: float64

Total running time of the script: (0 minutes 0.173 seconds)
