Target Generation
This example tests a regression model in which the target variable is
generated from some of the features within the cross-validation procedure.
We will use the Iris dataset and generate a target variable using
PCA on the petal features. Then, we will evaluate whether a regression model
can predict the generated target from the sepal features.
# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL
from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator
from julearn.utils import configure_logging
Set the logging level to debug to see extra information.
configure_logging(level="DEBUG")
2026-05-29 20:46:12,581 - julearn - INFO - ===== Lib Versions =====
2026-05-29 20:46:12,581 - julearn - INFO - numpy: 2.4.6
2026-05-29 20:46:12,581 - julearn - INFO - scipy: 1.17.1
2026-05-29 20:46:12,581 - julearn - INFO - sklearn: 1.8.0
2026-05-29 20:46:12,582 - julearn - INFO - pandas: 3.0.3
2026-05-29 20:46:12,582 - julearn - INFO - julearn: 0.3.5
2026-05-29 20:46:12,582 - julearn - INFO - ========================
df_iris = load_dataset("iris")
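As features, we will use the sepal and petal lengths and widths, grouped into the “sepal” and “petal” feature types. The target is not a column of the dataframe: it will be generated within the pipeline.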
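The features, target name and feature types reported in the cross-validation log further down can be written as plain Python objects. This is a minimal sketch: the values are taken from that log, while the variable names `X`, `y` and `X_types` are an assumption.

```python
# Feature columns, as listed under "Features" in the log output.
X = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Target name reported in the log. It is a placeholder, not a dataframe
# column, because the target is generated inside the pipeline.
y = "__generated__"

# Feature types, as listed under "X_types" in the log output.
X_types = {
    "sepal": ["sepal_length", "sepal_width"],
    "petal": ["petal_length", "petal_width"],
}

# Sanity check: every typed column is one of the features.
typed_columns = [col for cols in X_types.values() for col in cols]
assert sorted(typed_columns) == sorted(X)
```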
We now use a PipelineCreator to create the pipeline that will generate the target. This special pipeline should be configured as a “transformer” and applied to the “petal” feature type.
target_creator = PipelineCreator(problem_type="transformer", apply_to="petal")
target_creator.add("pca", n_components=2)
# Select only the first component
target_creator.add("pick_columns", keep="pca__pca0")
2026-05-29 20:46:12,585 - julearn - INFO - Adding step pca that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-05-29 20:46:12,585 - julearn - INFO - Setting hyperparameter n_components = 2
2026-05-29 20:46:12,586 - julearn - DEBUG - Getting estimator from string: pca
2026-05-29 20:46:12,586 - julearn - INFO - Step added
2026-05-29 20:46:12,586 - julearn - INFO - Adding step pick_columns that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-05-29 20:46:12,586 - julearn - INFO - Setting hyperparameter keep = pca__pca0
2026-05-29 20:46:12,586 - julearn - DEBUG - Getting estimator from string: pick_columns
2026-05-29 20:46:12,587 - julearn - INFO - Step added
<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x11ef45850>
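To make the idea of "PCA on the petal features, keeping only the first component" concrete, here is a plain-Python sketch for two columns. It is not julearn's or scikit-learn's implementation: `fit_first_component` and `project` are hypothetical helpers, the data values are made up, and the z-scoring step is ignored. The sign of a principal component is arbitrary, so the scores may be flipped relative to what a PCA library returns.

```python
import math


def fit_first_component(rows):
    """Return (means, direction) of the first principal component of 2-D rows."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((r[0] - mean_x) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mean_y) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    half_trace = (sxx + syy) / 2
    largest = half_trace + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # A matching eigenvector is (sxy, largest - sxx) when sxy != 0.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (mean_x, mean_y), (vx / norm, vy / norm)


def project(rows, means, direction):
    """Project rows onto the fitted direction (one score per row)."""
    return [
        (r[0] - means[0]) * direction[0] + (r[1] - means[1]) * direction[1]
        for r in rows
    ]


# Toy (petal_length, petal_width) values, made up for illustration only.
train = [(1.4, 0.2), (1.5, 0.3), (4.5, 1.5), (4.7, 1.4), (5.9, 2.1), (6.0, 2.5)]
test = [(1.3, 0.2), (5.1, 1.9)]

means, direction = fit_first_component(train)
train_target = project(train, means, direction)
test_target = project(test, means, direction)  # reuses the training fit
print(train_target)
print(test_target)
```

The point of separating the fit from the projection is that the direction is estimated on the training rows only and then reused for the test rows, which is what generating the target within the cross-validation amounts to.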
We now create the pipeline that will be used to predict the target. This pipeline will be a regression pipeline. The step immediately before the model should be generate_target, applied to the “petal” features and using the target_creator pipeline as the transformer.
creator = PipelineCreator(problem_type="regression")
creator.add("zscore", apply_to="*")
creator.add("generate_target", apply_to="petal", transformer=target_creator)
creator.add("linreg", apply_to="sepal")
2026-05-29 20:46:12,587 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'*'}; pattern=.*>
2026-05-29 20:46:12,587 - julearn - DEBUG - Getting estimator from string: zscore
2026-05-29 20:46:12,587 - julearn - INFO - Step added
2026-05-29 20:46:12,588 - julearn - INFO - Adding step generate_target that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-05-29 20:46:12,588 - julearn - INFO - Setting hyperparameter transformer = PipelineCreator:
Step 0: pca
estimator: PCA(n_components=2)
apply to: ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
needed types: ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
tuning params: {}
Step 1: pick_columns
estimator: PickColumns(keep='pca__pca0')
apply to: ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
needed types: ColumnTypes<types={'*'}; pattern=.*>
tuning params: {}
2026-05-29 20:46:12,589 - julearn - DEBUG - Special step is generate_target
2026-05-29 20:46:12,589 - julearn - INFO - Step added
2026-05-29 20:46:12,589 - julearn - INFO - Adding step linreg that applies to ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>
2026-05-29 20:46:12,589 - julearn - DEBUG - Getting estimator from string: linreg
2026-05-29 20:46:12,590 - julearn - INFO - Step added
<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x11d84f750>
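We finally evaluate the model within the cross-validation, using run_cross_validation. As the log below shows, the outer scheme is a 2-fold KFold without shuffling.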
2026-05-29 20:46:12,590 - julearn - INFO - ==== Input Data ====
2026-05-29 20:46:12,591 - julearn - INFO - Using dataframe as input
2026-05-29 20:46:12,591 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-05-29 20:46:12,591 - julearn - INFO - Target: __generated__
2026-05-29 20:46:12,591 - julearn - INFO - Expanded features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-05-29 20:46:12,591 - julearn - INFO - X_types:{'sepal': ['sepal_length', 'sepal_width'], 'petal': ['petal_length', 'petal_width']}
2026-05-29 20:46:12,593 - julearn - INFO - Target will be generated
2026-05-29 20:46:12,593 - julearn - INFO - ====================
2026-05-29 20:46:12,593 - julearn - INFO -
2026-05-29 20:46:12,593 - julearn - DEBUG - Generating pipeline from PipelineCreator or list of them
2026-05-29 20:46:12,593 - julearn - DEBUG - Creating pipeline
2026-05-29 20:46:12,593 - julearn - DEBUG - Ensuring target generator pipeline
2026-05-29 20:46:12,593 - julearn - DEBUG - Creating pipeline
2026-05-29 20:46:12,594 - julearn - DEBUG - Creating a pipeline with no model added
2026-05-29 20:46:12,594 - julearn - DEBUG - Adding transformer pca
2026-05-29 20:46:12,594 - julearn - DEBUG - Estimator: PCA(n_components=2)
2026-05-29 20:46:12,594 - julearn - DEBUG - Params to tune: {}
2026-05-29 20:46:12,594 - julearn - DEBUG - Adding transformer pick_columns
2026-05-29 20:46:12,595 - julearn - DEBUG - Estimator: PickColumns(keep='pca__pca0')
2026-05-29 20:46:12,595 - julearn - DEBUG - Params to tune: {}
2026-05-29 20:46:12,595 - julearn - INFO - = Model Parameters =
2026-05-29 20:46:12,595 - julearn - INFO - ====================
2026-05-29 20:46:12,595 - julearn - INFO -
2026-05-29 20:46:12,595 - julearn - DEBUG - Pipeline created
2026-05-29 20:46:12,596 - julearn - DEBUG - Target generator pipeline created
2026-05-29 20:46:12,596 - julearn - DEBUG - Adding transformer zscore
2026-05-29 20:46:12,596 - julearn - DEBUG - Estimator: StandardScaler()
2026-05-29 20:46:12,596 - julearn - DEBUG - Params to tune: {}
2026-05-29 20:46:12,596 - julearn - DEBUG - Adding model linreg
2026-05-29 20:46:12,597 - julearn - DEBUG - Wrapping linreg
2026-05-29 20:46:12,597 - julearn - DEBUG - Estimator: WrapModel(apply_to=ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>,
copy_X=True, fit_intercept=True, model=LinearRegression(),
n_jobs=None, positive=False, tol=1e-06)
2026-05-29 20:46:12,598 - julearn - DEBUG - Looking for nested pipeline creators
2026-05-29 20:46:12,598 - julearn - DEBUG - Params to tune: {}
2026-05-29 20:46:12,598 - julearn - DEBUG - Wrapping target model linreg as target_generate
2026-05-29 20:46:12,598 - julearn - INFO - = Model Parameters =
2026-05-29 20:46:12,598 - julearn - INFO - ====================
2026-05-29 20:46:12,598 - julearn - INFO -
2026-05-29 20:46:12,598 - julearn - DEBUG - Pipeline created
2026-05-29 20:46:12,599 - julearn - DEBUG - Pipeline has target generator
2026-05-29 20:46:12,599 - julearn - INFO - = Data Information =
2026-05-29 20:46:12,599 - julearn - INFO - Problem type: regression
2026-05-29 20:46:12,599 - julearn - INFO - Number of samples: 150
2026-05-29 20:46:12,599 - julearn - INFO - Number of features: 4
2026-05-29 20:46:12,599 - julearn - INFO - ====================
2026-05-29 20:46:12,599 - julearn - INFO -
2026-05-29 20:46:12,600 - julearn - INFO - Target type: float64
2026-05-29 20:46:12,600 - julearn - INFO - Using outer CV scheme KFold(n_splits=2, random_state=None, shuffle=False) (incl. final model)
2026-05-29 20:46:12,611 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
2026-05-29 20:46:12,612 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,617 - julearn - DEBUG - Fitting the target generator
2026-05-29 20:46:12,618 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
'petal_length__:type:__petal', 'petal_width__:type:__petal'],
dtype='str')
2026-05-29 20:46:12,618 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,632 - julearn - DEBUG - Generating target
2026-05-29 20:46:12,641 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-05-29 20:46:12,642 - julearn - DEBUG - Target generated: pca__pca0
2026-05-29 20:46:12,643 - julearn - DEBUG - Fitting model from generated target
/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
scores = scorer(estimator, X_test, y_test, **score_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
return estimator.score(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1138, in score
routed_params = process_routing(
^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
request_routing.validate_metadata(params=kwargs, method=_method)
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
raise TypeError(
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.
warnings.warn(
2026-05-29 20:46:12,658 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
2026-05-29 20:46:12,658 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,664 - julearn - DEBUG - Fitting the target generator
2026-05-29 20:46:12,665 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
'petal_length__:type:__petal', 'petal_width__:type:__petal'],
dtype='str')
2026-05-29 20:46:12,665 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,679 - julearn - DEBUG - Generating target
2026-05-29 20:46:12,684 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-05-29 20:46:12,685 - julearn - DEBUG - Target generated: pca__pca0
2026-05-29 20:46:12,686 - julearn - DEBUG - Fitting model from generated target
/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
scores = scorer(estimator, X_test, y_test, **score_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
return estimator.score(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1138, in score
routed_params = process_routing(
^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
request_routing.validate_metadata(params=kwargs, method=_method)
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
raise TypeError(
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.
warnings.warn(
2026-05-29 20:46:12,701 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
2026-05-29 20:46:12,701 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,707 - julearn - DEBUG - Fitting the target generator
2026-05-29 20:46:12,708 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
'petal_length__:type:__petal', 'petal_width__:type:__petal'],
dtype='str')
2026-05-29 20:46:12,708 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-05-29 20:46:12,722 - julearn - DEBUG - Generating target
2026-05-29 20:46:12,728 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-05-29 20:46:12,729 - julearn - DEBUG - Target generated: pca__pca0
2026-05-29 20:46:12,729 - julearn - DEBUG - Fitting model from generated target
/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
scores = scorer(estimator, X_test, y_test, **score_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
return estimator.score(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1138, in score
routed_params = process_routing(
^^^^^^^^^^^^^^^^
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
request_routing.validate_metadata(params=kwargs, method=_method)
File "/private/var/folders/09/t22x2_p106j7p24khr0jdxrw0000gn/T/tmpyvhr0tue/.venv/lib/python3.11/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
raise TypeError(
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.
warnings.warn(
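0 NaN
1 NaN
Name: test_score, dtype: float64
Both test scores are NaN in this run because scoring failed on each partition with the TypeError reported in the warnings above (Pipeline.score received an unexpected sample_weight argument).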
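The log reports the outer scheme as KFold(n_splits=2, shuffle=False) over 150 samples. As a rough illustration of which rows end up in each training and test partition under such a scheme, here is a plain-Python sketch of contiguous, unshuffled folds. `kfold_indices` is a hypothetical helper written for this illustration; it is meant to mimic the splitting rule, not to replace scikit-learn's KFold.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for fold, (train, test) in enumerate(kfold_indices(150, 2)):
    print(
        f"fold {fold}: train {train[0]}-{train[-1]} ({len(train)} rows), "
        f"test {test[0]}-{test[-1]} ({len(test)} rows)"
    )
# fold 0: train 75-149 (75 rows), test 0-74 (75 rows)
# fold 1: train 0-74 (75 rows), test 75-149 (75 rows)
```

Under this scheme each partition fits the target generator and the model on one half of the rows and evaluates on the other half.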
Total running time of the script: (0 minutes 0.173 seconds)