Target Generation
This example tests a regression model in which the target variable is
generated from some of the features within the cross-validation procedure.
We will use the Iris dataset and generate the target variable by applying
PCA to the petal features. Then, we will evaluate whether a regression model
can predict the generated target from the sepal features.
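To make the idea concrete before turning to julearn, here is a rough hand-written approximation in plain scikit-learn. It is a sketch under assumptions, not julearn's implementation: the iris data comes from `sklearn.datasets.load_iris` rather than seaborn, the scaling and PCA are fitted on the training fold only, and the test-fold target is obtained by projecting the test petal features with the PCA fitted on the training fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

# Column order in load_iris: sepal length, sepal width, petal length, petal width.
features = load_iris().data
sepal, petal = features[:, :2], features[:, 2:]

fold_scores = []
for train_idx, test_idx in KFold(n_splits=2).split(features):
    # Standardise with statistics taken from the training fold only.
    sepal_scaler = StandardScaler().fit(sepal[train_idx])
    petal_scaler = StandardScaler().fit(petal[train_idx])

    # Fit PCA on the training petal features; the first component is the target.
    pca = PCA(n_components=2).fit(petal_scaler.transform(petal[train_idx]))
    y_train = pca.transform(petal_scaler.transform(petal[train_idx]))[:, 0]
    y_test = pca.transform(petal_scaler.transform(petal[test_idx]))[:, 0]

    # Predict the generated target from the sepal features.
    model = LinearRegression().fit(sepal_scaler.transform(sepal[train_idx]), y_train)
    fold_scores.append(model.score(sepal_scaler.transform(sepal[test_idx]), y_test))

print(fold_scores)  # one R^2 value per fold
```

Because the target is rebuilt inside every fold, nothing learned from the test samples leaks into the target definition. The julearn pipeline below expresses the same steps declaratively.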
# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL
from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator
from julearn.utils import configure_logging
Set the logging level to debug to see extra information.
configure_logging(level="DEBUG")
2026-01-16 10:54:05,223 - julearn - INFO - ===== Lib Versions =====
2026-01-16 10:54:05,223 - julearn - INFO - numpy: 1.26.4
2026-01-16 10:54:05,224 - julearn - INFO - scipy: 1.17.0
2026-01-16 10:54:05,224 - julearn - INFO - sklearn: 1.7.2
2026-01-16 10:54:05,224 - julearn - INFO - pandas: 2.3.3
2026-01-16 10:54:05,224 - julearn - INFO - julearn: 0.3.5.dev123
2026-01-16 10:54:05,224 - julearn - INFO - ========================
df_iris = load_dataset("iris")
As features, we will use the sepal length and width. The petal length and width will be used to generate the target that we try to predict.
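The column selections are not shown as code in this section. A minimal sketch, reconstructed from the `Features`, `Target` and `X_types` entries in the log output further down, could look like this; the variable names `X`, `y` and `X_types` are assumptions.

```python
# All four measurements are passed in; the column types decide which
# pipeline step sees which columns.
X = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Target name as it appears in the log ("Target: __generated__").
y = "__generated__"

# Column types as they appear in the log ("X_types: {...}").
X_types = {
    "sepal": ["sepal_length", "sepal_width"],
    "petal": ["petal_length", "petal_width"],
}
```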
We now use a PipelineCreator to create the pipeline that will generate the target. This special pipeline should be configured to be a “transformer” and to apply to the “petal” feature type.
target_creator = PipelineCreator(problem_type="transformer", apply_to="petal")
target_creator.add("pca", n_components=2)
# Select only the first component
target_creator.add("pick_columns", keep="pca__pca0")
2026-01-16 10:54:05,226 - julearn - INFO - Adding step pca that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-01-16 10:54:05,226 - julearn - INFO - Setting hyperparameter n_components = 2
2026-01-16 10:54:05,226 - julearn - DEBUG - Getting estimator from string: pca
2026-01-16 10:54:05,226 - julearn - INFO - Step added
2026-01-16 10:54:05,226 - julearn - INFO - Adding step pick_columns that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-01-16 10:54:05,227 - julearn - INFO - Setting hyperparameter keep = pca__pca0
2026-01-16 10:54:05,227 - julearn - DEBUG - Getting estimator from string: pick_columns
2026-01-16 10:54:05,227 - julearn - INFO - Step added
<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7f2d83be61d0>
We now create the pipeline that will be used to predict the target. This is a regression pipeline. The step immediately before the model should be generate_target, applied to the “petal” features and using the target_creator pipeline as the transformer.
creator = PipelineCreator(problem_type="regression")
creator.add("zscore", apply_to="*")
creator.add("generate_target", apply_to="petal", transformer=target_creator)
creator.add("linreg", apply_to="sepal")
2026-01-16 10:54:05,227 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'*'}; pattern=.*>
2026-01-16 10:54:05,227 - julearn - DEBUG - Getting estimator from string: zscore
2026-01-16 10:54:05,228 - julearn - INFO - Step added
2026-01-16 10:54:05,228 - julearn - INFO - Adding step generate_target that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2026-01-16 10:54:05,228 - julearn - INFO - Setting hyperparameter transformer = PipelineCreator:
Step 0: pca
estimator: PCA(n_components=2)
apply to: ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
needed types: ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
tuning params: {}
Step 1: pick_columns
estimator: PickColumns(keep='pca__pca0')
apply to: ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
needed types: ColumnTypes<types={'*'}; pattern=.*>
tuning params: {}
2026-01-16 10:54:05,228 - julearn - DEBUG - Special step is generate_target
2026-01-16 10:54:05,228 - julearn - INFO - Step added
2026-01-16 10:54:05,229 - julearn - INFO - Adding step linreg that applies to ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>
2026-01-16 10:54:05,229 - julearn - DEBUG - Getting estimator from string: linreg
2026-01-16 10:54:05,229 - julearn - INFO - Step added
<julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7f2d82822250>
Finally, we evaluate the model using cross-validation.
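The call itself does not appear in this section. The following is only a sketch pieced together from the log: the feature list, the `__generated__` target, the column types, a 2-fold `KFold`, and the "(incl. final model)" note, which suggests `return_estimator="final"`. The helper name `evaluate` and its parameters are hypothetical.

```python
def evaluate(data, model, X, y, X_types):
    """Cross-validate `model` on `data` with the settings seen in the log."""
    # Imported inside the function so the helper can be defined
    # even where julearn is not installed.
    from julearn import run_cross_validation

    scores, final_model = run_cross_validation(
        X=X,
        y=y,
        X_types=X_types,
        data=data,
        model=model,
        cv=2,  # logged as KFold(n_splits=2, shuffle=False)
        return_estimator="final",  # assumption based on "(incl. final model)"
    )
    return scores, final_model
```

With the objects from this example, one would call `evaluate(df_iris, creator, ...)`, passing the feature list, target name and column-type mapping listed in the log, and then print `scores["test_score"]` to get one score per fold.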
2026-01-16 10:54:05,229 - julearn - INFO - ==== Input Data ====
2026-01-16 10:54:05,229 - julearn - INFO - Using dataframe as input
2026-01-16 10:54:05,230 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-01-16 10:54:05,230 - julearn - INFO - Target: __generated__
2026-01-16 10:54:05,230 - julearn - INFO - Expanded features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2026-01-16 10:54:05,230 - julearn - INFO - X_types:{'sepal': ['sepal_length', 'sepal_width'], 'petal': ['petal_length', 'petal_width']}
2026-01-16 10:54:05,231 - julearn - INFO - Target will be generated
2026-01-16 10:54:05,231 - julearn - INFO - ====================
2026-01-16 10:54:05,231 - julearn - INFO -
2026-01-16 10:54:05,231 - julearn - DEBUG - Generating pipeline from PipelineCreator or list of them
2026-01-16 10:54:05,231 - julearn - DEBUG - Creating pipeline
2026-01-16 10:54:05,231 - julearn - DEBUG - Ensuring target generator pipeline
2026-01-16 10:54:05,231 - julearn - DEBUG - Creating pipeline
2026-01-16 10:54:05,231 - julearn - DEBUG - Creating a pipeline with no model added
2026-01-16 10:54:05,232 - julearn - DEBUG - Adding transformer pca
2026-01-16 10:54:05,232 - julearn - DEBUG - Estimator: PCA(n_components=2)
2026-01-16 10:54:05,232 - julearn - DEBUG - Params to tune: {}
2026-01-16 10:54:05,232 - julearn - DEBUG - Adding transformer pick_columns
2026-01-16 10:54:05,232 - julearn - DEBUG - Estimator: PickColumns(keep='pca__pca0')
2026-01-16 10:54:05,232 - julearn - DEBUG - Params to tune: {}
2026-01-16 10:54:05,233 - julearn - INFO - = Model Parameters =
2026-01-16 10:54:05,233 - julearn - INFO - ====================
2026-01-16 10:54:05,233 - julearn - INFO -
2026-01-16 10:54:05,233 - julearn - DEBUG - Pipeline created
2026-01-16 10:54:05,233 - julearn - DEBUG - Target generator pipeline created
2026-01-16 10:54:05,233 - julearn - DEBUG - Adding transformer zscore
2026-01-16 10:54:05,233 - julearn - DEBUG - Estimator: StandardScaler()
2026-01-16 10:54:05,233 - julearn - DEBUG - Params to tune: {}
2026-01-16 10:54:05,233 - julearn - DEBUG - Adding model linreg
2026-01-16 10:54:05,234 - julearn - DEBUG - Wrapping linreg
2026-01-16 10:54:05,234 - julearn - DEBUG - Estimator: WrapModel(apply_to=ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>,
copy_X=True, fit_intercept=True, model=LinearRegression(),
n_jobs=None, positive=False, tol=1e-06)
2026-01-16 10:54:05,234 - julearn - DEBUG - Looking for nested pipeline creators
2026-01-16 10:54:05,235 - julearn - DEBUG - Params to tune: {}
2026-01-16 10:54:05,235 - julearn - DEBUG - Wrapping target model linreg as target_generate
2026-01-16 10:54:05,235 - julearn - INFO - = Model Parameters =
2026-01-16 10:54:05,235 - julearn - INFO - ====================
2026-01-16 10:54:05,235 - julearn - INFO -
2026-01-16 10:54:05,235 - julearn - DEBUG - Pipeline created
2026-01-16 10:54:05,235 - julearn - DEBUG - Pipeline has target generator
2026-01-16 10:54:05,235 - julearn - INFO - = Data Information =
2026-01-16 10:54:05,235 - julearn - INFO - Problem type: regression
2026-01-16 10:54:05,235 - julearn - INFO - Number of samples: 150
2026-01-16 10:54:05,235 - julearn - INFO - Number of features: 4
2026-01-16 10:54:05,236 - julearn - INFO - ====================
2026-01-16 10:54:05,236 - julearn - INFO -
2026-01-16 10:54:05,236 - julearn - INFO - Target type: float64
2026-01-16 10:54:05,236 - julearn - INFO - Using outer CV scheme KFold(n_splits=2, random_state=None, shuffle=False) (incl. final model)
2026-01-16 10:54:05,241 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')
2026-01-16 10:54:05,241 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-01-16 10:54:05,244 - julearn - DEBUG - Fitting the target generator
2026-01-16 10:54:05,245 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
'petal_length__:type:__petal', 'petal_width__:type:__petal'],
dtype='object')
2026-01-16 10:54:05,245 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-01-16 10:54:05,253 - julearn - DEBUG - Generating target
2026-01-16 10:54:05,256 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-01-16 10:54:05,256 - julearn - DEBUG - Target generated: pca__pca0
2026-01-16 10:54:05,257 - julearn - DEBUG - Fitting model from generated target
/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:953: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 942, in _score
scores = scorer(estimator, X_test, y_test, **score_params)
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 492, in __call__
return estimator.score(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1192, in score
routed_params = process_routing(
self, "score", sample_weight=sample_weight, **params
)
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1625, in process_routing
request_routing.validate_metadata(params=kwargs, method=_method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1109, in validate_metadata
raise TypeError(
...<2 lines>...
)
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.
warnings.warn(
2026-01-16 10:54:05,265 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')
2026-01-16 10:54:05,265 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-01-16 10:54:05,268 - julearn - DEBUG - Fitting the target generator
2026-01-16 10:54:05,269 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
'petal_length__:type:__petal', 'petal_width__:type:__petal'],
dtype='object')
2026-01-16 10:54:05,269 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-01-16 10:54:05,276 - julearn - DEBUG - Generating target
2026-01-16 10:54:05,279 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-01-16 10:54:05,279 - julearn - DEBUG - Target generated: pca__pca0
2026-01-16 10:54:05,280 - julearn - DEBUG - Fitting model from generated target
/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:953: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 942, in _score
scores = scorer(estimator, X_test, y_test, **score_params)
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 492, in __call__
return estimator.score(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1192, in score
routed_params = process_routing(
self, "score", sample_weight=sample_weight, **params
)
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1625, in process_routing
request_routing.validate_metadata(params=kwargs, method=_method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1109, in validate_metadata
raise TypeError(
...<2 lines>...
)
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.
warnings.warn(
2026-01-16 10:54:05,286 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')
2026-01-16 10:54:05,286 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-01-16 10:54:05,289 - julearn - DEBUG - Fitting the target generator
2026-01-16 10:54:05,290 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
'petal_length__:type:__petal', 'petal_width__:type:__petal'],
dtype='object')
2026-01-16 10:54:05,290 - julearn - DEBUG - Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
2026-01-16 10:54:05,297 - julearn - DEBUG - Generating target
2026-01-16 10:54:05,300 - julearn - DEBUG - Picking columns: ['pca__pca0']
2026-01-16 10:54:05,300 - julearn - DEBUG - Target generated: pca__pca0
2026-01-16 10:54:05,301 - julearn - DEBUG - Fitting model from generated target
/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:953: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 942, in _score
scores = scorer(estimator, X_test, y_test, **score_params)
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 492, in __call__
return estimator.score(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1192, in score
routed_params = process_routing(
self, "score", sample_weight=sample_weight, **params
)
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1625, in process_routing
request_routing.validate_metadata(params=kwargs, method=_method)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1109, in validate_metadata
raise TypeError(
...<2 lines>...
)
TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.
warnings.warn(
0 NaN
1 NaN
Name: test_score, dtype: float64
Total running time of the script: (0 minutes 0.087 seconds)