.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/03_complex_models/run_generate_target.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_03_complex_models_run_generate_target.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_03_complex_models_run_generate_target.py:


Target Generation
=================

This example uses the ``iris`` dataset to test a regression model whose target
variable is generated from some of the features inside the cross-validation
procedure. We generate the target by applying PCA to the petal features and
then evaluate whether a regression model can predict this generated target from
the sepal features.

.. include:: ../../links.inc

.. GENERATED FROM PYTHON SOURCE LINES 13-21

.. code-block:: Python

    # Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
    # License: AGPL

    from seaborn import load_dataset
    from julearn import run_cross_validation
    from julearn.pipeline import PipelineCreator
    from julearn.utils import configure_logging


.. GENERATED FROM PYTHON SOURCE LINES 22-23

Set the logging level to ``"DEBUG"`` to see detailed information about each step.

.. GENERATED FROM PYTHON SOURCE LINES 23-25

.. code-block:: Python

    configure_logging(level="DEBUG")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-03-31 12:29:04,011 - julearn - INFO - ===== Lib Versions =====
    2026-03-31 12:29:04,012 - julearn - INFO - numpy: 2.4.4
    2026-03-31 12:29:04,012 - julearn - INFO - scipy: 1.17.1
    2026-03-31 12:29:04,012 - julearn - INFO - sklearn: 1.8.0
    2026-03-31 12:29:04,012 - julearn - INFO - pandas: 3.0.2
    2026-03-31 12:29:04,012 - julearn - INFO - julearn: 0.3.6.dev15
    2026-03-31 12:29:04,012 - julearn - INFO - ========================


.. GENERATED FROM PYTHON SOURCE LINES 26-29

.. code-block:: Python

    df_iris = load_dataset("iris")


.. GENERATED FROM PYTHON SOURCE LINES 30-32

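As features, we use all four measurements: sepal length and width, and petal
length and width. There is no existing target column; instead, ``y`` is set to
``"__generated__"`` to tell julearn that the target will be generated from the
features. ``X_types`` groups the columns into the "sepal" and "petal" feature
types.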

.. GENERATED FROM PYTHON SOURCE LINES 32-43

.. code-block:: Python


    X = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    y = "__generated__"  # to indicate to julearn that the target will be generated


    # Define our feature types
    X_types = {
        "sepal": ["sepal_length", "sepal_width"],
        "petal": ["petal_length", "petal_width"],
    }
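
The DEBUG output further down shows how ``X_types`` is used: julearn tags every
column name with its type (for example ``sepal_length__:type:__sepal``), and
``apply_to="petal"`` is resolved to the regular expression
``(?:__:type:__petal)``. The pure-Python sketch below reproduces only that
naming and matching as it appears in the log; ``tag_columns`` is a hypothetical
helper for illustration, not part of julearn's API.

.. code-block:: Python

    import re


    def tag_columns(X_types):
        """Hypothetical helper: map each column to '<column>__:type:__<type>'."""
        mapping = {}
        for col_type, columns in X_types.items():
            for col in columns:
                mapping[col] = f"{col}__:type:__{col_type}"
        return mapping


    X_types = {
        "sepal": ["sepal_length", "sepal_width"],
        "petal": ["petal_length", "petal_width"],
    }
    mapping = tag_columns(X_types)
    print(mapping["petal_width"])  # petal_width__:type:__petal

    # Select the "petal" columns with the pattern reported in the log
    petal_cols = [
        c for c in mapping.values() if re.search(r"(?:__:type:__petal)", c)
    ]
    print(petal_cols)
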


.. GENERATED FROM PYTHON SOURCE LINES 44-47

We now use a ``PipelineCreator`` to create the pipeline that will generate the
target. This special pipeline must be configured as a ``"transformer"`` and
applied to the "petal" feature type. It computes a 2-component PCA and keeps
only the first component (``pca__pca0``) as the target.

.. GENERATED FROM PYTHON SOURCE LINES 47-54

.. code-block:: Python


    target_creator = PipelineCreator(problem_type="transformer", apply_to="petal")
    target_creator.add("pca", n_components=2)
    # Select only the first component
    target_creator.add("pick_columns", keep="pca__pca0")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-03-31 12:29:04,014 - julearn - INFO - Adding step pca that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    2026-03-31 12:29:04,014 - julearn - INFO - Setting hyperparameter n_components = 2
    2026-03-31 12:29:04,014 - julearn - DEBUG - Getting estimator from string: pca
    2026-03-31 12:29:04,014 - julearn - INFO - Step added
    2026-03-31 12:29:04,015 - julearn - INFO - Adding step pick_columns that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    2026-03-31 12:29:04,015 - julearn - INFO - Setting hyperparameter keep = pca__pca0
    2026-03-31 12:29:04,015 - julearn - DEBUG - Getting estimator from string: pick_columns
    2026-03-31 12:29:04,015 - julearn - INFO - Step added

    <julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7ff939b27850>
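
For intuition, the two steps of ``target_creator`` roughly correspond to the
plain scikit-learn code below: a 2-component PCA on the petal columns, keeping
only the first component. This is a sketch under assumptions: it uses
scikit-learn's bundled copy of iris as a stand-in for the seaborn download, and
it applies PCA to the raw petal columns, whereas in the full pipeline the
features are z-scored first.

.. code-block:: Python

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    # Columns 2 and 3 of scikit-learn's iris data are petal length and width
    petal = load_iris().data[:, 2:4]

    # Same setting as the "pca" step: two principal components ...
    pca = PCA(n_components=2)
    components = pca.fit_transform(petal)

    # ... followed by keeping only the first one, like keep="pca__pca0"
    target = components[:, 0]

    print(target.shape)  # (150,)
    print(pca.explained_variance_ratio_)

Since PCA orders components by explained variance, the first component is the
one that summarises most of the variation in the two petal measurements.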


.. GENERATED FROM PYTHON SOURCE LINES 55-59

We now create the pipeline that will be used to predict the target. This is a
regression pipeline: all features are first z-scored, then the
``generate_target`` step, applied to the "petal" features with
``target_creator`` as its transformer, produces the target. This step must come
right before the model. Finally, the ``linreg`` model is applied only to the
"sepal" features.

.. GENERATED FROM PYTHON SOURCE LINES 59-64

.. code-block:: Python

    creator = PipelineCreator(problem_type="regression")
    creator.add("zscore", apply_to="*")
    creator.add("generate_target", apply_to="petal", transformer=target_creator)
    creator.add("linreg", apply_to="sepal")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-03-31 12:29:04,016 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'*'}; pattern=.*>
    2026-03-31 12:29:04,016 - julearn - DEBUG - Getting estimator from string: zscore
    2026-03-31 12:29:04,016 - julearn - INFO - Step added
    2026-03-31 12:29:04,016 - julearn - INFO - Adding step generate_target that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
    2026-03-31 12:29:04,016 - julearn - INFO - Setting hyperparameter transformer = PipelineCreator:
      Step 0: pca
        estimator:     PCA(n_components=2)
        apply to:      ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
        needed types:  ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
        tuning params: {}
      Step 1: pick_columns
        estimator:     PickColumns(keep='pca__pca0')
        apply to:      ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
        needed types:  ColumnTypes<types={'*'}; pattern=.*>
        tuning params: {}

    2026-03-31 12:29:04,017 - julearn - DEBUG - Special step is generate_target
    2026-03-31 12:29:04,017 - julearn - INFO - Step added
    2026-03-31 12:29:04,017 - julearn - INFO - Adding step linreg that applies to ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>
    2026-03-31 12:29:04,017 - julearn - DEBUG - Getting estimator from string: linreg
    2026-03-31 12:29:04,018 - julearn - INFO - Step added

    <julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7ff939b259d0>
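
To make the pipeline's behaviour concrete, here is a plain scikit-learn sketch
of what we assume happens within a single train/test split: every step is
fitted on the training rows only, the target is derived from the z-scored petal
columns, and the regression sees only the sepal columns. It is an illustration,
not julearn's implementation; the 50/50 random split, the use of R² as the
score and scikit-learn's bundled iris data are our own choices.

.. code-block:: Python

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Columns: sepal length, sepal width, petal length, petal width
    X_all = load_iris().data
    X_train, X_test = train_test_split(X_all, test_size=0.5, random_state=0)

    # "zscore" step on all four features, fitted on the training rows only
    scaler = StandardScaler().fit(X_train)
    X_train_z = scaler.transform(X_train)
    X_test_z = scaler.transform(X_test)

    # "generate_target" step: PCA fitted on the training petal columns,
    # first component used as the target for both train and test rows
    pca = PCA(n_components=2).fit(X_train_z[:, 2:4])
    y_train = pca.transform(X_train_z[:, 2:4])[:, 0]
    y_test = pca.transform(X_test_z[:, 2:4])[:, 0]

    # "linreg" step: predict the generated target from the sepal columns only
    model = LinearRegression().fit(X_train_z[:, 0:2], y_train)
    score = r2_score(y_test, model.predict(X_test_z[:, 0:2]))
    print(f"R^2 on the held-out half: {score:.3f}")

Fitting the target generator on the training rows only keeps information from
the test rows out of the target definition, which is why the target is
generated inside the cross-validation loop rather than beforehand.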


.. GENERATED FROM PYTHON SOURCE LINES 65-66

Finally, we evaluate the model using 2-fold cross-validation.

.. GENERATED FROM PYTHON SOURCE LINES 66-77

.. code-block:: Python

    scores, model = run_cross_validation(
        X=X,
        y=y,
        X_types=X_types,
        data=df_iris,
        model=creator,
        return_estimator="final",
        cv=2,
    )

    print(scores["test_score"])  # type: ignore


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-03-31 12:29:04,018 - julearn - INFO - ==== Input Data ====
    2026-03-31 12:29:04,018 - julearn - INFO - Using dataframe as input
    2026-03-31 12:29:04,018 - julearn - INFO -      Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    2026-03-31 12:29:04,018 - julearn - INFO -      Target: __generated__
    2026-03-31 12:29:04,019 - julearn - INFO -      Expanded features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    2026-03-31 12:29:04,019 - julearn - INFO -      X_types:{'sepal': ['sepal_length', 'sepal_width'], 'petal': ['petal_length', 'petal_width']}
    2026-03-31 12:29:04,020 - julearn - INFO - Target will be generated
    2026-03-31 12:29:04,020 - julearn - INFO - ====================
    2026-03-31 12:29:04,020 - julearn - INFO - 
    2026-03-31 12:29:04,020 - julearn - DEBUG - Generating pipeline from PipelineCreator or list of them
    2026-03-31 12:29:04,020 - julearn - DEBUG - Creating pipeline
    2026-03-31 12:29:04,020 - julearn - DEBUG - Ensuring target generator pipeline
    2026-03-31 12:29:04,020 - julearn - DEBUG - Creating pipeline
    2026-03-31 12:29:04,020 - julearn - DEBUG - Creating a pipeline with no model added
    2026-03-31 12:29:04,020 - julearn - DEBUG - Adding transformer pca
    2026-03-31 12:29:04,021 - julearn - DEBUG -      Estimator: PCA(n_components=2)
    2026-03-31 12:29:04,021 - julearn - DEBUG -      Params to tune: {}
    2026-03-31 12:29:04,021 - julearn - DEBUG - Adding transformer pick_columns
    2026-03-31 12:29:04,021 - julearn - DEBUG -      Estimator: PickColumns(keep='pca__pca0')
    2026-03-31 12:29:04,022 - julearn - DEBUG -      Params to tune: {}
    2026-03-31 12:29:04,022 - julearn - INFO - = Model Parameters =
    2026-03-31 12:29:04,022 - julearn - INFO - ====================
    2026-03-31 12:29:04,022 - julearn - INFO - 
    2026-03-31 12:29:04,022 - julearn - DEBUG - Pipeline created
    2026-03-31 12:29:04,022 - julearn - DEBUG - Target generator pipeline created
    2026-03-31 12:29:04,022 - julearn - DEBUG - Adding transformer zscore
    2026-03-31 12:29:04,022 - julearn - DEBUG -      Estimator: StandardScaler()
    2026-03-31 12:29:04,023 - julearn - DEBUG -      Params to tune: {}
    2026-03-31 12:29:04,023 - julearn - DEBUG - Adding model linreg
    2026-03-31 12:29:04,023 - julearn - DEBUG - Wrapping linreg
    2026-03-31 12:29:04,023 - julearn - DEBUG -      Estimator: WrapModel(apply_to=ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>,
              copy_X=True, fit_intercept=True, model=LinearRegression(),
              n_jobs=None, positive=False, tol=1e-06)
    2026-03-31 12:29:04,024 - julearn - DEBUG -      Looking for nested pipeline creators
    2026-03-31 12:29:04,024 - julearn - DEBUG -      Params to tune: {}
    2026-03-31 12:29:04,024 - julearn - DEBUG - Wrapping target model linreg as target_generate
    2026-03-31 12:29:04,024 - julearn - INFO - = Model Parameters =
    2026-03-31 12:29:04,024 - julearn - INFO - ====================
    2026-03-31 12:29:04,024 - julearn - INFO - 
    2026-03-31 12:29:04,025 - julearn - DEBUG - Pipeline created
    2026-03-31 12:29:04,025 - julearn - DEBUG - Pipeline has target generator
    2026-03-31 12:29:04,025 - julearn - INFO - = Data Information =
    2026-03-31 12:29:04,025 - julearn - INFO -      Problem type: regression
    2026-03-31 12:29:04,025 - julearn - INFO -      Number of samples: 150
    2026-03-31 12:29:04,025 - julearn - INFO -      Number of features: 4
    2026-03-31 12:29:04,025 - julearn - INFO - ====================
    2026-03-31 12:29:04,025 - julearn - INFO - 
    2026-03-31 12:29:04,025 - julearn - INFO -      Target type: float64
    2026-03-31 12:29:04,025 - julearn - INFO - Using outer CV scheme KFold(n_splits=2, random_state=None, shuffle=False) (incl. final model)
    2026-03-31 12:29:04,033 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
    2026-03-31 12:29:04,033 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
    2026-03-31 12:29:04,037 - julearn - DEBUG - Fitting the target generator
    2026-03-31 12:29:04,038 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
           'petal_length__:type:__petal', 'petal_width__:type:__petal'],
          dtype='str')
    2026-03-31 12:29:04,038 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
    2026-03-31 12:29:04,047 - julearn - DEBUG - Generating target
    2026-03-31 12:29:04,050 - julearn - DEBUG - Picking columns: ['pca__pca0']
    2026-03-31 12:29:04,050 - julearn - DEBUG - Target generated: pca__pca0
    2026-03-31 12:29:04,051 - julearn - DEBUG - Fitting model from generated target
    /opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
    Traceback (most recent call last):
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
        scores = scorer(estimator, X_test, y_test, **score_params)
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
        return estimator.score(*args, **kwargs)
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1138, in score
        routed_params = process_routing(
            self, "score", sample_weight=sample_weight, **params
        )
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
        request_routing.validate_metadata(params=kwargs, method=_method)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
        raise TypeError(
        ...<2 lines>...
        )
    TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

      warnings.warn(
    2026-03-31 12:29:04,104 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
    2026-03-31 12:29:04,105 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
    2026-03-31 12:29:04,109 - julearn - DEBUG - Fitting the target generator
    2026-03-31 12:29:04,109 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
           'petal_length__:type:__petal', 'petal_width__:type:__petal'],
          dtype='str')
    2026-03-31 12:29:04,109 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
    2026-03-31 12:29:04,118 - julearn - DEBUG - Generating target
    2026-03-31 12:29:04,121 - julearn - DEBUG - Picking columns: ['pca__pca0']
    2026-03-31 12:29:04,122 - julearn - DEBUG - Target generated: pca__pca0
    2026-03-31 12:29:04,122 - julearn - DEBUG - Fitting model from generated target
    /opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
    Traceback (most recent call last):
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
        scores = scorer(estimator, X_test, y_test, **score_params)
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
        return estimator.score(*args, **kwargs)
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1138, in score
        routed_params = process_routing(
            self, "score", sample_weight=sample_weight, **params
        )
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
        request_routing.validate_metadata(params=kwargs, method=_method)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
        raise TypeError(
        ...<2 lines>...
        )
    TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

      warnings.warn(
    2026-03-31 12:29:04,131 - julearn - DEBUG - Setting column types for Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='str')
    2026-03-31 12:29:04,131 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
    2026-03-31 12:29:04,134 - julearn - DEBUG - Fitting the target generator
    2026-03-31 12:29:04,135 - julearn - DEBUG - Setting column types for Index(['sepal_length__:type:__sepal', 'sepal_width__:type:__sepal',
           'petal_length__:type:__petal', 'petal_width__:type:__petal'],
          dtype='str')
    2026-03-31 12:29:04,135 - julearn - DEBUG -     Column mappers for {'sepal_length': 'sepal_length__:type:__sepal', 'sepal_width': 'sepal_width__:type:__sepal', 'petal_length': 'petal_length__:type:__petal', 'petal_width': 'petal_width__:type:__petal'}
    2026-03-31 12:29:04,143 - julearn - DEBUG - Generating target
    2026-03-31 12:29:04,147 - julearn - DEBUG - Picking columns: ['pca__pca0']
    2026-03-31 12:29:04,147 - julearn - DEBUG - Target generated: pca__pca0
    2026-03-31 12:29:04,148 - julearn - DEBUG - Fitting model from generated target
    /opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py:927: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
    Traceback (most recent call last):
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/model_selection/_validation.py", line 916, in _score
        scores = scorer(estimator, X_test, y_test, **score_params)
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/metrics/_scorer.py", line 485, in __call__
        return estimator.score(*args, **kwargs)
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/pipeline.py", line 1138, in score
        routed_params = process_routing(
            self, "score", sample_weight=sample_weight, **params
        )
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1643, in process_routing
        request_routing.validate_metadata(params=kwargs, method=_method)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/sklearn/utils/_metadata_requests.py", line 1139, in validate_metadata
        raise TypeError(
        ...<2 lines>...
        )
    TypeError: Pipeline.score got unexpected argument(s) {'sample_weight'}, which are not routed to any object.

      warnings.warn(
    0   NaN
    1   NaN
    Name: test_score, dtype: float64
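
The log reports the outer cross-validation scheme as
``KFold(n_splits=2, random_state=None, shuffle=False)``. The sketch below uses
scikit-learn's bundled iris data as a stand-in for the seaborn copy (assuming
both are ordered by species) and shows which rows end up in each test fold under
that scheme.

.. code-block:: Python

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold

    iris = load_iris()
    species = iris.target  # rows are sorted: 50 x 0, 50 x 1, 50 x 2

    # Same scheme as reported in the log for cv=2
    cv = KFold(n_splits=2, shuffle=False)

    fold_counts = []
    for fold, (train_idx, test_idx) in enumerate(cv.split(iris.data)):
        # Number of test samples of each species in this fold
        counts = np.bincount(species[test_idx], minlength=3).tolist()
        fold_counts.append(counts)
        print(f"fold {fold}: test rows {test_idx[0]}-{test_idx[-1]}, "
              f"species counts {counts}")

Because the rows are sorted by species and the folds are not shuffled, the first
test fold contains all setosa samples and the second all virginica samples, so
each test fold includes a species that is absent from its training fold.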


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.147 seconds)


.. _sphx_glr_download_auto_examples_03_complex_models_run_generate_target.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: run_generate_target.ipynb <run_generate_target.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: run_generate_target.py <run_generate_target.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: run_generate_target.zip <run_generate_target.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_