.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/03_complex_models/run_apply_to_target.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_03_complex_models_run_apply_to_target.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_03_complex_models_run_apply_to_target.py:


Transforming target variable with z-score
=========================================

This example uses the ``diabetes`` regression dataset from ``sklearn`` and
transforms the target variable, in this case by z-scoring it. We then run a
regression analysis with a Ridge regression model.

.. GENERATED FROM PYTHON SOURCE LINES 10-24

.. code-block:: Python

    # Authors: Lya K. Paas Oliveros <l.paas.oliveros@fz-juelich.de>
    #          Sami Hamdan <s.hamdan@fz-juelich.de>
    #
    # License: AGPL

    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split

    from julearn import run_cross_validation
    from julearn.utils import configure_logging

    from julearn.pipeline import PipelineCreator, TargetPipelineCreator


.. GENERATED FROM PYTHON SOURCE LINES 25-26

Set the logging level to info to see extra information.

.. GENERATED FROM PYTHON SOURCE LINES 26-28

.. code-block:: Python

    configure_logging(level="INFO")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:54:05,489 - julearn - INFO - ===== Lib Versions =====
    2026-01-16 10:54:05,489 - julearn - INFO - numpy: 1.26.4
    2026-01-16 10:54:05,489 - julearn - INFO - scipy: 1.17.0
    2026-01-16 10:54:05,489 - julearn - INFO - sklearn: 1.7.2
    2026-01-16 10:54:05,489 - julearn - INFO - pandas: 2.3.3
    2026-01-16 10:54:05,489 - julearn - INFO - julearn: 0.3.5.dev123
    2026-01-16 10:54:05,489 - julearn - INFO - ========================


.. GENERATED FROM PYTHON SOURCE LINES 29-30

Load the diabetes dataset from ``sklearn`` as a ``pandas.DataFrame``.

.. GENERATED FROM PYTHON SOURCE LINES 30-32

.. code-block:: Python

    features, target = load_diabetes(return_X_y=True, as_frame=True)


.. GENERATED FROM PYTHON SOURCE LINES 33-37

The dataset contains ten variables (age, sex, body mass index, average blood
pressure, and six blood serum measurements, ``s1`` to ``s6``) measured in
diabetes patients, together with a quantitative measure of disease progression
one year after baseline, which is the target we want to predict.

.. GENERATED FROM PYTHON SOURCE LINES 37-40

.. code-block:: Python

    print("Features: \n", features.head())
    print("Target: \n", target.describe())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Features: 
             age       sex       bmi  ...        s4        s5        s6
    0  0.038076  0.050680  0.061696  ... -0.002592  0.019907 -0.017646
    1 -0.001882 -0.044642 -0.051474  ... -0.039493 -0.068332 -0.092204
    2  0.085299  0.050680  0.044451  ... -0.002592  0.002861 -0.025930
    3 -0.089063 -0.044642 -0.011595  ...  0.034309  0.022688 -0.009362
    4  0.005383 -0.044642 -0.036385  ... -0.002592 -0.031988 -0.046641

    [5 rows x 10 columns]
    Target: 
     count    442.000000
    mean     152.133484
    std       77.093005
    min       25.000000
    25%       87.000000
    50%      140.500000
    75%      211.500000
    max      346.000000
    Name: target, dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 41-43

Let's combine the features and the target into one dataframe and define ``X``
and ``y`` as the names of the feature and target columns.

.. GENERATED FROM PYTHON SOURCE LINES 43-48

.. code-block:: Python

    data_diabetes = pd.concat([features, target], axis=1)

    X = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]
    y = "target"


.. GENERATED FROM PYTHON SOURCE LINES 49-50

Split the dataset into training and test sets.

.. GENERATED FROM PYTHON SOURCE LINES 50-52

.. code-block:: Python

    train_diabetes, test_diabetes = train_test_split(data_diabetes, test_size=0.3)
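
The split above is random, so the exact scores change from run to run. A
minimal sketch, assuming a fixed seed is acceptable for your analysis, passes
``random_state`` (a standard ``train_test_split`` parameter; the value ``42``
is an arbitrary choice and not part of the original example) to make the split
repeatable:

.. code-block:: Python

    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split

    features, target = load_diabetes(return_X_y=True, as_frame=True)
    data_diabetes = pd.concat([features, target], axis=1)

    # Fixing random_state makes the 70/30 split identical on every run.
    # The seed value 42 is arbitrary (an assumption for illustration).
    train_diabetes, test_diabetes = train_test_split(
        data_diabetes, test_size=0.3, random_state=42
    )
    print(train_diabetes.shape, test_diabetes.shape)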


.. GENERATED FROM PYTHON SOURCE LINES 53-55

Let's create the model. Since we will be transforming the target variable,
we first need to create a ``TargetPipelineCreator`` for it.

.. GENERATED FROM PYTHON SOURCE LINES 55-59

.. code-block:: Python


    target_creator = TargetPipelineCreator()
    target_creator.add("zscore")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <julearn.pipeline.target_pipeline_creator.TargetPipelineCreator object at 0x7f2d82f0a120>
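
To make the ``zscore`` step concrete, here is a minimal sketch of what
z-scoring a target amounts to. It uses only the standard library and made-up
toy values. The helper names ``fit_zscore``, ``apply_zscore``, and
``invert_zscore`` are hypothetical and are not julearn functions. The sketch
assumes the population standard deviation, as
``sklearn.preprocessing.StandardScaler`` uses; that julearn's ``zscore`` step
behaves exactly like this is an assumption.

.. code-block:: Python

    import statistics


    def fit_zscore(values):
        """Learn the mean and the population standard deviation."""
        return statistics.fmean(values), statistics.pstdev(values)


    def apply_zscore(values, mean, std):
        """Map values to the z-scale defined by mean and std."""
        return [(v - mean) / std for v in values]


    def invert_zscore(z_values, mean, std):
        """Map z-scores back to the original units."""
        return [z * std + mean for z in z_values]


    # Toy target values (illustrative only, not taken from the dataset).
    y_train = [100.0, 150.0, 200.0]
    y_test = [175.0]

    # The parameters come from the training targets only, so no
    # information from the test targets leaks into the transformation.
    mean, std = fit_zscore(y_train)
    z_train = apply_zscore(y_train, mean, std)
    z_test = apply_zscore(y_test, mean, std)

    # Values on the z-scale can be mapped back to the original units.
    y_back = invert_zscore(z_train, mean, std)

Fitting the parameters on the training targets and reusing them for the test
targets mirrors what happens inside each cross-validation fold when the
transformation is part of the pipeline.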


.. GENERATED FROM PYTHON SOURCE LINES 60-61

Now we can create the pipeline using a ``PipelineCreator``.

.. GENERATED FROM PYTHON SOURCE LINES 61-76

.. code-block:: Python

    creator = PipelineCreator(problem_type="regression")
    creator.add(target_creator, apply_to="target")
    creator.add("ridge")

    scores, model = run_cross_validation(
        X=X,
        y=y,
        data=train_diabetes,
        model=creator,
        return_estimator="final",
        scoring="neg_mean_absolute_error",
    )

    print(scores.head(5))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:54:05,504 - julearn - INFO - Adding step jutargetpipeline that applies to ColumnTypes<types={'target'}; pattern=(?:target)>
    2026-01-16 10:54:05,504 - julearn - INFO - Step added
    2026-01-16 10:54:05,504 - julearn - INFO - Adding step ridge that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:54:05,505 - julearn - INFO - Step added
    2026-01-16 10:54:05,505 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:54:05,505 - julearn - INFO - Using dataframe as input
    2026-01-16 10:54:05,505 - julearn - INFO -      Features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2026-01-16 10:54:05,505 - julearn - INFO -      Target: target
    2026-01-16 10:54:05,505 - julearn - INFO -      Expanded features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2026-01-16 10:54:05,505 - julearn - INFO -      X_types:{}
    2026-01-16 10:54:05,505 - julearn - WARNING - The following columns are not defined in X_types: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:54:05,506 - julearn - INFO - ====================
    2026-01-16 10:54:05,506 - julearn - INFO - 
    2026-01-16 10:54:05,507 - julearn - INFO - = Model Parameters =
    2026-01-16 10:54:05,507 - julearn - INFO - ====================
    2026-01-16 10:54:05,507 - julearn - INFO - 
    2026-01-16 10:54:05,507 - julearn - INFO - = Data Information =
    2026-01-16 10:54:05,507 - julearn - INFO -      Problem type: regression
    2026-01-16 10:54:05,507 - julearn - INFO -      Number of samples: 309
    2026-01-16 10:54:05,507 - julearn - INFO -      Number of features: 10
    2026-01-16 10:54:05,507 - julearn - INFO - ====================
    2026-01-16 10:54:05,507 - julearn - INFO - 
    2026-01-16 10:54:05,508 - julearn - INFO -      Target type: float64
    2026-01-16 10:54:05,508 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False) (incl. final model)
       fit_time  score_time  ...  fold                          cv_mdsum
    0  0.003549    0.001742  ...     0  b10eef89b4192178d482d7a1587a248a
    1  0.003499    0.001737  ...     1  b10eef89b4192178d482d7a1587a248a
    2  0.003555    0.001725  ...     2  b10eef89b4192178d482d7a1587a248a
    3  0.003531    0.001703  ...     3  b10eef89b4192178d482d7a1587a248a
    4  0.003485    0.001696  ...     4  b10eef89b4192178d482d7a1587a248a

    [5 rows x 8 columns]
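
For readers more familiar with plain scikit-learn, a similar setup can be
sketched with ``TransformedTargetRegressor``. This is an analogue offered for
illustration and is not julearn's internal implementation. It also runs on the
full dataset rather than on the training split used above, so the resulting
number is not expected to match the output shown in this example.

.. code-block:: Python

    from sklearn.compose import TransformedTargetRegressor
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score
    from sklearn.preprocessing import StandardScaler

    features, target = load_diabetes(return_X_y=True, as_frame=True)

    # The regressor is fitted on the z-scored target; predictions are
    # inverse-transformed, so errors are in the original target units.
    model = TransformedTargetRegressor(
        regressor=Ridge(),
        transformer=StandardScaler(),
    )

    # Five unshuffled folds, scored with the negated mean absolute error.
    neg_mae = cross_val_score(
        model, features, target, cv=5, scoring="neg_mean_absolute_error"
    )
    print(-neg_mae.mean())

Because the scaler is wrapped inside the estimator, it is refitted on the
training targets of every fold, which keeps the test targets out of the
transformation.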


.. GENERATED FROM PYTHON SOURCE LINES 77-78

Compute the mean absolute error, averaged across the cross-validation folds.

.. GENERATED FROM PYTHON SOURCE LINES 78-79

.. code-block:: Python

    print(scores["test_score"].mean() * -1)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    51.51357151914368
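
The multiplication by ``-1`` reflects the scikit-learn scorer convention that
higher values are better, which is why ``neg_mean_absolute_error`` reports the
error with a flipped sign. A minimal standard-library sketch with made-up
numbers follows; the helper ``mean_absolute_error`` defined here is a local
illustration, not the scikit-learn function of the same name.

.. code-block:: Python

    def mean_absolute_error(y_true, y_pred):
        """Average absolute difference between targets and predictions."""
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


    # Toy values (illustrative only, not taken from the dataset).
    y_true = [100.0, 150.0, 200.0]
    y_pred = [110.0, 140.0, 230.0]

    mae = mean_absolute_error(y_true, y_pred)  # (10 + 10 + 30) / 3
    neg_mae = -mae  # the sign a "neg_mean_absolute_error" scorer reports

    # Negating the score again recovers the error in target units.
    recovered = neg_mae * -1

The value printed above (about 51.5) is on the scale of the original target,
whose standard deviation is about 77, which suggests that predictions are
mapped back from the z-scale before scoring.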


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.067 seconds)


.. _sphx_glr_download_auto_examples_03_complex_models_run_apply_to_target.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: run_apply_to_target.ipynb <run_apply_to_target.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: run_apply_to_target.py <run_apply_to_target.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: run_apply_to_target.zip <run_apply_to_target.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_