.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/03_complex_models/run_apply_to_target.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_03_complex_models_run_apply_to_target.py:


Transforming target variable with z-score
=========================================

This example uses the sklearn ``diabetes`` regression dataset and transforms
the target variable, in this case with a z-score. Then, we perform a
regression analysis using a Ridge regression model.

.. GENERATED FROM PYTHON SOURCE LINES 10-24

.. code-block:: Python


    # Authors: Lya K. Paas Oliveros
    #          Sami Hamdan
    #
    # License: AGPL

    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split

    from julearn import run_cross_validation
    from julearn.utils import configure_logging
    from julearn.pipeline import PipelineCreator, TargetPipelineCreator

.. GENERATED FROM PYTHON SOURCE LINES 25-26

Set the logging level to ``"INFO"`` to see extra information.

.. GENERATED FROM PYTHON SOURCE LINES 26-28

.. code-block:: Python

    configure_logging(level="INFO")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:54:05,489 - julearn - INFO - ===== Lib Versions =====
    2026-01-16 10:54:05,489 - julearn - INFO - numpy: 1.26.4
    2026-01-16 10:54:05,489 - julearn - INFO - scipy: 1.17.0
    2026-01-16 10:54:05,489 - julearn - INFO - sklearn: 1.7.2
    2026-01-16 10:54:05,489 - julearn - INFO - pandas: 2.3.3
    2026-01-16 10:54:05,489 - julearn - INFO - julearn: 0.3.5.dev123
    2026-01-16 10:54:05,489 - julearn - INFO - ========================

.. GENERATED FROM PYTHON SOURCE LINES 29-30

Load the diabetes dataset from ``sklearn``, with the features as a
``pandas.DataFrame`` and the target as a ``pandas.Series``.


.. GENERATED FROM PYTHON SOURCE LINES 30-32

.. code-block:: Python

    features, target = load_diabetes(return_X_y=True, as_frame=True)

.. GENERATED FROM PYTHON SOURCE LINES 33-37

The dataset contains ten baseline variables (age, sex, body mass index,
average blood pressure, and six blood serum measurements, s1-s6) for 442
diabetes patients, plus a quantitative measure of disease progression one
year after baseline. This measure is the target we want to predict.

.. GENERATED FROM PYTHON SOURCE LINES 37-40

.. code-block:: Python

    print("Features: \n", features.head())
    print("Target: \n", target.describe())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Features:
             age       sex       bmi  ...        s4        s5        s6
    0  0.038076  0.050680  0.061696  ... -0.002592  0.019907 -0.017646
    1 -0.001882 -0.044642 -0.051474  ... -0.039493 -0.068332 -0.092204
    2  0.085299  0.050680  0.044451  ... -0.002592  0.002861 -0.025930
    3 -0.089063 -0.044642 -0.011595  ...  0.034309  0.022688 -0.009362
    4  0.005383 -0.044642 -0.036385  ... -0.002592 -0.031988 -0.046641

    [5 rows x 10 columns]
    Target:
     count    442.000000
    mean     152.133484
    std       77.093005
    min       25.000000
    25%       87.000000
    50%      140.500000
    75%      211.500000
    max      346.000000
    Name: target, dtype: float64

.. GENERATED FROM PYTHON SOURCE LINES 41-43

Let's combine the features and the target into a single dataframe and define
``X`` (the list of feature column names) and ``y`` (the target column name).

.. GENERATED FROM PYTHON SOURCE LINES 43-48

.. code-block:: Python

    data_diabetes = pd.concat([features, target], axis=1)

    X = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]
    y = "target"

.. GENERATED FROM PYTHON SOURCE LINES 49-50

Split the dataset into a training set (70%) and a test set (30%). Because no
``random_state`` is passed, the split, and therefore the scores below, will
vary from run to run.

.. GENERATED FROM PYTHON SOURCE LINES 50-52

.. code-block:: Python

    train_diabetes, test_diabetes = train_test_split(data_diabetes, test_size=0.3)

.. GENERATED FROM PYTHON SOURCE LINES 53-55

Let's create the model. Since we want to transform the target variable, we
first need to create a ``TargetPipelineCreator`` that holds the target
transformation steps (here, a single z-score step).

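
The julearn code that follows hides the details of the target transformation
behind ``TargetPipelineCreator``. As a rough plain scikit-learn analogue (an
illustration, not julearn's implementation), z-scoring the target can be
sketched with ``TransformedTargetRegressor`` and ``StandardScaler``: the
transformer learns the mean and standard deviation of the training targets,
``Ridge`` is fitted on the z-scored target, and predictions are mapped back to
the original units with the inverse transform. The fixed ``random_state=0``
split and the names ``X_arr``/``y_arr`` are arbitrary choices for this sketch.

.. code-block:: Python

    # Plain scikit-learn analogue of "z-score the target, then fit Ridge".
    # Illustration only: this is not how julearn implements the "zscore" step.
    import numpy as np
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X_arr, y_arr = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X_arr, y_arr, test_size=0.3, random_state=0
    )

    # StandardScaler is fitted on y_train only; Ridge is trained on
    # (y - mean) / std and its predictions are inverse-transformed.
    zscored = TransformedTargetRegressor(
        regressor=Ridge(), transformer=StandardScaler()
    )
    zscored.fit(X_train, y_train)
    pred_zscored = zscored.predict(X_test)

    # Reference: the same Ridge model fitted on the raw target.
    pred_plain = Ridge().fit(X_train, y_train).predict(X_test)

    print("max abs difference:", np.max(np.abs(pred_zscored - pred_plain)))

In this sketch the two sets of predictions agree up to floating-point error.
That is expected for Ridge with an intercept: z-scoring the target scales both
the squared-error term and the L2 penalty by :math:`1/\sigma^2`, so the optimum
maps back exactly after the inverse transform. Models whose loss is not
rescaled in this way, such as ``SVR`` with a fixed ``epsilon``, can give
different predictions once the target is z-scored.
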

.. GENERATED FROM PYTHON SOURCE LINES 55-59

.. code-block:: Python

    target_creator = TargetPipelineCreator()
    target_creator.add("zscore")

.. GENERATED FROM PYTHON SOURCE LINES 60-61

Now we can create the pipeline using a ``PipelineCreator``. The target
pipeline is added with ``apply_to="target"`` so that it transforms ``y``
instead of the features, and a Ridge regression model (``"ridge"``) is added
as the final step.

.. GENERATED FROM PYTHON SOURCE LINES 61-76

.. code-block:: Python

    creator = PipelineCreator(problem_type="regression")
    creator.add(target_creator, apply_to="target")
    creator.add("ridge")

    scores, model = run_cross_validation(
        X=X,
        y=y,
        data=train_diabetes,
        model=creator,
        return_estimator="final",
        scoring="neg_mean_absolute_error",
    )

    print(scores.head(5))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:54:05,504 - julearn - INFO - Adding step jutargetpipeline that applies to ColumnTypes
    2026-01-16 10:54:05,504 - julearn - INFO - Step added
    2026-01-16 10:54:05,504 - julearn - INFO - Adding step ridge that applies to ColumnTypes
    2026-01-16 10:54:05,505 - julearn - INFO - Step added
    2026-01-16 10:54:05,505 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:54:05,505 - julearn - INFO - Using dataframe as input
    2026-01-16 10:54:05,505 - julearn - INFO - Features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2026-01-16 10:54:05,505 - julearn - INFO - Target: target
    2026-01-16 10:54:05,505 - julearn - INFO - Expanded features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2026-01-16 10:54:05,505 - julearn - INFO - X_types:{}
    2026-01-16 10:54:05,505 - julearn - WARNING - The following columns are not defined in X_types: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:54:05,506 - julearn - INFO - ====================
    2026-01-16 10:54:05,506 - julearn - INFO -
    2026-01-16 10:54:05,507 - julearn - INFO - = Model Parameters =
    2026-01-16 10:54:05,507 - julearn - INFO - ====================
    2026-01-16 10:54:05,507 - julearn - INFO -
    2026-01-16 10:54:05,507 - julearn - INFO - = Data Information =
    2026-01-16 10:54:05,507 - julearn - INFO - Problem type: regression
    2026-01-16 10:54:05,507 - julearn - INFO - Number of samples: 309
    2026-01-16 10:54:05,507 - julearn - INFO - Number of features: 10
    2026-01-16 10:54:05,507 - julearn - INFO - ====================
    2026-01-16 10:54:05,507 - julearn - INFO -
    2026-01-16 10:54:05,508 - julearn - INFO - Target type: float64
    2026-01-16 10:54:05,508 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False) (incl. final model)
       fit_time  score_time  ...  fold                          cv_mdsum
    0  0.003549    0.001742  ...     0  b10eef89b4192178d482d7a1587a248a
    1  0.003499    0.001737  ...     1  b10eef89b4192178d482d7a1587a248a
    2  0.003555    0.001725  ...     2  b10eef89b4192178d482d7a1587a248a
    3  0.003531    0.001703  ...     3  b10eef89b4192178d482d7a1587a248a
    4  0.003485    0.001696  ...     4  b10eef89b4192178d482d7a1587a248a

    [5 rows x 8 columns]

.. GENERATED FROM PYTHON SOURCE LINES 77-78

Finally, compute the mean absolute error (MAE) averaged across the CV folds.
The ``neg_mean_absolute_error`` scorer returns the negated MAE (so that higher
scores are better), so we multiply by -1 to recover the MAE.

.. GENERATED FROM PYTHON SOURCE LINES 78-79

.. code-block:: Python

    print(scores["test_score"].mean() * -1)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    51.51357151914368


.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.067 seconds)
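
The sign convention behind the ``* -1`` in the final code block of the
example can be checked directly in scikit-learn. The minimal sketch below uses
made-up toy numbers and a ``DummyRegressor`` that always predicts the training
mean, so the mean absolute error is easy to compute by hand.

.. code-block:: Python

    # Toy check of scikit-learn's sign convention for "neg_mean_absolute_error".
    # The data below is made up for illustration only.
    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.metrics import get_scorer, mean_absolute_error

    X_toy = np.zeros((4, 1))  # features are ignored by DummyRegressor
    y_toy = np.array([1.0, 2.0, 3.0, 6.0])

    # Always predicts the training mean of y_toy, i.e. 3.0.
    dummy = DummyRegressor(strategy="mean").fit(X_toy, y_toy)

    # Absolute errors are 2, 1, 0 and 3, so the MAE is 6 / 4.
    mae = mean_absolute_error(y_toy, dummy.predict(X_toy))
    scorer = get_scorer("neg_mean_absolute_error")
    neg_mae = scorer(dummy, X_toy, y_toy)  # same value, negated

    print(mae, neg_mae)  # 1.5 -1.5

scikit-learn scorers follow a "higher is better" convention, which is why the
error is reported with a negative sign; multiplying by -1 turns it back into an
error in the units of the target.
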