.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/advanced/run_custom_scorers_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_advanced_run_custom_scorers_regression.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_advanced_run_custom_scorers_regression.py:


Custom Scoring Function for Regression
======================================

This example uses the 'diabetes' data from sklearn datasets and performs
a regression analysis using a Ridge Regression model. As scorers, it uses
scikit-learn, julearn and a custom metric defined by the user.

.. GENERATED FROM PYTHON SOURCE LINES 10-25

.. code-block:: default

    # Authors: Shammi More
    #          Federico Raimondo
    #
    # License: AGPL
    import pandas as pd
    import scipy
    from sklearn.datasets import load_diabetes
    from sklearn.metrics import make_scorer
    from julearn.scoring import register_scorer
    from julearn import run_cross_validation
    from julearn.utils import configure_logging


.. GENERATED FROM PYTHON SOURCE LINES 26-27

Set the logging level to info to see extra information.

.. GENERATED FROM PYTHON SOURCE LINES 27-29

.. code-block:: default

    configure_logging(level='INFO')


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2022-12-08 10:45:54,755 - julearn - INFO - ===== Lib Versions =====
    2022-12-08 10:45:54,755 - julearn - INFO - numpy: 1.23.5
    2022-12-08 10:45:54,755 - julearn - INFO - scipy: 1.9.3
    2022-12-08 10:45:54,755 - julearn - INFO - sklearn: 1.0.2
    2022-12-08 10:45:54,755 - julearn - INFO - pandas: 1.4.4
    2022-12-08 10:45:54,755 - julearn - INFO - julearn: 0.2.7
    2022-12-08 10:45:54,755 - julearn - INFO - ========================


.. GENERATED FROM PYTHON SOURCE LINES 30-31

Load the diabetes data from sklearn as a pandas dataframe.

.. GENERATED FROM PYTHON SOURCE LINES 31-33

.. code-block:: default

    features, target = load_diabetes(return_X_y=True, as_frame=True)


.. GENERATED FROM PYTHON SOURCE LINES 34-38

The dataset contains ten variables (age, sex, body mass index, average blood
pressure, and six blood serum measurements, s1-s6) for diabetes patients,
and a quantitative measure of disease progression one year after baseline,
which is the target we are interested in predicting.

.. GENERATED FROM PYTHON SOURCE LINES 38-42

.. code-block:: default

    print('Features: \n', features.head())  # type: ignore
    print('Target: \n', target.describe())  # type: ignore


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Features:
             age       sex       bmi  ...        s4        s5        s6
    0  0.038076  0.050680  0.061696  ... -0.002592  0.019908 -0.017646
    1 -0.001882 -0.044642 -0.051474  ... -0.039493 -0.068330 -0.092204
    2  0.085299  0.050680  0.044451  ... -0.002592  0.002864 -0.025930
    3 -0.089063 -0.044642 -0.011595  ...  0.034309  0.022692 -0.009362
    4  0.005383 -0.044642 -0.036385  ... -0.002592 -0.031991 -0.046641

    [5 rows x 10 columns]
    Target:
    count    442.000000
    mean     152.133484
    std       77.093005
    min       25.000000
    25%       87.000000
    50%      140.500000
    75%      211.500000
    max      346.000000
    Name: target, dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 43-45

Let's combine features and target together in one dataframe and define X
and y.

.. GENERATED FROM PYTHON SOURCE LINES 45-51

.. code-block:: default

    data_diabetes = pd.concat([features, target], axis=1)  # type: ignore

    X = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    y = 'target'


.. GENERATED FROM PYTHON SOURCE LINES 52-54

Train a ridge regression model with cross-validation and use the mean
absolute error for scoring. scikit-learn exposes this metric in negated form
as 'neg_mean_absolute_error'.

.. GENERATED FROM PYTHON SOURCE LINES 54-59

.. code-block:: default

    scores, model = run_cross_validation(
        X=X, y=y, data=data_diabetes, preprocess_X='zscore',
        problem_type='regression', model='ridge', return_estimator='final',
        scoring='neg_mean_absolute_error')


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2022-12-08 10:45:54,773 - julearn - INFO - Using default CV
    2022-12-08 10:45:54,774 - julearn - INFO - ==== Input Data ====
    2022-12-08 10:45:54,774 - julearn - INFO - Using dataframe as input
    2022-12-08 10:45:54,774 - julearn - INFO - Features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2022-12-08 10:45:54,774 - julearn - INFO - Target: target
    2022-12-08 10:45:54,774 - julearn - INFO - Expanded X: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2022-12-08 10:45:54,774 - julearn - INFO - Expanded Confounds: []
    2022-12-08 10:45:54,775 - julearn - INFO - ====================
    2022-12-08 10:45:54,775 - julearn - INFO -
    2022-12-08 10:45:54,775 - julearn - INFO - ====== Model ======
    2022-12-08 10:45:54,775 - julearn - INFO - Obtaining model by name: ridge
    2022-12-08 10:45:54,775 - julearn - INFO - ===================
    2022-12-08 10:45:54,775 - julearn - INFO -
    2022-12-08 10:45:54,775 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds


.. GENERATED FROM PYTHON SOURCE LINES 60-61

The scores dataframe has all the values for each CV split.

.. GENERATED FROM PYTHON SOURCE LINES 61-64

.. code-block:: default

    print(scores.head())


.. rst-class:: sphx-glr-script-out

.. code-block:: none

       fit_time  score_time  test_score  repeat  fold
    0  0.008654    0.005499  -44.386924       0     0
    1  0.008176    0.005423  -45.094063       0     1
    2  0.008156    0.005428  -43.188016       0     2
    3  0.008114    0.005428  -41.591935       0     3
    4  0.008098    0.005366  -49.226121       0     4


.. GENERATED FROM PYTHON SOURCE LINES 65-66

Mean value of the mean absolute error across CV splits. The scores are
negated, so we multiply by -1 to recover the error.

.. GENERATED FROM PYTHON SOURCE LINES 66-68

.. code-block:: default

    print(scores['test_score'].mean() * -1)  # type: ignore


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    44.56947555620945


.. GENERATED FROM PYTHON SOURCE LINES 69-71

Now do the same thing, but use the mean absolute error and the squared
Pearson product-moment correlation coefficient as scoring functions.

.. GENERATED FROM PYTHON SOURCE LINES 71-76

.. code-block:: default

    scores, model = run_cross_validation(
        X=X, y=y, data=data_diabetes, preprocess_X='zscore',
        problem_type='regression', model='ridge', return_estimator='final',
        scoring=['neg_mean_absolute_error', 'r2_corr'])


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2022-12-08 10:45:55,149 - julearn - INFO - Using default CV
    2022-12-08 10:45:55,149 - julearn - INFO - ==== Input Data ====
    2022-12-08 10:45:55,149 - julearn - INFO - Using dataframe as input
    2022-12-08 10:45:55,149 - julearn - INFO - Features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2022-12-08 10:45:55,149 - julearn - INFO - Target: target
    2022-12-08 10:45:55,149 - julearn - INFO - Expanded X: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2022-12-08 10:45:55,150 - julearn - INFO - Expanded Confounds: []
    2022-12-08 10:45:55,150 - julearn - INFO - ====================
    2022-12-08 10:45:55,150 - julearn - INFO -
    2022-12-08 10:45:55,150 - julearn - INFO - ====== Model ======
    2022-12-08 10:45:55,150 - julearn - INFO - Obtaining model by name: ridge
    2022-12-08 10:45:55,150 - julearn - INFO - ===================
    2022-12-08 10:45:55,150 - julearn - INFO -
    2022-12-08 10:45:55,151 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds


.. GENERATED FROM PYTHON SOURCE LINES 77-80

The scores dataframe still has one row per CV split, but now holds two
scores under the column names 'test_neg_mean_absolute_error' and
'test_r2_corr'.

.. GENERATED FROM PYTHON SOURCE LINES 80-83

.. code-block:: default

    print(scores[['test_neg_mean_absolute_error', 'test_r2_corr']].mean())


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    test_neg_mean_absolute_error   -44.257386
    test_r2_corr                     0.502022
    dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 84-87

If we want to define a custom scoring metric, we need to define a function
that takes the actual and the predicted values as input, in that order, and
returns a value.
In this case, we want to compute the Pearson correlation coefficient (r).

.. GENERATED FROM PYTHON SOURCE LINES 87-93

.. code-block:: default

    def pearson_scorer(y_true, y_pred):
        return scipy.stats.pearsonr(  # type: ignore
            y_true.squeeze(), y_pred.squeeze())[0]


.. GENERATED FROM PYTHON SOURCE LINES 94-96

Before using it, we need to convert it to a sklearn scorer and register it
with julearn.

.. GENERATED FROM PYTHON SOURCE LINES 96-98

.. code-block:: default

    register_scorer(scorer_name='pearsonr', scorer=make_scorer(pearson_scorer))


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2022-12-08 10:45:55,656 - julearn - INFO - registering scorer named pearsonr


.. GENERATED FROM PYTHON SOURCE LINES 99-100

Now we can use it as another scoring metric.

.. GENERATED FROM PYTHON SOURCE LINES 100-104

.. code-block:: default

    scores, model = run_cross_validation(
        X=X, y=y, data=data_diabetes, preprocess_X='zscore',
        problem_type='regression', model='ridge', return_estimator='final',
        scoring=['neg_mean_absolute_error', 'r2_corr', 'pearsonr'])


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    2022-12-08 10:45:55,656 - julearn - INFO - Using default CV
    2022-12-08 10:45:55,656 - julearn - INFO - ==== Input Data ====
    2022-12-08 10:45:55,656 - julearn - INFO - Using dataframe as input
    2022-12-08 10:45:55,657 - julearn - INFO - Features: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2022-12-08 10:45:55,657 - julearn - INFO - Target: target
    2022-12-08 10:45:55,657 - julearn - INFO - Expanded X: ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    2022-12-08 10:45:55,657 - julearn - INFO - Expanded Confounds: []
    2022-12-08 10:45:55,658 - julearn - INFO - ====================
    2022-12-08 10:45:55,658 - julearn - INFO -
    2022-12-08 10:45:55,658 - julearn - INFO - ====== Model ======
    2022-12-08 10:45:55,658 - julearn - INFO - Obtaining model by name: ridge
    2022-12-08 10:45:55,658 - julearn - INFO - ===================
    2022-12-08 10:45:55,658 - julearn - INFO -
    2022-12-08 10:45:55,658 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.537 seconds)


.. _sphx_glr_download_auto_examples_advanced_run_custom_scorers_regression.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: run_custom_scorers_regression.py <run_custom_scorers_regression.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: run_custom_scorers_regression.ipynb <run_custom_scorers_regression.ipynb>`


.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_