.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/basic/run_simple_binary_classification.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_basic_run_simple_binary_classification.py: Simple Binary Classification ============================ This example uses the 'iris' dataset and performs a simple binary classification using a Support Vector Machine classifier. .. include:: ../../links.inc .. GENERATED FROM PYTHON SOURCE LINES 10-17 .. code-block:: default # Authors: Federico Raimondo # # License: AGPL from seaborn import load_dataset from julearn import run_cross_validation from julearn.utils import configure_logging .. rst-class:: sphx-glr-script-out Out: .. code-block:: none /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1582: UserWarning: Trying to register the cmap 'rocket' which already exists. mpl_cm.register_cmap(_name, _cmap) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1583: UserWarning: Trying to register the cmap 'rocket_r' which already exists. mpl_cm.register_cmap(_name + "_r", _cmap_r) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1582: UserWarning: Trying to register the cmap 'mako' which already exists. mpl_cm.register_cmap(_name, _cmap) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1583: UserWarning: Trying to register the cmap 'mako_r' which already exists. mpl_cm.register_cmap(_name + "_r", _cmap_r) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1582: UserWarning: Trying to register the cmap 'icefire' which already exists. mpl_cm.register_cmap(_name, _cmap) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1583: UserWarning: Trying to register the cmap 'icefire_r' which already exists. mpl_cm.register_cmap(_name + "_r", _cmap_r) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1582: UserWarning: Trying to register the cmap 'vlag' which already exists. mpl_cm.register_cmap(_name, _cmap) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1583: UserWarning: Trying to register the cmap 'vlag_r' which already exists. mpl_cm.register_cmap(_name + "_r", _cmap_r) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1582: UserWarning: Trying to register the cmap 'flare' which already exists. mpl_cm.register_cmap(_name, _cmap) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1583: UserWarning: Trying to register the cmap 'flare_r' which already exists. mpl_cm.register_cmap(_name + "_r", _cmap_r) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1582: UserWarning: Trying to register the cmap 'crest' which already exists. mpl_cm.register_cmap(_name, _cmap) /opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/cm.py:1583: UserWarning: Trying to register the cmap 'crest_r' which already exists. mpl_cm.register_cmap(_name + "_r", _cmap_r) .. GENERATED FROM PYTHON SOURCE LINES 18-19 Set the logging level to info to see extra information .. GENERATED FROM PYTHON SOURCE LINES 19-21 .. code-block:: default configure_logging(level='INFO') .. rst-class:: sphx-glr-script-out Out: .. code-block:: none 2022-07-21 09:54:38,941 - julearn - INFO - ===== Lib Versions ===== 2022-07-21 09:54:38,941 - julearn - INFO - numpy: 1.23.1 2022-07-21 09:54:38,941 - julearn - INFO - scipy: 1.8.1 2022-07-21 09:54:38,941 - julearn - INFO - sklearn: 1.0.2 2022-07-21 09:54:38,941 - julearn - INFO - pandas: 1.4.3 2022-07-21 09:54:38,941 - julearn - INFO - julearn: 0.2.5 2022-07-21 09:54:38,941 - julearn - INFO - ======================== .. GENERATED FROM PYTHON SOURCE LINES 22-24 .. code-block:: default df_iris = load_dataset('iris') .. GENERATED FROM PYTHON SOURCE LINES 25-27 The dataset has three kind of species. We will keep two to perform a binary classification. .. GENERATED FROM PYTHON SOURCE LINES 27-29 .. code-block:: default df_iris = df_iris[df_iris['species'].isin(['versicolor', 'virginica'])] .. GENERATED FROM PYTHON SOURCE LINES 30-32 As features, we will use the sepal length, width and petal length. We will try to predict the species. .. GENERATED FROM PYTHON SOURCE LINES 32-40 .. code-block:: default X = ['sepal_length', 'sepal_width', 'petal_length'] y = 'species' scores = run_cross_validation( X=X, y=y, data=df_iris, model='svm', preprocess_X='zscore') print(scores['test_score']) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none 2022-07-21 09:54:38,945 - julearn - INFO - Using default CV 2022-07-21 09:54:38,945 - julearn - INFO - ==== Input Data ==== 2022-07-21 09:54:38,945 - julearn - INFO - Using dataframe as input 2022-07-21 09:54:38,945 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length'] 2022-07-21 09:54:38,945 - julearn - INFO - Target: species 2022-07-21 09:54:38,945 - julearn - INFO - Expanded X: ['sepal_length', 'sepal_width', 'petal_length'] 2022-07-21 09:54:38,945 - julearn - INFO - Expanded Confounds: [] 2022-07-21 09:54:38,946 - julearn - INFO - ==================== 2022-07-21 09:54:38,946 - julearn - INFO - 2022-07-21 09:54:38,946 - julearn - INFO - ====== Model ====== 2022-07-21 09:54:38,946 - julearn - INFO - Obtaining model by name: svm 2022-07-21 09:54:38,946 - julearn - INFO - =================== 2022-07-21 09:54:38,946 - julearn - INFO - 2022-07-21 09:54:38,946 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds 0 0.90 1 0.95 2 0.90 3 0.80 4 1.00 5 1.00 6 0.95 7 0.90 8 0.90 9 0.80 10 0.90 11 1.00 12 0.95 13 0.80 14 1.00 15 0.90 16 0.95 17 0.95 18 0.95 19 0.90 20 0.95 21 0.95 22 0.80 23 0.95 24 0.95 Name: test_score, dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 41-45 Additionally, we can choose to assess the performance of the model using different scoring functions. For example, we might have an unbalanced dataset: .. GENERATED FROM PYTHON SOURCE LINES 45-49 .. code-block:: default df_unbalanced = df_iris[20:] # drop the first 20 versicolor samples print(df_unbalanced['species'].value_counts()) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none virginica 50 versicolor 30 Name: species, dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 50-55 If we compute the `accuracy`, we might not account for this imbalance. A more suitable metric is the `balanced_accuracy`. More information in scikit-learn: `Balanced Accuracy`_ We will also set the random seed so we always split the data in the same way. .. GENERATED FROM PYTHON SOURCE LINES 55-63 .. code-block:: default scores = run_cross_validation( X=X, y=y, data=df_unbalanced, model='svm', seed=42, preprocess_X='zscore', scoring=['accuracy', 'balanced_accuracy']) print(scores['test_accuracy'].mean()) print(scores['test_balanced_accuracy'].mean()) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none 2022-07-21 09:54:39,312 - julearn - INFO - Setting random seed to 42 2022-07-21 09:54:39,312 - julearn - INFO - Using default CV 2022-07-21 09:54:39,312 - julearn - INFO - ==== Input Data ==== 2022-07-21 09:54:39,312 - julearn - INFO - Using dataframe as input 2022-07-21 09:54:39,312 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length'] 2022-07-21 09:54:39,312 - julearn - INFO - Target: species 2022-07-21 09:54:39,312 - julearn - INFO - Expanded X: ['sepal_length', 'sepal_width', 'petal_length'] 2022-07-21 09:54:39,312 - julearn - INFO - Expanded Confounds: [] 2022-07-21 09:54:39,313 - julearn - INFO - ==================== 2022-07-21 09:54:39,313 - julearn - INFO - 2022-07-21 09:54:39,313 - julearn - INFO - ====== Model ====== 2022-07-21 09:54:39,313 - julearn - INFO - Obtaining model by name: svm 2022-07-21 09:54:39,313 - julearn - INFO - =================== 2022-07-21 09:54:39,313 - julearn - INFO - 2022-07-21 09:54:39,313 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds 0.895 0.8708886668886668 .. GENERATED FROM PYTHON SOURCE LINES 64-74 Other kind of metrics allows us to evaluate how good our model is to detect specific targets. Suppose we want to create a model that correctly identifies the `versicolor` samples. Now we might want to evaluate the precision score, or the ratio of true positives (tp) over all positives (true and false positives). More information in scikit-learn: `Precision`_ For this metric to work, we need to define which are our `positive` values. In this example, we are interested in detecting `versicolor`. .. GENERATED FROM PYTHON SOURCE LINES 74-78 .. code-block:: default precision_scores = run_cross_validation( X=X, y=y, data=df_unbalanced, model='svm', preprocess_X='zscore', seed=42, scoring='precision', pos_labels='versicolor') print(precision_scores['test_score'].mean()) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none 2022-07-21 09:54:39,815 - julearn - INFO - Setting random seed to 42 2022-07-21 09:54:39,815 - julearn - INFO - Using default CV 2022-07-21 09:54:39,815 - julearn - INFO - ==== Input Data ==== 2022-07-21 09:54:39,816 - julearn - INFO - Using dataframe as input 2022-07-21 09:54:39,816 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length'] 2022-07-21 09:54:39,816 - julearn - INFO - Target: species 2022-07-21 09:54:39,816 - julearn - INFO - Expanded X: ['sepal_length', 'sepal_width', 'petal_length'] 2022-07-21 09:54:39,816 - julearn - INFO - Expanded Confounds: [] 2022-07-21 09:54:39,816 - julearn - INFO - Setting the following as positive labels ['versicolor'] 2022-07-21 09:54:39,817 - julearn - INFO - ==================== 2022-07-21 09:54:39,817 - julearn - INFO - 2022-07-21 09:54:39,817 - julearn - INFO - ====== Model ====== 2022-07-21 09:54:39,817 - julearn - INFO - Obtaining model by name: svm 2022-07-21 09:54:39,817 - julearn - INFO - =================== 2022-07-21 09:54:39,817 - julearn - INFO - 2022-07-21 09:54:39,817 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds 0.9223333333333333 .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 1.266 seconds) .. _sphx_glr_download_auto_examples_basic_run_simple_binary_classification.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: run_simple_binary_classification.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: run_simple_binary_classification.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_