.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/00_starting/run_simple_binary_classification.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_00_starting_run_simple_binary_classification.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_00_starting_run_simple_binary_classification.py:


Simple Binary Classification
============================

This example uses the ``iris`` dataset and performs a simple binary
classification using a Support Vector Machine classifier.

.. include:: ../../links.inc

.. GENERATED FROM PYTHON SOURCE LINES 10-18

.. code-block:: Python

    # Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
    #
    # License: AGPL

    from seaborn import load_dataset
    from julearn import run_cross_validation
    from julearn.utils import configure_logging


.. GENERATED FROM PYTHON SOURCE LINES 19-20

Set the logging level to ``INFO`` to see extra information.

.. GENERATED FROM PYTHON SOURCE LINES 20-22

.. code-block:: Python

    configure_logging(level="INFO")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:51,862 - julearn - INFO - ===== Lib Versions =====
    2026-01-16 10:53:51,862 - julearn - INFO - numpy: 1.26.4
    2026-01-16 10:53:51,862 - julearn - INFO - scipy: 1.17.0
    2026-01-16 10:53:51,862 - julearn - INFO - sklearn: 1.7.2
    2026-01-16 10:53:51,862 - julearn - INFO - pandas: 2.3.3
    2026-01-16 10:53:51,862 - julearn - INFO - julearn: 0.3.5.dev123
    2026-01-16 10:53:51,862 - julearn - INFO - ========================


.. GENERATED FROM PYTHON SOURCE LINES 23-25

.. code-block:: Python

    df_iris = load_dataset("iris")


.. GENERATED FROM PYTHON SOURCE LINES 26-28

The dataset contains three species. We will keep two of them to perform a
binary classification.

.. GENERATED FROM PYTHON SOURCE LINES 28-30

.. code-block:: Python

    df_iris = df_iris[df_iris["species"].isin(["versicolor", "virginica"])]


.. GENERATED FROM PYTHON SOURCE LINES 31-33

As features, we will use the sepal length, sepal width, and petal length.
We will try to predict the species.

.. GENERATED FROM PYTHON SOURCE LINES 33-47

.. code-block:: Python


    X = ["sepal_length", "sepal_width", "petal_length"]
    y = "species"
    scores = run_cross_validation(
        X=X,
        y=y,
        data=df_iris,
        model="svm",
        problem_type="classification",
        preprocess="zscore",
    )

    print(scores["test_score"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:51,937 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:53:51,938 - julearn - INFO - Using dataframe as input
    2026-01-16 10:53:51,938 - julearn - INFO -      Features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,938 - julearn - INFO -      Target: species
    2026-01-16 10:53:51,938 - julearn - INFO -      Expanded features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,938 - julearn - INFO -      X_types:{}
    2026-01-16 10:53:51,938 - julearn - WARNING - The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:53:51,939 - julearn - INFO - ====================
    2026-01-16 10:53:51,940 - julearn - INFO - 
    2026-01-16 10:53:51,940 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:53:51,940 - julearn - INFO - Step added
    2026-01-16 10:53:51,940 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:53:51,940 - julearn - INFO - Step added
    2026-01-16 10:53:51,941 - julearn - INFO - = Model Parameters =
    2026-01-16 10:53:51,941 - julearn - INFO - ====================
    2026-01-16 10:53:51,942 - julearn - INFO - 
    2026-01-16 10:53:51,942 - julearn - INFO - = Data Information =
    2026-01-16 10:53:51,942 - julearn - INFO -      Problem type: classification
    2026-01-16 10:53:51,942 - julearn - INFO -      Number of samples: 100
    2026-01-16 10:53:51,942 - julearn - INFO -      Number of features: 3
    2026-01-16 10:53:51,942 - julearn - INFO - ====================
    2026-01-16 10:53:51,942 - julearn - INFO - 
    2026-01-16 10:53:51,942 - julearn - INFO -      Number of classes: 2
    2026-01-16 10:53:51,942 - julearn - INFO -      Target type: object
    2026-01-16 10:53:51,943 - julearn - INFO -      Class distributions: species
    versicolor    50
    virginica     50
    Name: count, dtype: int64
    2026-01-16 10:53:51,943 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
    2026-01-16 10:53:51,944 - julearn - INFO - Binary classification problem detected.
    0    0.90
    1    0.75
    2    0.95
    3    0.70
    4    0.90
    Name: test_score, dtype: float64
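
For readers who know scikit-learn, the call above can be approximated with a
plain pipeline. This is only a minimal sketch under stated assumptions: that
the ``zscore`` step behaves like ``StandardScaler``, that the ``svm`` step
behaves like ``SVC`` with default parameters, and that scikit-learn's bundled
copy of iris (loaded here instead of the seaborn one) is close enough for
illustration. The scores it prints are not guaranteed to match the output
above.

.. code-block:: Python

    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Assumption: use scikit-learn's bundled iris instead of seaborn's copy.
    df = load_iris(as_frame=True).frame

    # Keep versicolor (target 1) and virginica (target 2).
    df = df[df["target"].isin([1, 2])]

    # Sepal length, sepal width, and petal length.
    features = ["sepal length (cm)", "sepal width (cm)", "petal length (cm)"]

    # Standardize the features, then fit a support vector classifier.
    pipeline = make_pipeline(StandardScaler(), SVC())

    # Five unshuffled folds, as reported by the CV scheme in the log.
    cv = KFold(n_splits=5)
    sk_scores = cross_val_score(pipeline, df[features], df["target"], cv=cv)
    print(sk_scores)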


.. GENERATED FROM PYTHON SOURCE LINES 48-52

Additionally, we can choose to assess the performance of the model using
different scoring functions.

For example, we might have an unbalanced dataset:

.. GENERATED FROM PYTHON SOURCE LINES 52-56

.. code-block:: Python


    df_unbalanced = df_iris[20:]  # drop the first 20 versicolor samples
    print(df_unbalanced["species"].value_counts())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    species
    virginica     50
    versicolor    30
    Name: count, dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 57-62

Plain ``accuracy`` does not account for this imbalance. A more suitable
metric is ``balanced_accuracy``. For more information, see the
``scikit-learn`` documentation: :func:`~sklearn.metrics.balanced_accuracy_score`.
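
To see why the two metrics differ, consider a hypothetical classifier that
always predicts the majority class. The labels below are made up for
illustration and only mirror the 50/30 class counts shown above; they are not
predictions from the model in this example.

.. code-block:: Python

    from sklearn.metrics import accuracy_score, balanced_accuracy_score

    # Made-up labels with the same class counts as the unbalanced dataset.
    y_true = ["virginica"] * 50 + ["versicolor"] * 30

    # A trivial classifier that always predicts the majority class.
    y_pred = ["virginica"] * 80

    # Accuracy: 50 of 80 predictions are correct.
    acc = accuracy_score(y_true, y_pred)

    # Balanced accuracy: mean of per-class recall, (1.0 + 0.0) / 2.
    bal_acc = balanced_accuracy_score(y_true, y_pred)

    print(acc)      # 0.625
    print(bal_acc)  # 0.5

Accuracy rewards the trivial classifier for the class imbalance, while
balanced accuracy stays at the chance level of 0.5.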

We will also set the random seed so we always split the data in the same way.

.. GENERATED FROM PYTHON SOURCE LINES 62-76

.. code-block:: Python

    scores = run_cross_validation(
        X=X,
        y=y,
        data=df_unbalanced,
        model="svm",
        seed=42,
        preprocess="zscore",
        problem_type="classification",
        scoring=["accuracy", "balanced_accuracy"],
    )

    print(scores["test_accuracy"].mean())
    print(scores["test_balanced_accuracy"].mean())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:51,996 - julearn - INFO - Setting random seed to 42
    2026-01-16 10:53:51,996 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:53:51,996 - julearn - INFO - Using dataframe as input
    2026-01-16 10:53:51,996 - julearn - INFO -      Features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,996 - julearn - INFO -      Target: species
    2026-01-16 10:53:51,996 - julearn - INFO -      Expanded features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,996 - julearn - INFO -      X_types:{}
    2026-01-16 10:53:51,996 - julearn - WARNING - The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:53:51,997 - julearn - INFO - ====================
    2026-01-16 10:53:51,997 - julearn - INFO - 
    2026-01-16 10:53:51,997 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:53:51,998 - julearn - INFO - Step added
    2026-01-16 10:53:51,998 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:53:51,998 - julearn - INFO - Step added
    2026-01-16 10:53:51,998 - julearn - INFO - = Model Parameters =
    2026-01-16 10:53:51,998 - julearn - INFO - ====================
    2026-01-16 10:53:51,999 - julearn - INFO - 
    2026-01-16 10:53:51,999 - julearn - INFO - = Data Information =
    2026-01-16 10:53:51,999 - julearn - INFO -      Problem type: classification
    2026-01-16 10:53:51,999 - julearn - INFO -      Number of samples: 80
    2026-01-16 10:53:51,999 - julearn - INFO -      Number of features: 3
    2026-01-16 10:53:51,999 - julearn - INFO - ====================
    2026-01-16 10:53:51,999 - julearn - INFO - 
    2026-01-16 10:53:51,999 - julearn - INFO -      Number of classes: 2
    2026-01-16 10:53:51,999 - julearn - INFO -      Target type: object
    2026-01-16 10:53:52,000 - julearn - INFO -      Class distributions: species
    virginica     50
    versicolor    30
    Name: count, dtype: int64
    2026-01-16 10:53:52,000 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
    2026-01-16 10:53:52,000 - julearn - INFO - Binary classification problem detected.
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    0.8625
    0.8678571428571429
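
The ``y_pred contains classes not in y_true`` warnings above are consistent
with the unshuffled ``KFold`` scheme reported in the log: the rows are sorted
by species, so a test fold can contain a single class. The following sketch
counts the classes in each test fold. It uses a stand-in label array with the
same ordering and class counts as ``df_unbalanced`` (30 ``versicolor``
followed by 50 ``virginica``) rather than the dataframe itself.

.. code-block:: Python

    import numpy as np
    from sklearn.model_selection import KFold

    # Stand-in labels: same order and class counts as the unbalanced data.
    labels = np.array(["versicolor"] * 30 + ["virginica"] * 50)

    # Same scheme as in the log: five folds, no shuffling.
    cv = KFold(n_splits=5)

    fold_counts = []
    for fold, (_, test_idx) in enumerate(cv.split(labels)):
        classes, counts = np.unique(labels[test_idx], return_counts=True)
        fold_counts.append(dict(zip(classes.tolist(), counts.tolist())))
        print(fold, fold_counts[-1])

Four of the five test folds contain only one species. A shuffled or
stratified splitter would avoid such single-class folds.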


.. GENERATED FROM PYTHON SOURCE LINES 77-87

Other kinds of metrics allow us to evaluate how well our model detects
specific targets. Suppose we want to create a model that correctly identifies
the ``versicolor`` samples.

In that case, we might want to evaluate the precision score, that is, the
ratio of true positives (tp) to all predicted positives (true positives plus
false positives). For more information, see the ``scikit-learn``
documentation: :func:`~sklearn.metrics.precision_score`.

For this metric to work, we need to define which labels count as *positive*.
In this example, we are interested in detecting ``versicolor``.
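
The effect of choosing the positive label can be illustrated directly with
scikit-learn. The labels below are made up for illustration and are not
predictions from the model in this example.

.. code-block:: Python

    from sklearn.metrics import precision_score

    # Made-up true labels and predictions.
    y_true = ["versicolor", "versicolor", "virginica", "virginica", "virginica"]
    y_pred = ["versicolor", "virginica", "versicolor", "virginica", "versicolor"]

    # versicolor as positive: 1 true positive, 2 false positives.
    p_versicolor = precision_score(y_true, y_pred, pos_label="versicolor")

    # virginica as positive: 1 true positive, 1 false positive.
    p_virginica = precision_score(y_true, y_pred, pos_label="virginica")

    print(p_versicolor)  # 1 / 3
    print(p_virginica)   # 1 / 2

The same predictions yield a different precision depending on which label is
treated as positive, which is why the label must be stated explicitly.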

.. GENERATED FROM PYTHON SOURCE LINES 87-99

.. code-block:: Python

    precision_scores = run_cross_validation(
        X=X,
        y=y,
        data=df_unbalanced,
        model="svm",
        preprocess="zscore",
        problem_type="classification",
        seed=42,
        scoring="precision",
        pos_labels="versicolor",
    )
    print(precision_scores["test_score"].mean())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:52,051 - julearn - INFO - Setting random seed to 42
    2026-01-16 10:53:52,051 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:53:52,052 - julearn - INFO - Using dataframe as input
    2026-01-16 10:53:52,052 - julearn - INFO -      Features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:52,052 - julearn - INFO -      Target: species
    2026-01-16 10:53:52,052 - julearn - INFO -      Expanded features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:52,052 - julearn - INFO -      X_types:{}
    2026-01-16 10:53:52,052 - julearn - WARNING - The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:53:52,053 - julearn - INFO - Setting the following as positive labels ['versicolor']
    2026-01-16 10:53:52,053 - julearn - INFO - ====================
    2026-01-16 10:53:52,053 - julearn - INFO - 
    2026-01-16 10:53:52,053 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:53:52,054 - julearn - INFO - Step added
    2026-01-16 10:53:52,054 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'continuous'}; pattern=(?:__:type:__continuous)>
    2026-01-16 10:53:52,054 - julearn - INFO - Step added
    2026-01-16 10:53:52,054 - julearn - INFO - = Model Parameters =
    2026-01-16 10:53:52,054 - julearn - INFO - ====================
    2026-01-16 10:53:52,055 - julearn - INFO - 
    2026-01-16 10:53:52,055 - julearn - INFO - = Data Information =
    2026-01-16 10:53:52,055 - julearn - INFO -      Problem type: classification
    2026-01-16 10:53:52,055 - julearn - INFO -      Number of samples: 80
    2026-01-16 10:53:52,055 - julearn - INFO -      Number of features: 3
    2026-01-16 10:53:52,055 - julearn - INFO - ====================
    2026-01-16 10:53:52,055 - julearn - INFO - 
    2026-01-16 10:53:52,055 - julearn - INFO -      Number of classes: 2
    2026-01-16 10:53:52,055 - julearn - INFO -      Target type: int64
    2026-01-16 10:53:52,056 - julearn - INFO -      Class distributions: species
    0    50
    1    30
    Name: count, dtype: int64
    2026-01-16 10:53:52,056 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
    2026-01-16 10:53:52,056 - julearn - INFO - Binary classification problem detected.
    0.4


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.249 seconds)


.. _sphx_glr_download_auto_examples_00_starting_run_simple_binary_classification.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: run_simple_binary_classification.ipynb <run_simple_binary_classification.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: run_simple_binary_classification.py <run_simple_binary_classification.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: run_simple_binary_classification.zip <run_simple_binary_classification.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_