.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/00_starting/run_simple_binary_classification.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_00_starting_run_simple_binary_classification.py:


Simple Binary Classification
============================

This example uses the ``iris`` dataset and performs a simple binary
classification using a Support Vector Machine classifier.

.. include:: ../../links.inc

.. GENERATED FROM PYTHON SOURCE LINES 10-18

.. code-block:: Python

    # Authors: Federico Raimondo
    #
    # License: AGPL

    from seaborn import load_dataset

    from julearn import run_cross_validation
    from julearn.utils import configure_logging


.. GENERATED FROM PYTHON SOURCE LINES 19-20

Set the logging level to ``INFO`` to see extra information.

.. GENERATED FROM PYTHON SOURCE LINES 20-22

.. code-block:: Python

    configure_logging(level="INFO")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:51,862 - julearn - INFO - ===== Lib Versions =====
    2026-01-16 10:53:51,862 - julearn - INFO - numpy: 1.26.4
    2026-01-16 10:53:51,862 - julearn - INFO - scipy: 1.17.0
    2026-01-16 10:53:51,862 - julearn - INFO - sklearn: 1.7.2
    2026-01-16 10:53:51,862 - julearn - INFO - pandas: 2.3.3
    2026-01-16 10:53:51,862 - julearn - INFO - julearn: 0.3.5.dev123
    2026-01-16 10:53:51,862 - julearn - INFO - ========================


.. GENERATED FROM PYTHON SOURCE LINES 23-25

.. code-block:: Python

    df_iris = load_dataset("iris")

.. GENERATED FROM PYTHON SOURCE LINES 26-28

The dataset contains three species. We will keep two of them to perform a
binary classification.

.. GENERATED FROM PYTHON SOURCE LINES 28-30

.. code-block:: Python

    df_iris = df_iris[df_iris["species"].isin(["versicolor", "virginica"])]

.. GENERATED FROM PYTHON SOURCE LINES 31-33

As features, we will use the sepal length, sepal width, and petal length.
We will try to predict the species.

.. GENERATED FROM PYTHON SOURCE LINES 33-47

.. code-block:: Python

    X = ["sepal_length", "sepal_width", "petal_length"]
    y = "species"
    scores = run_cross_validation(
        X=X,
        y=y,
        data=df_iris,
        model="svm",
        problem_type="classification",
        preprocess="zscore",
    )

    print(scores["test_score"])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:51,937 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:53:51,938 - julearn - INFO - Using dataframe as input
    2026-01-16 10:53:51,938 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,938 - julearn - INFO - Target: species
    2026-01-16 10:53:51,938 - julearn - INFO - Expanded features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,938 - julearn - INFO - X_types:{}
    2026-01-16 10:53:51,938 - julearn - WARNING - The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:53:51,939 - julearn - INFO - ====================
    2026-01-16 10:53:51,940 - julearn - INFO -
    2026-01-16 10:53:51,940 - julearn - INFO - Adding step zscore that applies to ColumnTypes
    2026-01-16 10:53:51,940 - julearn - INFO - Step added
    2026-01-16 10:53:51,940 - julearn - INFO - Adding step svm that applies to ColumnTypes
    2026-01-16 10:53:51,940 - julearn - INFO - Step added
    2026-01-16 10:53:51,941 - julearn - INFO - = Model Parameters =
    2026-01-16 10:53:51,941 - julearn - INFO - ====================
    2026-01-16 10:53:51,942 - julearn - INFO -
    2026-01-16 10:53:51,942 - julearn - INFO - = Data Information =
    2026-01-16 10:53:51,942 - julearn - INFO - Problem type: classification
    2026-01-16 10:53:51,942 - julearn - INFO - Number of samples: 100
    2026-01-16 10:53:51,942 - julearn - INFO - Number of features: 3
    2026-01-16 10:53:51,942 - julearn - INFO - ====================
    2026-01-16 10:53:51,942 - julearn - INFO -
    2026-01-16 10:53:51,942 - julearn - INFO - Number of classes: 2
    2026-01-16 10:53:51,942 - julearn - INFO - Target type: object
    2026-01-16 10:53:51,943 - julearn - INFO - Class distributions: species
    versicolor    50
    virginica     50
    Name: count, dtype: int64
    2026-01-16 10:53:51,943 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
    2026-01-16 10:53:51,944 - julearn - INFO - Binary classification problem detected.
    0    0.90
    1    0.75
    2    0.95
    3    0.70
    4    0.90
    Name: test_score, dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 48-52

Additionally, we can choose to assess the performance of the model using
different scoring functions.

For example, we might have an unbalanced dataset:

.. GENERATED FROM PYTHON SOURCE LINES 52-56

.. code-block:: Python

    df_unbalanced = df_iris[20:]  # drop the first 20 versicolor samples
    print(df_unbalanced["species"].value_counts())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    species
    virginica     50
    versicolor    30
    Name: count, dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 57-62

If we compute the `accuracy`, we might not account for this imbalance. A more
suitable metric is the `balanced_accuracy`, the average of the recall obtained
on each class. See :func:`~sklearn.metrics.balanced_accuracy_score` in
``scikit-learn`` for more information.

We will also set the random seed so we always split the data in the same way.

.. GENERATED FROM PYTHON SOURCE LINES 62-76

.. code-block:: Python

    scores = run_cross_validation(
        X=X,
        y=y,
        data=df_unbalanced,
        model="svm",
        seed=42,
        preprocess="zscore",
        problem_type="classification",
        scoring=["accuracy", "balanced_accuracy"],
    )

    print(scores["test_accuracy"].mean())
    print(scores["test_balanced_accuracy"].mean())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:51,996 - julearn - INFO - Setting random seed to 42
    2026-01-16 10:53:51,996 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:53:51,996 - julearn - INFO - Using dataframe as input
    2026-01-16 10:53:51,996 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,996 - julearn - INFO - Target: species
    2026-01-16 10:53:51,996 - julearn - INFO - Expanded features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:51,996 - julearn - INFO - X_types:{}
    2026-01-16 10:53:51,996 - julearn - WARNING - The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:53:51,997 - julearn - INFO - ====================
    2026-01-16 10:53:51,997 - julearn - INFO -
    2026-01-16 10:53:51,997 - julearn - INFO - Adding step zscore that applies to ColumnTypes
    2026-01-16 10:53:51,998 - julearn - INFO - Step added
    2026-01-16 10:53:51,998 - julearn - INFO - Adding step svm that applies to ColumnTypes
    2026-01-16 10:53:51,998 - julearn - INFO - Step added
    2026-01-16 10:53:51,998 - julearn - INFO - = Model Parameters =
    2026-01-16 10:53:51,998 - julearn - INFO - ====================
    2026-01-16 10:53:51,999 - julearn - INFO -
    2026-01-16 10:53:51,999 - julearn - INFO - = Data Information =
    2026-01-16 10:53:51,999 - julearn - INFO - Problem type: classification
    2026-01-16 10:53:51,999 - julearn - INFO - Number of samples: 80
    2026-01-16 10:53:51,999 - julearn - INFO - Number of features: 3
    2026-01-16 10:53:51,999 - julearn - INFO - ====================
    2026-01-16 10:53:51,999 - julearn - INFO -
    2026-01-16 10:53:51,999 - julearn - INFO - Number of classes: 2
    2026-01-16 10:53:51,999 - julearn - INFO - Target type: object
    2026-01-16 10:53:52,000 - julearn - INFO - Class distributions: species
    virginica     50
    versicolor    30
    Name: count, dtype: int64
    2026-01-16 10:53:52,000 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
    2026-01-16 10:53:52,000 - julearn - INFO - Binary classification problem detected.
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    /opt/hostedtoolcache/Python/3.14.2/x64/lib/python3.14/site-packages/sklearn/metrics/_classification.py:2801: UserWarning: y_pred contains classes not in y_true
      warnings.warn("y_pred contains classes not in y_true")
    0.8625
    0.8678571428571429


.. GENERATED FROM PYTHON SOURCE LINES 77-87

Other kinds of metrics allow us to evaluate how well our model detects
specific targets. Suppose we want to create a model that correctly identifies
the `versicolor` samples.

Now we might want to evaluate the precision score, that is, the ratio of true
positives (tp) to all predicted positives (true and false positives). See
:func:`~sklearn.metrics.precision_score` in ``scikit-learn`` for more
information.

For this metric to work, we need to define which values are our `positive`
ones. In this example, we are interested in detecting `versicolor`.

.. GENERATED FROM PYTHON SOURCE LINES 87-99

.. code-block:: Python

    precision_scores = run_cross_validation(
        X=X,
        y=y,
        data=df_unbalanced,
        model="svm",
        preprocess="zscore",
        problem_type="classification",
        seed=42,
        scoring="precision",
        pos_labels="versicolor",
    )
    print(precision_scores["test_score"].mean())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2026-01-16 10:53:52,051 - julearn - INFO - Setting random seed to 42
    2026-01-16 10:53:52,051 - julearn - INFO - ==== Input Data ====
    2026-01-16 10:53:52,052 - julearn - INFO - Using dataframe as input
    2026-01-16 10:53:52,052 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:52,052 - julearn - INFO - Target: species
    2026-01-16 10:53:52,052 - julearn - INFO - Expanded features: ['sepal_length', 'sepal_width', 'petal_length']
    2026-01-16 10:53:52,052 - julearn - INFO - X_types:{}
    2026-01-16 10:53:52,052 - julearn - WARNING - The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
    /home/runner/work/julearn/julearn/julearn/prepare.py:576: RuntimeWarning: The following columns are not defined in X_types: ['sepal_length', 'sepal_width', 'petal_length']. They will be treated as continuous.
      warn_with_log(
    2026-01-16 10:53:52,053 - julearn - INFO - Setting the following as positive labels ['versicolor']
    2026-01-16 10:53:52,053 - julearn - INFO - ====================
    2026-01-16 10:53:52,053 - julearn - INFO -
    2026-01-16 10:53:52,053 - julearn - INFO - Adding step zscore that applies to ColumnTypes
    2026-01-16 10:53:52,054 - julearn - INFO - Step added
    2026-01-16 10:53:52,054 - julearn - INFO - Adding step svm that applies to ColumnTypes
    2026-01-16 10:53:52,054 - julearn - INFO - Step added
    2026-01-16 10:53:52,054 - julearn - INFO - = Model Parameters =
    2026-01-16 10:53:52,054 - julearn - INFO - ====================
    2026-01-16 10:53:52,055 - julearn - INFO -
    2026-01-16 10:53:52,055 - julearn - INFO - = Data Information =
    2026-01-16 10:53:52,055 - julearn - INFO - Problem type: classification
    2026-01-16 10:53:52,055 - julearn - INFO - Number of samples: 80
    2026-01-16 10:53:52,055 - julearn - INFO - Number of features: 3
    2026-01-16 10:53:52,055 - julearn - INFO - ====================
    2026-01-16 10:53:52,055 - julearn - INFO -
    2026-01-16 10:53:52,055 - julearn - INFO - Number of classes: 2
    2026-01-16 10:53:52,055 - julearn - INFO - Target type: int64
    2026-01-16 10:53:52,056 - julearn - INFO - Class distributions: species
    0    50
    1    30
    Name: count, dtype: int64
    2026-01-16 10:53:52,056 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
    2026-01-16 10:53:52,056 - julearn - INFO - Binary classification problem detected.
    0.4


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.249 seconds)


.. _sphx_glr_download_auto_examples_00_starting_run_simple_binary_classification.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: run_simple_binary_classification.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: run_simple_binary_classification.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: run_simple_binary_classification.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
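
To make the difference between `accuracy`, `balanced_accuracy` and precision
more concrete, the following is a minimal sketch that computes the three
metrics by hand. It is not part of the example script and does not use
``julearn`` or ``scikit-learn``: the helper functions and the "always predict
the majority class" predictions are hypothetical and only reuse the class
counts of the unbalanced dataset above (50 `virginica`, 30 `versicolor`).
Returning ``0.0`` when the positive label is never predicted is an assumption
of this sketch.

.. code-block:: Python

    # Hypothetical illustration: predictions of a model that always answers
    # with the majority class, on the class counts of the unbalanced dataset.
    y_true = ["virginica"] * 50 + ["versicolor"] * 30
    y_pred = ["virginica"] * 80


    def accuracy(y_true, y_pred):
        """Fraction of samples whose prediction matches the true label."""
        return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


    def recall(y_true, y_pred, label):
        """Fraction of the samples of class ``label`` predicted as ``label``."""
        n_label = sum(t == label for t in y_true)
        n_hit = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        return n_hit / n_label


    def balanced_accuracy(y_true, y_pred):
        """Mean of the per-class recalls."""
        labels = sorted(set(y_true))
        return sum(recall(y_true, y_pred, lab) for lab in labels) / len(labels)


    def precision(y_true, y_pred, pos_label):
        """tp / (tp + fp); 0.0 if ``pos_label`` is never predicted (assumption)."""
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == pos_label and p == pos_label for t, p in pairs)
        fp = sum(t != pos_label and p == pos_label for t, p in pairs)
        return tp / (tp + fp) if (tp + fp) else 0.0


    print(accuracy(y_true, y_pred))  # 0.625
    print(balanced_accuracy(y_true, y_pred))  # 0.5
    print(precision(y_true, y_pred, "versicolor"))  # 0.0

In this sketch the accuracy looks acceptable even though no `versicolor`
sample is ever detected, while the balanced accuracy and the precision for
`versicolor` make the problem visible.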