.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/basic/run_simple_binary_classification.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_basic_run_simple_binary_classification.py: Simple Binary Classification ============================ This example uses the 'iris' dataset and performs a simple binary classification using a Support Vector Machine classifier. .. include:: ../../links.inc .. GENERATED FROM PYTHON SOURCE LINES 10-17 .. code-block:: Python # Authors: Federico Raimondo # # License: AGPL from seaborn import load_dataset from julearn import run_cross_validation from julearn.utils import configure_logging .. GENERATED FROM PYTHON SOURCE LINES 18-19 Set the logging level to info to see extra information .. GENERATED FROM PYTHON SOURCE LINES 19-21 .. code-block:: Python configure_logging(level='INFO') .. rst-class:: sphx-glr-script-out .. code-block:: none 2026-05-29 20:44:33,707 - julearn - INFO - ===== Lib Versions ===== 2026-05-29 20:44:33,707 - julearn - INFO - numpy: 1.19.3 2026-05-29 20:44:33,707 - julearn - INFO - scipy: 1.9.3 2026-05-29 20:44:33,707 - julearn - INFO - sklearn: 0.23.2 2026-05-29 20:44:33,707 - julearn - INFO - pandas: 1.4.4 2026-05-29 20:44:33,707 - julearn - INFO - julearn: 0.2.2.dev 2026-05-29 20:44:33,707 - julearn - INFO - ======================== .. GENERATED FROM PYTHON SOURCE LINES 22-24 .. code-block:: Python df_iris = load_dataset('iris') .. GENERATED FROM PYTHON SOURCE LINES 25-27 The dataset has three kind of species. We will keep two to perform a binary classification. .. GENERATED FROM PYTHON SOURCE LINES 27-29 .. code-block:: Python df_iris = df_iris[df_iris['species'].isin(['versicolor', 'virginica'])] .. GENERATED FROM PYTHON SOURCE LINES 30-32 As features, we will use the sepal length, width and petal length. We will try to predict the species. .. GENERATED FROM PYTHON SOURCE LINES 32-40 .. code-block:: Python X = ['sepal_length', 'sepal_width', 'petal_length'] y = 'species' scores = run_cross_validation( X=X, y=y, data=df_iris, model='svm', preprocess_X='zscore') print(scores['test_score']) .. rst-class:: sphx-glr-script-out .. code-block:: none 2026-05-29 20:44:33,711 - julearn - INFO - Using default CV 2026-05-29 20:44:33,711 - julearn - INFO - ==== Input Data ==== 2026-05-29 20:44:33,711 - julearn - INFO - Using dataframe as input 2026-05-29 20:44:33,711 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length'] 2026-05-29 20:44:33,711 - julearn - INFO - Target: species 2026-05-29 20:44:33,712 - julearn - INFO - Expanded X: ['sepal_length', 'sepal_width', 'petal_length'] 2026-05-29 20:44:33,712 - julearn - INFO - Expanded Confounds: [] 2026-05-29 20:44:33,712 - julearn - INFO - ==================== 2026-05-29 20:44:33,713 - julearn - INFO - 2026-05-29 20:44:33,713 - julearn - INFO - ====== Model ====== 2026-05-29 20:44:33,713 - julearn - INFO - Obtaining model by name: svm 2026-05-29 20:44:33,713 - julearn - INFO - =================== 2026-05-29 20:44:33,713 - julearn - INFO - 2026-05-29 20:44:33,713 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds 0 0.90 1 1.00 2 0.85 3 0.90 4 0.95 5 0.95 6 0.85 7 0.90 8 1.00 9 0.95 10 0.95 11 0.90 12 1.00 13 0.85 14 0.90 15 0.95 16 0.95 17 1.00 18 0.85 19 0.85 20 0.85 21 0.95 22 0.90 23 0.95 24 0.95 Name: test_score, dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 41-45 Additionally, we can choose to assess the performance of the model using different scoring functions. For example, we might have an unbalanced dataset: .. GENERATED FROM PYTHON SOURCE LINES 45-49 .. code-block:: Python df_unbalanced = df_iris[20:] # drop the first 20 versicolor samples print(df_unbalanced['species'].value_counts()) .. rst-class:: sphx-glr-script-out .. code-block:: none virginica 50 versicolor 30 Name: species, dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 50-55 If we compute the `accuracy`, we might not account for this imbalance. A more suitable metric is the `balanced_accuracy`. More information in scikit-learn: `Balanced Accuracy`_ We will also set the random seed so we always split the data in the same way. .. GENERATED FROM PYTHON SOURCE LINES 55-63 .. code-block:: Python scores = run_cross_validation( X=X, y=y, data=df_unbalanced, model='svm', seed=42, preprocess_X='zscore', scoring=['accuracy', 'balanced_accuracy']) print(scores['test_accuracy'].mean()) print(scores['test_balanced_accuracy'].mean()) .. rst-class:: sphx-glr-script-out .. code-block:: none 2026-05-29 20:44:34,030 - julearn - INFO - Setting random seed to 42 2026-05-29 20:44:34,031 - julearn - INFO - Using default CV 2026-05-29 20:44:34,031 - julearn - INFO - ==== Input Data ==== 2026-05-29 20:44:34,031 - julearn - INFO - Using dataframe as input 2026-05-29 20:44:34,031 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length'] 2026-05-29 20:44:34,031 - julearn - INFO - Target: species 2026-05-29 20:44:34,031 - julearn - INFO - Expanded X: ['sepal_length', 'sepal_width', 'petal_length'] 2026-05-29 20:44:34,031 - julearn - INFO - Expanded Confounds: [] 2026-05-29 20:44:34,032 - julearn - INFO - ==================== 2026-05-29 20:44:34,032 - julearn - INFO - 2026-05-29 20:44:34,032 - julearn - INFO - ====== Model ====== 2026-05-29 20:44:34,032 - julearn - INFO - Obtaining model by name: svm 2026-05-29 20:44:34,032 - julearn - INFO - =================== 2026-05-29 20:44:34,032 - julearn - INFO - 2026-05-29 20:44:34,032 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds 0.895 0.8708886668886668 .. GENERATED FROM PYTHON SOURCE LINES 64-74 Other kind of metrics allows us to evaluate how good our model is to detect specific targets. Suppose we want to create a model that correctly identifies the `versicolor` samples. Now we might want to evaluate the precision score, or the ratio of true positives (tp) over all positives (true and false positives). More information in scikit-learn: `Precision`_ For this metric to work, we need to define which are our `positive` values. In this example, we are interested in detecting `versicolor`. .. GENERATED FROM PYTHON SOURCE LINES 74-78 .. code-block:: Python precision_scores = run_cross_validation( X=X, y=y, data=df_unbalanced, model='svm', preprocess_X='zscore', seed=42, scoring='precision', pos_labels='versicolor') print(precision_scores['test_precision'].mean()) .. rst-class:: sphx-glr-script-out .. code-block:: none 2026-05-29 20:44:34,460 - julearn - INFO - Setting random seed to 42 2026-05-29 20:44:34,461 - julearn - INFO - Using default CV 2026-05-29 20:44:34,461 - julearn - INFO - ==== Input Data ==== 2026-05-29 20:44:34,461 - julearn - INFO - Using dataframe as input 2026-05-29 20:44:34,461 - julearn - INFO - Features: ['sepal_length', 'sepal_width', 'petal_length'] 2026-05-29 20:44:34,461 - julearn - INFO - Target: species 2026-05-29 20:44:34,461 - julearn - INFO - Expanded X: ['sepal_length', 'sepal_width', 'petal_length'] 2026-05-29 20:44:34,461 - julearn - INFO - Expanded Confounds: [] 2026-05-29 20:44:34,462 - julearn - INFO - Setting the following as positive labels ['versicolor'] 2026-05-29 20:44:34,462 - julearn - INFO - ==================== 2026-05-29 20:44:34,462 - julearn - INFO - 2026-05-29 20:44:34,462 - julearn - INFO - ====== Model ====== 2026-05-29 20:44:34,462 - julearn - INFO - Obtaining model by name: svm 2026-05-29 20:44:34,462 - julearn - INFO - =================== 2026-05-29 20:44:34,462 - julearn - INFO - 2026-05-29 20:44:34,462 - julearn - INFO - CV interpreted as RepeatedKFold with 5 repetitions of 5 folds 0.9223333333333333 .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 8.945 seconds) .. _sphx_glr_download_auto_examples_basic_run_simple_binary_classification.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: run_simple_binary_classification.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: run_simple_binary_classification.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: run_simple_binary_classification.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_