Stacking Classification

This example uses the iris dataset to build a stacked classifier. Two base classifiers are trained, one on the sepal features and one on the petal features. A final logistic regression classifier is then trained on the predictions of these two classifiers.
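For orientation, the architecture described above can be sketched in plain scikit-learn. This is only an analogy, not julearn's implementation. It assumes that julearn's "zscore", "svm" and "rf" steps correspond to `StandardScaler`, `SVC` and `RandomForestClassifier`, it loads the copy of iris bundled with scikit-learn rather than the seaborn one, and it uses a shuffled stratified 5-fold split chosen here for illustration. The helper `branch` and the `SEPAL_COLS` / `PETAL_COLS` constants are hypothetical names.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled iris data; columns are sepal length, sepal width,
# petal length, petal width (in that order).
iris = load_iris()
mask = iris.target != 0  # drop setosa, keep versicolor (1) and virginica (2)
X_arr, y_arr = iris.data[mask], iris.target[mask]

SEPAL_COLS = [0, 1]  # hypothetical names for the column groups
PETAL_COLS = [2, 3]


def branch(columns, estimator):
    """Keep only `columns`, z-score them, then fit `estimator`."""
    select = ColumnTransformer([("keep", "passthrough", columns)])
    return make_pipeline(select, StandardScaler(), estimator)


# Each base model sees one feature group; the final logistic regression
# is trained on the cross-validated predictions of the base models.
stack = StackingClassifier(
    estimators=[
        ("model_sepal", branch(SEPAL_COLS, SVC())),
        ("model_petal", branch(PETAL_COLS, RandomForestClassifier(random_state=0))),
    ],
    final_estimator=LogisticRegression(),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
sk_scores = cross_val_score(stack, X_arr, y_arr, cv=cv)
print(sk_scores)
```

Because the data source, the split and the random seeds differ, the numbers printed by this sketch are not expected to match the julearn scores reported in this example.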

# Authors: Federico Raimondo <f.raimondo@fz-juelich.de>
# License: AGPL

from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator
from julearn.utils import configure_logging

Set the logging level to info to see extra information.

configure_logging(level="INFO")
/home/runner/work/julearn/julearn/julearn/utils/logging.py:66: UserWarning: The '__version__' attribute is deprecated and will be removed in MarkupSafe 3.1. Use feature detection, or `importlib.metadata.version("markupsafe")`, instead.
  vstring = str(getattr(module, "__version__", None))
2024-10-17 14:15:45,833 - julearn - INFO - ===== Lib Versions =====
2024-10-17 14:15:45,834 - julearn - INFO - numpy: 1.26.4
2024-10-17 14:15:45,834 - julearn - INFO - scipy: 1.14.1
2024-10-17 14:15:45,834 - julearn - INFO - sklearn: 1.5.2
2024-10-17 14:15:45,834 - julearn - INFO - pandas: 2.2.3
2024-10-17 14:15:45,834 - julearn - INFO - julearn: 0.3.4
2024-10-17 14:15:45,834 - julearn - INFO - ========================
df_iris = load_dataset("iris")

The dataset contains three species. We will keep two of them to perform a binary classification.

df_iris = df_iris[df_iris["species"].isin(["versicolor", "virginica"])]

As features, we will use the sepal length and width and the petal length and width. We will try to predict the species.

X = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = "species"

# Define our feature types
X_types = {
    "sepal": ["sepal_length", "sepal_width"],
    "petal": ["petal_length", "petal_width"],
}

# Create the pipeline for the sepal features; steps apply to "sepal" by default
model_sepal = PipelineCreator(problem_type="classification", apply_to="sepal")
model_sepal.add("filter_columns", apply_to="*", keep="sepal")
model_sepal.add("zscore")
model_sepal.add("svm")

# Create the pipeline for the petal features; steps apply to "petal" by default
model_petal = PipelineCreator(problem_type="classification", apply_to="petal")
model_petal.add("filter_columns", apply_to="*", keep="petal")
model_petal.add("zscore")
model_petal.add("rf")

# Create the stacking model
model = PipelineCreator(problem_type="classification")
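model.add(
    "stacking",
    # The estimator list is wrapped in an outer list so that it is set as a
    # single value (see "Setting hyperparameter estimators" in the log).
    estimators=[[("model_sepal", model_sepal), ("model_petal", model_petal)]],
    apply_to="*",
)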

scores = run_cross_validation(
    X=X, y=y, X_types=X_types, data=df_iris, model=model
)

print(scores["test_score"])
2024-10-17 14:15:45,837 - julearn - INFO - Adding step filter_columns that applies to ColumnTypes<types={'*'}; pattern=.*>
2024-10-17 14:15:45,837 - julearn - INFO - Setting hyperparameter keep = sepal
2024-10-17 14:15:45,837 - julearn - INFO - Step added
2024-10-17 14:15:45,837 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>
2024-10-17 14:15:45,837 - julearn - INFO - Step added
2024-10-17 14:15:45,837 - julearn - INFO - Adding step svm that applies to ColumnTypes<types={'sepal'}; pattern=(?:__:type:__sepal)>
2024-10-17 14:15:45,838 - julearn - INFO - Step added
2024-10-17 14:15:45,838 - julearn - INFO - Adding step filter_columns that applies to ColumnTypes<types={'*'}; pattern=.*>
2024-10-17 14:15:45,838 - julearn - INFO - Setting hyperparameter keep = petal
2024-10-17 14:15:45,838 - julearn - INFO - Step added
2024-10-17 14:15:45,838 - julearn - INFO - Adding step zscore that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2024-10-17 14:15:45,838 - julearn - INFO - Step added
2024-10-17 14:15:45,838 - julearn - INFO - Adding step rf that applies to ColumnTypes<types={'petal'}; pattern=(?:__:type:__petal)>
2024-10-17 14:15:45,838 - julearn - INFO - Step added
2024-10-17 14:15:45,838 - julearn - INFO - Adding step stacking that applies to ColumnTypes<types={'*'}; pattern=.*>
2024-10-17 14:15:45,838 - julearn - INFO - Setting hyperparameter estimators = [('model_sepal', <julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7f45e1b43820>), ('model_petal', <julearn.pipeline.pipeline_creator.PipelineCreator object at 0x7f45e1b43640>)]
2024-10-17 14:15:45,838 - julearn - INFO - Step added
2024-10-17 14:15:45,839 - julearn - INFO - ==== Input Data ====
2024-10-17 14:15:45,839 - julearn - INFO - Using dataframe as input
2024-10-17 14:15:45,839 - julearn - INFO -      Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2024-10-17 14:15:45,839 - julearn - INFO -      Target: species
2024-10-17 14:15:45,839 - julearn - INFO -      Expanded features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
2024-10-17 14:15:45,839 - julearn - INFO -      X_types:{'sepal': ['sepal_length', 'sepal_width'], 'petal': ['petal_length', 'petal_width']}
2024-10-17 14:15:45,840 - julearn - INFO - ====================
2024-10-17 14:15:45,840 - julearn - INFO -
2024-10-17 14:15:45,841 - julearn - INFO - = Model Parameters =
2024-10-17 14:15:45,841 - julearn - INFO - ====================
2024-10-17 14:15:45,841 - julearn - INFO -
2024-10-17 14:15:45,842 - julearn - INFO - = Model Parameters =
2024-10-17 14:15:45,842 - julearn - INFO - ====================
2024-10-17 14:15:45,842 - julearn - INFO -
2024-10-17 14:15:45,877 - julearn - INFO - = Model Parameters =
2024-10-17 14:15:45,877 - julearn - INFO - ====================
2024-10-17 14:15:45,877 - julearn - INFO -
2024-10-17 14:15:45,877 - julearn - INFO - = Data Information =
2024-10-17 14:15:45,877 - julearn - INFO -      Problem type: classification
2024-10-17 14:15:45,877 - julearn - INFO -      Number of samples: 100
2024-10-17 14:15:45,877 - julearn - INFO -      Number of features: 4
2024-10-17 14:15:45,877 - julearn - INFO - ====================
2024-10-17 14:15:45,877 - julearn - INFO -
2024-10-17 14:15:45,877 - julearn - INFO -      Number of classes: 2
2024-10-17 14:15:45,878 - julearn - INFO -      Target type: object
2024-10-17 14:15:45,878 - julearn - INFO -      Class distributions: species
versicolor    50
virginica     50
Name: count, dtype: int64
2024-10-17 14:15:45,878 - julearn - INFO - Using outer CV scheme KFold(n_splits=5, random_state=None, shuffle=False)
2024-10-17 14:15:45,879 - julearn - INFO - Binary classification problem detected.
0    1.00
1    0.85
2    0.95
3    0.95
4    0.95
Name: test_score, dtype: float64
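The per-fold scores are easier to compare across models when reduced to a mean and a spread. A minimal sketch, with the five values copied by hand from the output above rather than taken from the `scores` dataframe, so that it runs on its own:

```python
import pandas as pd

# Per-fold test scores copied from the printed output above.
test_score = pd.Series([1.00, 0.85, 0.95, 0.95, 0.95], name="test_score")

mean_score = test_score.mean()
std_score = test_score.std()  # sample standard deviation (ddof=1)
print(f"{mean_score:.3f} +/- {std_score:.3f}")  # 0.940 +/- 0.055
```

With the real dataframe, the same two calls can be applied directly to `scores["test_score"]`.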

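The log reports an outer CV scheme of `KFold(n_splits=5, random_state=None, shuffle=False)`. If the filtered dataframe keeps the usual iris row order (all versicolor rows first, then all virginica rows, which is an assumption about the loaded file), unshuffled folds are not balanced between the classes. The sketch below uses stand-in labels in that assumed order to compare the class counts in each test fold for `KFold` and `StratifiedKFold`. The helper `fold_class_counts` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Stand-in labels: 50 versicolor rows followed by 50 virginica rows
# (assumed row order of the filtered dataframe).
labels = np.array(["versicolor"] * 50 + ["virginica"] * 50)
dummy_X = np.zeros((len(labels), 1))  # splitters only need the number of rows


def fold_class_counts(splitter):
    """Return (n_versicolor, n_virginica) for each test fold."""
    counts = []
    for _, test_idx in splitter.split(dummy_X, labels):
        fold = labels[test_idx]
        n_versicolor = int((fold == "versicolor").sum())
        n_virginica = int((fold == "virginica").sum())
        counts.append((n_versicolor, n_virginica))
    return counts


plain = fold_class_counts(KFold(n_splits=5))
stratified = fold_class_counts(StratifiedKFold(n_splits=5))
print(plain)       # [(20, 0), (20, 0), (10, 10), (0, 20), (0, 20)]
print(stratified)  # [(10, 10), (10, 10), (10, 10), (10, 10), (10, 10)]
```

If balanced folds are wanted, a stratified or shuffled splitter could be passed through the `cv` argument of `run_cross_validation`; the accepted values should be checked against the API reference for the installed julearn version.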
Total running time of the script: (0 minutes 3.815 seconds)