Model diagnostics

This is where Ferrum's headline claim — model outputs are data — becomes specific code.

The diagnostic surface consists of three coordinated layers. You can mix and match them freely; they all return regular Chart objects (or compound views) that compose with the rest of the grammar.

| Layer | What it is | When to reach for it |
| --- | --- | --- |
| Figure-level helpers | roc_chart, calibration_chart, confusion_matrix_chart, shap_chart, etc. | One-line entry points: pass a fitted model + test data, get a Chart back. |
| ModelSource | The data interface | When you want to compute derived diagnostic data once and reuse it across multiple charts. |
| sklearn-protocol visualizers | ROCVisualizer, CalibrationVisualizer, ConfusionMatrixVisualizer, etc. | When you want lifecycle control (.fit() / .score() / .show()) or are following a yellowbrick-style pattern. |

The design rationale lives on the Model outputs are data page in Concepts; this page is the practical reference.

Figure-level helpers

The fast path: pass a fitted model and held-out data, get a Chart back. The helpers cover the standard model-evaluation surface and dispatch the underlying transforms in Rust.

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

roc = fm.roc_chart(model, X_test, y_test)
assert roc.show_svg().startswith("<svg")

ROC curve

The same pattern produces a confusion matrix:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

cm = fm.confusion_matrix_chart(model, X_test, y_test, normalize="true")
assert cm.show_svg().startswith("<svg")

Confusion matrix

Or feature importances, with the helper handling whichever importance method the estimator exposes (feature_importances_, permutation importance, coefficients):

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

importances = fm.importance_chart(model, X_test, y_test)
assert importances.show_svg().startswith("<svg")

Feature importance

The full helper menu

Every helper follows the same signature shape: helper(model_or_source, X=None, y=None, **kwargs) -> Chart. Pass a fitted model + held-out data, or pass a pre-constructed ModelSource as the first argument (next section). All helpers accept a theme= keyword.
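
A minimal sketch of both calling conventions, reusing the classifier setup from above (pr_chart stands in for any helper; theme= takes the same theme objects used later on this page):

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

# Convention 1: fitted model + held-out data, with the optional theme= keyword
chart = fm.pr_chart(model, X_test, y_test, theme=fm.themes.publication)
# Convention 2: a pre-constructed ModelSource as the first argument
chart_from_source = fm.pr_chart(fm.ModelSource(model, X_test, y_test))
assert chart.show_svg().startswith("<svg")
assert chart_from_source.show_svg().startswith("<svg")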

| Family | Helpers |
| --- | --- |
| Classification | roc_chart, pr_chart, calibration_chart, confusion_matrix_chart, class_prediction_error_chart, classification_report_chart, class_balance_chart, discrimination_threshold_chart, gain_chart, lift_chart |
| Regression | residuals_chart, prediction_error_chart, cooks_distance_chart |
| Feature explanation | importance_chart, shap_chart, shap_beeswarm_chart, shap_bar_chart, shap_waterfall_chart, pdp_chart |
| Model selection | learning_curve_chart, validation_curve_chart, cv_scores_chart, alpha_selection_chart |
| Clustering / manifold | silhouette_chart, elbow_chart, manifold_chart, pca_scree_chart, intercluster_distance_chart, parallel_coordinates_chart, decision_boundary_chart |

The full API surface is on the API Reference / ferrum page.

ModelSource: derived diagnostic data

ModelSource wraps a fitted estimator and held-out data, then exposes derived diagnostic tables (predicted probabilities, ROC curve points, calibration bins, confusion counts, residuals, SHAP values, partial dependence grids, ...) as polars DataFrames.
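
The accessor methods on ModelSource are documented on the API reference; purely to illustrate the shape of the interface, a hypothetical accessor might look like this (roc_curve is an assumed name here, not confirmed API):

import ferrum as fm
import polars as pl
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

source = fm.ModelSource(model, X_test, y_test)
# Hypothetical accessor name, for illustration only -- check the API
# reference for the real ModelSource methods.
points = source.roc_curve()
assert isinstance(points, pl.DataFrame)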

When you call roc_chart(model, X, y), the helper builds a ModelSource internally, asks it for the ROC curve points, and feeds those points to a chart spec. If you're computing multiple diagnostics on the same model + dataset, it's more efficient — and cleaner — to build the ModelSource once and pass it to each helper:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

source = fm.ModelSource(model, X_test, y_test)
roc = fm.roc_chart(source)
cm = fm.confusion_matrix_chart(source)
importances = fm.importance_chart(source)
report = (roc | cm) & importances
assert report.show_svg().startswith("<svg")

Three-panel diagnostic report

The report value is a regular composed chart — (roc | cm) & importances lays the three diagnostics into a 2 × 2 grid (with the importance chart spanning the bottom row), and you can save it, theme it, or further compose it as one artifact. The composition operators are the same | and & you use for any other charts (see Composition).

Why ModelSource matters

The boundary ModelSource enforces is the load-bearing one: it computes the derived diagnostic tables once, then every chart consumes the result. Without ModelSource, computing a ROC curve and a calibration curve on the same model would predict probabilities twice, and you'd have to thread that data plumbing through your own code.

ModelSource also lazy-imports sklearn, shap, and umap as needed: import ferrum does not pull those packages into your process. They load only when you actually compute a diagnostic that requires them.
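
A quick way to check this in a fresh interpreter:

import sys

import ferrum  # the bare import pulls in none of the optional packages

# The lazily imported dependencies only load once a diagnostic needs them.
for name in ("sklearn", "shap", "umap"):
    assert name not in sys.modules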

Precomputed scores (no model required)

Every classification and regression helper also accepts raw y_true= / y_pred= arrays instead of a fitted model. This is useful when you already have predictions — from a saved CSV, a batch inference job, a non-sklearn framework, or an evaluation pipeline that separates prediction from visualization:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

# Predict once, visualize many ways
y_proba = model.predict_proba(X_test)
y_pred = model.predict(X_test)

roc = fm.roc_chart(y_true=y_test, y_pred=y_proba)
cm = fm.confusion_matrix_chart(y_true=y_test, y_pred=y_pred)
report = roc | cm
assert report.show_svg().startswith("<svg")

The two paths are mutually exclusive — pass either model, X, y or y_true=, y_pred=, not both.

What y_pred means

The interpretation of y_pred depends on the chart:

| Chart needs | What to pass as y_pred | Helpers |
| --- | --- | --- |
| Soft scores / probabilities | predict_proba(X) (1-D binary or 2-D multiclass) | roc_chart, pr_chart, calibration_chart, gain_chart, lift_chart, discrimination_threshold_chart |
| Hard class labels | predict(X) (1-D) | confusion_matrix_chart, class_prediction_error_chart |
| Fitted values | predict(X) (1-D continuous) | residuals_chart, prediction_error_chart |
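
For example, the fitted-values row with a regressor, still keeping prediction separate from visualization (load_diabetes is only a stand-in dataset):

import ferrum as fm
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)

# residuals_chart expects 1-D continuous fitted values as y_pred
resid = fm.residuals_chart(y_true=y_test, y_pred=reg.predict(X_test))
assert resid.show_svg().startswith("<svg")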

Limitations of the precomputed path

  • No compare= — multi-model comparison requires fitted models so that each one can generate predictions on the same data.
  • No cv= — cross-validation helpers (learning_curve_chart, validation_curve_chart, cv_scores_chart) need a model to re-fit across folds.
  • No feature-based helpers — importance_chart, shap_chart, pdp_chart, and the clustering helpers require a model or ModelSource.

sklearn-protocol visualizers

For lifecycle control or yellowbrick-style ergonomics, every diagnostic also has a visualizer class. A visualizer takes the model at construction time, runs through .fit() / .score(), and exposes .show(), which returns a Chart:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

visualizer = fm.ROCVisualizer(model)
visualizer.fit(X_train, y_train).score(X_test, y_test)
chart = visualizer.show()
assert chart.show_svg().startswith("<svg")

ROC via visualizer

The full visualizer menu mirrors the helpers:

| Family | Visualizers |
| --- | --- |
| Classification | ROCVisualizer, PRVisualizer, CalibrationVisualizer, ConfusionMatrixVisualizer, ClassificationReportVisualizer, ClassPredictionErrorVisualizer, ClassBalanceVisualizer, DiscriminationThresholdVisualizer |
| Regression | ResidualsVisualizer, PredictionErrorVisualizer, CooksDistanceVisualizer |
| Explanation | FeatureImportancesVisualizer, SHAPVisualizer |
| Model selection | LearningCurveVisualizer, ValidationCurveVisualizer, CVScoresVisualizer, AlphaSelectionVisualizer |
| Clustering / manifold | SilhouetteVisualizer, ElbowVisualizer, ManifoldVisualizer, InterclusterDistanceVisualizer, PCAVarianceVisualizer |

Pick the helper when you want the diagnostic with minimal ceremony. Pick the visualizer when you want CV-fold lifecycle, custom training/scoring splits, or compatibility with code patterns from yellowbrick.

Customizing diagnostic output

Every diagnostic helper accepts four override keywords for customization without dropping to the grammar API:

| Keyword | What it does |
| --- | --- |
| mark= | Override or suppress sub-layers by name. |
| encode= | Override encoding channels. |
| properties= | Override chart properties (title, width, height). |
| layers= | Append extra layers on top. |

Diagnostic charts are composites with named sub-layers. Inspect them with .layer_names:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

chart = fm.roc_chart(model, X_test, y_test)
names = chart.layer_names
assert "line" in names
assert "reference" in names

Override sub-layers by name — suppress them with False, or merge mark kwargs with a dict:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

chart = fm.roc_chart(
    model, X_test, y_test,
    mark={"line": {"stroke_width": 3}},
    properties={"title": "ROC — Random Forest"},
)
assert chart.show_svg().startswith("<svg")

ROC with overrides
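
Suppression works the same way; for example, dropping the diagonal reference sub-layer identified by layer_names above:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

# False suppresses the named sub-layer entirely
chart = fm.roc_chart(model, X_test, y_test, mark={"reference": False})
assert chart.show_svg().startswith("<svg")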

The same mark=, encode=, properties=, layers= pattern works on every diagnostic helper — confusion_matrix_chart, calibration_chart, importance_chart, and all the rest.
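
For instance, retitling the normalized confusion matrix from earlier with the properties= override:

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

cm = fm.confusion_matrix_chart(
    model, X_test, y_test,
    normalize="true",
    properties={"title": "Normalized confusion matrix"},
)
assert cm.show_svg().startswith("<svg")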

Diagnostics compose with everything else

The most important property of these helpers is structural: their output is a regular Ferrum chart. That means a ROC curve participates in the rest of the grammar identically to a scatter plot:

  • Theme it with .theme(fm.themes.publication) or set a process default with set_default_theme.
  • Save it with .save("roc.svg").
  • Concatenate it with | or & (as shown above).
  • Pass it through anywhere a Chart is expected.

A four-panel model report is (roc | cm) & (residuals | importances) — same composition operators as any other compound view. There is no separate API for "make these diagnostic charts work together."
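
A sketch of that four-panel report with the running classifier (substituting pr_chart for residuals_chart, since residuals apply to regressors):

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

source = fm.ModelSource(model, X_test, y_test)
top = fm.roc_chart(source) | fm.confusion_matrix_chart(source)
bottom = fm.pr_chart(source) | fm.importance_chart(source)
# Theme and save the composed report as one artifact
report = (top & bottom).theme(fm.themes.publication)
report.save("model_report.svg")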

Caveats and limitations

A few sharp edges worth knowing:

  • SHAP helpers: shap_chart and its variants require the shap package, which lazy-imports on first call; install the optional ferrum[shap] extra to pull it in. UMAP, by contrast, runs in pure Rust via manifolds-rs and needs no Python dependency.
  • Per-class breakdowns: classifier diagnostics default to a per-class view when the model has more than two classes. Pass per_class=False to collapse to a macro / micro / weighted average.
  • Compare multiple models: most classification helpers accept a compare= keyword (or a ComparedModelSource data source) for side-by-side comparison. See the API reference for the per-helper signatures.
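
A hedged sketch of the comparison path, assuming compare= accepts a mapping of labels to fitted estimators (the exact value shape is in the per-helper signatures on the API reference):

import ferrum as fm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)
lr = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# The mapping below is an assumed shape for compare=, not confirmed API --
# check the per-helper signature before relying on it.
roc = fm.roc_chart(rf, X_test, y_test, compare={"logistic": lr})
assert roc.show_svg().startswith("<svg")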

Where to go next

  • Model outputs are data for the design rationale behind treating diagnostics as charts.
  • Figure-level helpers for the broader family of one-line chart helpers (most diagnostic helpers follow the same pattern).
  • Composition for the operators (+, |, &) used to compose multiple diagnostics into a single model report.
  • Themes for applying consistent styling to a multi-chart diagnostic view.
  • The API Reference / ferrum for the full signatures of every *_chart helper and every *Visualizer class.