Model Sources

ModelSource / ComparedModelSource — fitted-model adapters that feed the diagnostics.

Ferrum — a statistical visualization library with a Rust core.

ComparedModelSource

Multi-model wrapper exposing the same surface as ModelSource.

Every derived-data method is proxied through each underlying ModelSource and the per-model outputs are concatenated with a model: Utf8 column stamped on each frame, so downstream chart builders can route color="model" to render one curve per model.
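The stamp-and-concatenate step can be pictured with plain Python rows. This is only a sketch of the idea, not ferrum's implementation: `stamp_and_concat` and the toy rows are hypothetical, and lists of dicts stand in for the polars frames each ModelSource returns.

```python
# Hypothetical stand-in for the proxy step: each per-model "frame" is a list
# of row dicts here, whereas ferrum returns polars DataFrames.
def stamp_and_concat(per_model_rows):
    """Add a `model` key to every row and concatenate in insertion order."""
    combined = []
    for name, rows in per_model_rows.items():
        for row in rows:
            combined.append({**row, "model": name})
    return combined


frames = {
    "ridge": [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 1.0, "tpr": 1.0}],
    "lasso": [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 0.5, "tpr": 1.0}],
}
long_form = stamp_and_concat(frames)

# Distinct model names in first-seen order, which mirrors the dict ordering.
model_order = list(dict.fromkeys(row["model"] for row in long_form))
```

Because every row carries its model name, a chart layer can group or color by that single column without knowing how many models were compared.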

_X, _y, _feature_names, and _class_names resolve to the first source's values (every wrapped source shares X / y by construction in ModelSource.compare, so any one will do); accessing _model raises since there is no single estimator. model_names reports the configured ordering.

Parameters:

Name Type Description Default
sources dict[str, ModelSource]

Mapping from model name (used for the model column) to the underlying ModelSource. Must contain at least one entry — passing an empty dict raises ValueError.

required

Examples:

>>> import ferrum as fm
>>> cms = fm.ModelSource.compare({"ridge": ridge, "lasso": lasso}, X, y)
>>> fm.roc_chart(cms)                  # overlay both curves
>>> cms.model_names
['ridge', 'lasso']
>>> cms.roc_curve()                    # long-form frame with `model` column

model_names property

model_names: list[str]

Ordered list of model display names.

Returns the keys of the sources dict supplied at construction time, in insertion order. Each name corresponds to the value written into the model column on every derived-data DataFrame.

Returns:

Type Description
list[str]

Model names in the order they were registered.

ModelSource

Bases: PredictionsMixin, ClassificationCurvesMixin, FeatureImportanceMixin, ModelSelectionMixin, ClusteringMixin, RankingMixin, BaseSource

Wrap a fitted estimator + dataset and expose model-diagnostic derived data as polars DataFrames.

Constructing a ModelSource is sklearn-free — only attribute introspection runs at __init__ time. Derived-data methods that need sklearn / shap / umap lazy-import on call, so import ferrum never pulls those packages into the user's process unless they actually compute a diagnostic that requires them.

Each derived-data method returns a long-form polars DataFrame whose schema is documented in ferrum._diagnostics.schemas — chart builders and Visualizers consume the same frames.

Parameters:

Name Type Description Default
model Any

A fitted estimator. Must expose at least predict; some methods require additional protocol attributes (predict_proba, coef_, feature_importances_, cluster_centers_, explained_variance_ratio_, …) and raise AttributeError with the missing attribute name when called against an incompatible model.

required
X DataFrame | DataFrame | Table | ndarray

Feature matrix. Coerced internally to a polars DataFrame; any narwhals-compatible input also works.

required
y array-like

Target. Required by methods that depend on ground truth (every method except probabilities and the unsupervised silhouette / pca_variance / embeddings / intercluster_distance / rank1d(algorithm != "covariance") / rank2d family).

None
feature_names sequence of str

Column labels. Defaults to X.columns when X is a DataFrame, or ["f0", "f1", ...] otherwise.

None
class_names sequence of str

Per-class display labels for classification diagnostics. Defaults to model.classes_ when available, else the unique values of y.

None
sample_weight array-like

Per-row weights forwarded to sklearn scorers that accept them.

None
random_state int

Seed propagated to every derived-data method whose underlying compute consumes randomness (importances permutation, SHAP background sampling, UMAP / t-SNE / MDS embeddings, cross-validation curves, partial-dependence sampling). Deterministic methods ignore the value.

None

Examples:

>>> import ferrum as fm
>>> source = fm.ModelSource(model, X, y, random_state=0)
>>> fm.roc_chart(source)              # use directly with a figure function
>>> source.predictions()              # access derived data as a DataFrame
>>> source.confusion_matrix(normalize="true")

X property

X: DataFrame

Feature matrix coerced to a polars DataFrame.

Returns the value supplied to __init__ (after coercion). Use this for read-only access from chart builders and external callers — source._X is an internal alias preserved for back-compat.

y property

y: 'pl.Series | None'

Target series, or None when no y was supplied.

Returns the polars Series the constructor coerced from the y argument. None means unsupervised — methods that need ground truth raise on call.

model property

model: Any

The wrapped fitted estimator.

Returns the model object supplied at construction time unchanged. Chart builders use it for occasional native introspection (e.g. model.classes_, model.n_clusters); prefer the public derived-data methods when one exists.

feature_names property

feature_names: list[str]

Column labels for the feature matrix.

Returns the names supplied at construction time, or the DataFrame column names when X was a DataFrame, or ["f0", "f1", ...] for unlabeled array inputs.

Returns:

Type Description
list[str]

Feature names in the same order as the columns of X.

capabilities property

capabilities: frozenset[str]

Protocol attributes present on the wrapped estimator.

A frozen subset of _PROTOCOL_ATTRS ("predict", "predict_proba", "coef_", "feature_importances_", …) detected at construction time via hasattr. Derived-data methods gate on this set to pick the appropriate code path and raise AttributeError with a clear message when a required attribute is absent.

Returns:

Type Description
frozenset[str]

Attribute names that are present on the wrapped model.
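A minimal sketch of hasattr-based capability detection and gating follows. It is an illustration under stated assumptions: `PROTOCOL_ATTRS` below is a shortened, hypothetical copy of the real tuple, and `detect_capabilities`, `require`, and `ToyLinearModel` are names invented for this example.

```python
# Illustrative subset only: the full _PROTOCOL_ATTRS tuple is not listed here.
PROTOCOL_ATTRS = ("predict", "predict_proba", "coef_", "feature_importances_")


def detect_capabilities(model):
    """Return the protocol attributes that `model` exposes, via hasattr."""
    return frozenset(attr for attr in PROTOCOL_ATTRS if hasattr(model, attr))


def require(capabilities, attr):
    """Raise AttributeError naming the missing attribute, else do nothing."""
    if attr not in capabilities:
        raise AttributeError(f"wrapped model has no attribute {attr!r}")


class ToyLinearModel:
    """Hypothetical fitted estimator exposing predict and coef_ only."""

    coef_ = [0.5, -1.25]

    def predict(self, rows):
        return [sum(c * v for c, v in zip(self.coef_, row)) for row in rows]


caps = detect_capabilities(ToyLinearModel())
require(caps, "coef_")  # passes silently because coef_ is present
```

Detecting once at construction keeps each derived-data method down to a cheap set-membership check.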

rank1d

rank1d(*, algorithm: str = 'shapiro') -> pl.DataFrame

Univariate feature ranking.

algorithm in {"shapiro", "variance", "covariance"}. The Shapiro-Wilk and variance algorithms operate on X alone; "covariance" ranks features by absolute sample covariance with y and therefore requires y to be present.

Output schema (SCHEMA_RANK1D): feature: Utf8, score: Float64, rank: Int64. Rows are pre-sorted by descending score so rank=1 is always the top feature.
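To make the output shape concrete, here is a hedged pure-Python sketch of the "variance" path. It assumes the sample variance (n - 1 denominator), which may differ from what the Rust kernel computes; `rank1d_variance` is a hypothetical helper, and the rows are plain dicts rather than a polars frame.

```python
from statistics import variance


def rank1d_variance(columns):
    """Rank features by sample variance; rows sorted by descending score."""
    scored = [(name, variance(values)) for name, values in columns.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        {"feature": name, "score": score, "rank": position}
        for position, (name, score) in enumerate(scored, start=1)
    ]


rows = rank1d_variance({
    "age": [20.0, 30.0, 40.0],   # sample variance 100.0
    "flag": [0.0, 1.0, 0.0],     # sample variance 1/3
    "income": [1.0, 5.0, 9.0],   # sample variance 16.0
})
```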

rank2d

rank2d(*, algorithm: str = 'pearson') -> pl.DataFrame

Pairwise feature ranking — long-form correlation matrix.

algorithm in {"pearson", "spearman", "kendall", "covariance"}. All algorithms run in Rust (Kendall uses Knight's O(n log n) algorithm).

Output schema (SCHEMA_RANK2D): feature_x: Utf8, feature_y: Utf8, correlation: Float64 — one row per ordered pair of features, p × p rows total.
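The long-form layout can be sketched for the "pearson" case as below. This is an illustrative pure-Python version, not the Rust kernel; `pearson` and `rank2d_pearson` are hypothetical helper names.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank2d_pearson(columns):
    """One row per ordered feature pair, so p features give p * p rows."""
    return [
        {"feature_x": a, "feature_y": b,
         "correlation": pearson(columns[a], columns[b])}
        for a, b in product(columns, repeat=2)
    ]


rows = rank2d_pearson({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with a
    "c": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with a
})
lookup = {(r["feature_x"], r["feature_y"]): r["correlation"] for r in rows}
```

Emitting both (a, b) and (b, a) plus the diagonal is what lets a heatmap layer draw the full square matrix without reshaping.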

silhouette

silhouette(*, k: int | None = None) -> pl.DataFrame

Per-sample silhouette values, sorted within cluster descending.

Returns one row per sample with columns sample_id (original X index), y_position (sequential 0..n-1 stack order — used by mark_silhouette to render bars in a tightly-packed Rousseeuw layout), cluster, and silhouette_value.

k is optional: if provided, the result is filtered to clusters whose label is in range(k).
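The stacking order can be sketched as follows, given precomputed cluster labels and silhouette values. The helper `stack_silhouette` is hypothetical, and two details are assumptions of this sketch: clusters are stacked in ascending label order, and y_position is assigned after the optional k filter.

```python
def stack_silhouette(clusters, values, k=None):
    """Order samples by cluster, then by descending silhouette value."""
    order = sorted(range(len(values)), key=lambda i: (clusters[i], -values[i]))
    if k is not None:
        # Assumption: filter first, then number the remaining rows.
        order = [i for i in order if clusters[i] in range(k)]
    return [
        {"sample_id": i, "y_position": pos,
         "cluster": clusters[i], "silhouette_value": values[i]}
        for pos, i in enumerate(order)
    ]


rows = stack_silhouette(clusters=[1, 0, 1, 0], values=[0.2, 0.7, 0.9, 0.4])
only_first = stack_silhouette(clusters=[1, 0, 1, 0],
                              values=[0.2, 0.7, 0.9, 0.4], k=1)
```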

pca_variance

pca_variance(*, n_components: int | None = None) -> pl.DataFrame

Explained-variance ratio per principal component plus the cumulative running sum.

If the wrapped model exposes explained_variance_ratio_ (e.g. sklearn.decomposition.PCA), the method reads it directly (kept for backward compatibility). Otherwise it computes the ratios from raw X via a Rust SVD.
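The cumulative column is a running sum over the per-component ratios. The sketch below assumes the ratios are already available; the column names `component`, `explained_variance_ratio`, and `cumulative`, the zero-based component index, and the helper `pca_variance_rows` are hypothetical, since the schema is not spelled out here.

```python
from itertools import accumulate


def pca_variance_rows(explained_variance_ratio, n_components=None):
    """Pair each ratio with the running total up to that component."""
    # Slicing with None keeps every component.
    ratios = list(explained_variance_ratio)[:n_components]
    return [
        {"component": i, "explained_variance_ratio": r, "cumulative": c}
        for i, (r, c) in enumerate(zip(ratios, accumulate(ratios)))
    ]


rows = pca_variance_rows([0.5, 0.25, 0.125], n_components=2)
```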

embeddings

embeddings(*, method: str = 'umap', n_components: int = 2, **method_kwargs: Any) -> pl.DataFrame

Low-dimensional embedding of X via UMAP / t-SNE / PCA.

Returns columns dim_0 … dim_{n_components-1} plus a label column (y when provided, else zeros; used to color the scatter). random_state is taken from the source's random_state.

intercluster_distance

intercluster_distance(k: int, *, method: str = 'mds') -> pl.DataFrame

2D embedding of cluster centers + cluster size.

Returns one row per cluster with cluster (Int64), x / y (Float64, the 2D embedded coordinate), and size (Int64, sample count). Requires the wrapped model to expose cluster_centers_.

learning_curve

learning_curve(*, cv: int = 5, scoring: Any = None, train_sizes: Any = None) -> pl.DataFrame

Learning curve: score per (train_size, fold, split).

Returns long-form rows — one per (train_size, fold, split). Each row carries the per-fold score plus the per-(train_size, split) aggregates mean_score, std_score, lower, upper (95% CI on the mean). Chart builders dedupe by (train_size, split) to render a ribbon + line; the per-fold rows enable per-fold strip overlays if a future caller wants them.
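A hedged sketch of the aggregate columns and the dedupe step follows. The interval formula (mean ± 1.96 · std / √n_folds, with the sample standard deviation) is an assumption made for illustration, as the exact definition of lower and upper is not given here; `add_aggregates` and `dedupe` are hypothetical helpers working on row dicts.

```python
from math import sqrt
from statistics import mean, stdev


def add_aggregates(rows):
    """Attach per-(train_size, split) aggregates to every per-fold row."""
    groups = {}
    for row in rows:
        groups.setdefault((row["train_size"], row["split"]), []).append(row["score"])
    enriched = []
    for row in rows:
        scores = groups[(row["train_size"], row["split"])]
        m, s = mean(scores), stdev(scores)
        half = 1.96 * s / sqrt(len(scores))  # assumed normal-approximation CI
        enriched.append({**row, "mean_score": m, "std_score": s,
                         "lower": m - half, "upper": m + half})
    return enriched


def dedupe(rows):
    """Keep the first row of each (train_size, split) group for ribbon + line."""
    first = {}
    for row in rows:
        first.setdefault((row["train_size"], row["split"]), row)
    return list(first.values())


fold_rows = [
    {"train_size": size, "fold": fold, "split": "test", "score": score}
    for size, scores in [(50, [0.6, 0.7, 0.8]), (100, [0.8, 0.9, 1.0])]
    for fold, score in enumerate(scores)
]
enriched = add_aggregates(fold_rows)
summary = dedupe(enriched)
```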

validation_curve

validation_curve(param: str, values: Any, *, cv: int = 5, scoring: Any = None) -> pl.DataFrame

Validation curve: score per (param_value, fold, split).

Same shape as learning_curve but parameterized by an estimator hyperparameter sweep. param is the kwarg name on the wrapped estimator (e.g. "alpha" for Ridge).

cv_scores

cv_scores(*, cv: int = 5, scoring: Any = None) -> pl.DataFrame

Per-fold cross-validation scores.

Returns one row per (fold, split) — train and test scores for each cross-validation fold. Chart builders use this for boxplot / bar / strip distributions across folds.

alpha_selection

alpha_selection(alphas: Any, *, cv: int = 5, scoring: Any = None) -> pl.DataFrame

Regularization-strength sweep for linear models.

Returns one row per (alpha, fold) — the per-fold test score on the held-out split — plus per-alpha mean_score / std_score aggregates. Chart builders dedupe by alpha to render a single line, and use argmax(mean_score) to mark the best alpha.

importances

importances(*, method: str = 'builtin', n_repeats: int = 30, scoring: Any = None, random_state: int | None = None) -> pl.DataFrame

Feature importance per feature, sorted by descending |importance|.

method="builtin" reads the wrapped model's feature_importances_ (tree-based estimators) or coef_ (linear estimators; when coef_ has one row per class, the absolute values are averaged across classes). std is zero in this path because sklearn's built-in attributes expose no per-feature variance.

method="permutation" calls sklearn's permutation_importance with n_repeats/scoring and populates std with the per-feature standard deviation across repeats.
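For intuition about the permutation path, the sketch below re-implements the idea in pure Python rather than calling sklearn: shuffle one column, measure the drop in score, repeat, then report the mean and the standard deviation of the drops. Everything here is hypothetical scaffolding (`permutation_importance_sketch`, `toy_predict`, the R² scorer, the toy data); the real method delegates to sklearn's permutation_importance.

```python
import random
from statistics import mean, pstdev


def r2_score(y_true, y_pred):
    """Coefficient of determination, used here as the scoring function."""
    centre = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - centre) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance_sketch(predict, X, y, n_repeats=30, seed=0):
    """Mean and std of the score drop when each column is shuffled."""
    rng = random.Random(seed)
    baseline = r2_score(y, predict(X))
    rows = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, predict(shuffled)))
        rows.append({"feature": f"f{j}", "importance": mean(drops),
                     "std": pstdev(drops)})
    rows.sort(key=lambda r: abs(r["importance"]), reverse=True)
    return rows


def toy_predict(X):
    """Hypothetical fitted model: depends on column 0 only."""
    return [2.0 * row[0] for row in X]


X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = toy_predict(X)
rows = permutation_importance_sketch(toy_predict, X, y)
```

The unused column scores exactly zero because shuffling it never changes a prediction, which is the property that makes permutation importance model-agnostic.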

shap_values

shap_values(*, background: Any = None, max_evals: int = 500) -> pl.DataFrame

Long-form SHAP values per (sample, feature, class).

Returns a DataFrame with sample_id, feature, shap_value, feature_value, feature_value_normalized, class_label.

  • Regression: class_label is the constant "target" on every row.
  • Binary classifiers: class_label is the positive-class name on every row; SHAP values are for the positive class.
  • Multi-class classifiers: one row per (sample, feature, class); class_label carries the class name. The result has n_samples * n_features * n_classes rows total.

Explainer is auto-picked by model capability:

  • coef_: shap.LinearExplainer (deterministic, fast).
  • feature_importances_: shap.TreeExplainer (deterministic for tree ensembles).
  • otherwise: shap.KernelExplainer (model-agnostic; uses the first min(50, N) rows of X as the background unless an explicit background array is passed).
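The selection rules and row counts above can be condensed into a small sketch. It returns explainer names as strings instead of importing shap, and the precedence of coef_ over feature_importances_ when both are present is an assumption taken from the order of the list; all three helper names are hypothetical.

```python
def pick_explainer(capabilities):
    """Map a capability set to the name of the SHAP explainer to use."""
    if "coef_" in capabilities:
        return "LinearExplainer"
    if "feature_importances_" in capabilities:
        return "TreeExplainer"
    return "KernelExplainer"


def kernel_background_size(n_rows):
    """Default KernelExplainer background: the first min(50, N) rows of X."""
    return min(50, n_rows)


def multiclass_row_count(n_samples, n_features, n_classes):
    """Rows in the long-form frame for a multi-class classifier."""
    return n_samples * n_features * n_classes


linear = pick_explainer(frozenset({"predict", "coef_"}))
tree = pick_explainer(frozenset({"predict", "feature_importances_"}))
fallback = pick_explainer(frozenset({"predict"}))
```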

partial_dependence

partial_dependence(features: list[str | int], *, grid_resolution: int = 100, kind: str = 'average') -> pl.DataFrame

Partial dependence per feature.

kind="average" (default) returns the marginal PD curve per feature with sample_id = -1 (one row per grid point per feature).

kind="individual" returns per-sample ICE curves: one row per (feature, sample_id, grid_point) triple with sample_id in [0, n_samples). Chart builders pair this with the detail encoding channel on sample_id to render one polyline per sample.

kind="both" returns the union of the two: ICE rows plus average rows (sample_id = -1), so a downstream chart can overlay both layers on the same DataFrame.
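The three kinds relate to each other as in the pure-Python sketch below, which handles a single feature. It is not how ferrum computes partial dependence; `partial_dependence_sketch`, `toy_predict`, and the column names `grid_value` and `prediction` are hypothetical.

```python
def partial_dependence_sketch(predict, X, feature, grid, kind="average"):
    """ICE rows per sample and the averaged curve marked with sample_id = -1."""
    ice = []
    for sample_id, row in enumerate(X):
        for g in grid:
            # Overwrite the chosen feature with the grid value, keep the rest.
            modified = row[:feature] + [g] + row[feature + 1:]
            ice.append({"feature": feature, "sample_id": sample_id,
                        "grid_value": g, "prediction": predict([modified])[0]})
    average = []
    for g in grid:
        preds = [r["prediction"] for r in ice if r["grid_value"] == g]
        average.append({"feature": feature, "sample_id": -1,
                        "grid_value": g, "prediction": sum(preds) / len(preds)})
    if kind == "average":
        return average
    if kind == "individual":
        return ice
    if kind == "both":
        return ice + average
    raise ValueError(f"unknown kind: {kind!r}")


def toy_predict(X):
    """Hypothetical fitted model: f(x) = x0 + 10 * x1."""
    return [row[0] + 10.0 * row[1] for row in X]


X = [[1.0, 0.0], [2.0, 1.0]]
both = partial_dependence_sketch(toy_predict, X, feature=0,
                                 grid=[0.0, 5.0], kind="both")
avg_rows = [r for r in both if r["sample_id"] == -1]
```

Using -1 as a sentinel keeps ICE and average rows in one frame, so a chart can split the layers with a simple filter on sample_id.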

roc_curve

roc_curve(*, average: str | None = None, drop_intermediate: bool = True) -> pl.DataFrame

ROC curve(s). One row per (class, threshold). The auc value is repeated on every row of its class.

For binary classifiers with average=None (default), returns a single curve on the positive (second) class. For multiclass, returns one-vs-rest curves per class; pass average in {"micro", "macro", "weighted"} to additionally include a summary curve under class="<average>".
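For the binary case, the row layout can be sketched in pure Python as below. This is an illustration only: it keeps every distinct threshold (so it behaves like drop_intermediate=False), starts the curve with an infinite threshold as a choice made for this sketch, and uses the hypothetical helper name `roc_rows`.

```python
def roc_rows(y_true, y_score):
    """One row per threshold with fpr, tpr, and the repeated trapezoidal AUC."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(float("inf"), 0.0, 0.0)]  # nothing is predicted positive yet
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    # Trapezoidal rule over consecutive (fpr, tpr) points.
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]))
    return [{"threshold": t, "fpr": fpr, "tpr": tpr, "auc": auc}
            for t, fpr, tpr in points]


rows = roc_rows(y_true=[0, 0, 1, 1], y_score=[0.1, 0.4, 0.35, 0.8])
```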

pr_curve

pr_curve(*, average: str | None = None) -> pl.DataFrame

Precision-recall curve(s). One row per (class, threshold).

For binary classifiers, returns a single curve on the positive (second) class — average is accepted for API symmetry with the multiclass path but has no effect because binary classifiers have only one curve to draw. For multiclass:

  • average=None (default) — returns one-vs-rest curves per class.
  • average in {"micro", "macro", "weighted"} — returns a single summary curve with class="<average>" and no per-class rows. Macro / weighted variants interpolate per-class precision over a shared recall grid (100 points); micro ravels the binarized labels into one curve. threshold is NaN on every row of macro / weighted summaries (recall-grid interpolation discards thresholds) and follows sklearn's padding convention for micro.

threshold is NaN at the final (recall=0) point of every per-class curve per sklearn's convention.

calibration_curve

calibration_curve(*, n_bins: int = 10, strategy: str = 'uniform') -> pl.DataFrame

Calibration (reliability) curve for binary classifiers.

Returns one row per non-empty bin with mean_predicted, fraction_positive, and count. Delegates to the calibration_kernel Rust kernel.
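A pure-Python sketch of the strategy="uniform" binning is shown below. It is not the calibration_kernel itself: the bin-edge handling (a probability of exactly 1.0 is placed in the last bin) is an assumption, and `calibration_curve_sketch` is a hypothetical name.

```python
def calibration_curve_sketch(y_true, y_prob, n_bins=10):
    """Equal-width bins over [0, 1]; empty bins are skipped."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 -> last bin
        bins[index].append((label, prob))
    rows = []
    for members in bins:
        if not members:
            continue
        count = len(members)
        rows.append({
            "mean_predicted": sum(p for _, p in members) / count,
            "fraction_positive": sum(l for l, _ in members) / count,
            "count": count,
        })
    return rows


rows = calibration_curve_sketch(
    y_true=[0, 0, 1, 1, 1],
    y_prob=[0.1, 0.2, 0.8, 0.9, 1.0],
    n_bins=2,
)
```

A well-calibrated model has mean_predicted close to fraction_positive in every bin, which is why the chart is read against the diagonal.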

cumulative_gain

cumulative_gain() -> pl.DataFrame

Cumulative-gain curve per class. Appends a 2-row class='baseline' diagonal for plotting reference.

lift_curve

lift_curve() -> pl.DataFrame

Lift curve per class. Appends a 2-row class='baseline' line at lift=1.0.

discrimination_threshold

discrimination_threshold(*, n_thresholds: int = 50, cv: Any = None) -> pl.DataFrame

Discrimination threshold sweep — binary classifiers only.

Sweeps n_thresholds evenly-spaced thresholds in [0, 1] and reports precision, recall, F1, and queue_rate at each. queue_rate is the hand-computed fraction (y_score >= t).mean().

When cv is an int, runs the same sweep on each fold's held-out scores from a freshly-cloned + re-fit estimator and averages per-threshold metrics across folds. Pass a splitter object with a .split() method to override.
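The non-cross-validated sweep can be sketched as follows. The zero-division convention (metrics fall back to 0.0 when undefined) is an assumption of this sketch, `discrimination_threshold_sketch` is a hypothetical name, and n_thresholds must be at least 2 for the spacing formula to work.

```python
def discrimination_threshold_sketch(y_true, y_score, n_thresholds=50):
    """Precision, recall, F1, and queue rate at evenly spaced thresholds."""
    rows = []
    for step in range(n_thresholds):
        t = step / (n_thresholds - 1)
        flagged = [s >= t for s in y_score]
        tp = sum(1 for f, y in zip(flagged, y_true) if f and y == 1)
        fp = sum(1 for f, y in zip(flagged, y_true) if f and y == 0)
        fn = sum(1 for f, y in zip(flagged, y_true) if not f and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({
            "threshold": t, "precision": precision, "recall": recall,
            "f1": f1,
            "queue_rate": sum(flagged) / len(flagged),  # (y_score >= t).mean()
        })
    return rows


rows = discrimination_threshold_sketch(
    y_true=[0, 0, 1, 1], y_score=[0.1, 0.4, 0.6, 0.9], n_thresholds=3
)
```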

confusion_matrix

confusion_matrix(*, normalize: str | None = None) -> pl.DataFrame

Confusion matrix in long form: one row per (actual, predicted) cell.

normalize: None for raw counts, "true"/"pred"/"all" for sklearn-style normalization. value is the (possibly normalized) count; value_fmt is a stringified label suitable for mark_text overlay (integer counts when unnormalized, two-decimal fractions when normalized).
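The long-form layout and the two label formats can be sketched in pure Python as below. The helper `confusion_matrix_long` and the column names `actual` and `predicted` are assumptions for illustration, as is the fallback to 0.0 when a normalizing total is zero.

```python
from collections import Counter


def confusion_matrix_long(y_true, y_pred, labels, normalize=None):
    """One row per (actual, predicted) cell with a display-ready label."""
    counts = Counter(zip(y_true, y_pred))
    rows = []
    for actual in labels:
        for predicted in labels:
            count = counts[(actual, predicted)]
            if normalize is None:
                value, text = count, str(count)
            else:
                if normalize == "true":      # each actual class sums to 1
                    denom = sum(counts[(actual, p)] for p in labels)
                elif normalize == "pred":    # each predicted class sums to 1
                    denom = sum(counts[(a, predicted)] for a in labels)
                elif normalize == "all":     # the whole matrix sums to 1
                    denom = len(y_true)
                else:
                    raise ValueError(f"unknown normalize: {normalize!r}")
                value = count / denom if denom else 0.0
                text = f"{value:.2f}"
            rows.append({"actual": actual, "predicted": predicted,
                         "value": value, "value_fmt": text})
    return rows


y_true = ["cat", "cat", "cat", "dog"]
y_pred = ["cat", "cat", "dog", "dog"]
raw = confusion_matrix_long(y_true, y_pred, labels=["cat", "dog"])
by_row = confusion_matrix_long(y_true, y_pred, labels=["cat", "dog"],
                               normalize="true")
```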

predictions

predictions() -> pl.DataFrame

Return y_true, y_pred, residual, studentized_residual, cooks_distance, leverage.

leverage is the diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ for linear estimators (those exposing coef_); NaN otherwise. Used by the residuals-vs-leverage panel of multi-panel residuals charts.
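Written per sample, the leverage is the i-th diagonal entry of H. The identities below are standard linear-algebra facts and assume X is an n × p matrix with full column rank, with x_i^T denoting its i-th row; whether an intercept column is appended before forming H is not stated here.

```latex
H = X (X^\top X)^{-1} X^\top, \qquad
h_{ii} = x_i^\top (X^\top X)^{-1} x_i, \qquad
0 \le h_{ii} \le 1, \qquad
\sum_{i=1}^{n} h_{ii} = \operatorname{tr}(H) = p
```

Since the leverages sum to p, their average is p / n, which gives a natural reference level when reading a residuals-vs-leverage panel.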

probabilities

probabilities() -> pl.DataFrame

Return y_true + one column per class with predicted probability.

compare classmethod

compare(models: dict[str, Any], X: Any, y: Any = None, **kwargs: Any) -> 'ComparedModelSource'

Build a ComparedModelSource over one ModelSource per model.

Each value in models is wrapped in its own ModelSource with the shared X and y. The returned ComparedModelSource proxies every derived-data method through all wrapped sources and stamps the model name as a model column on the concatenated output, so downstream chart builders can route color="model".

Parameters:

Name Type Description Default
models dict[str, Any]

Mapping from display name to fitted estimator. Each estimator is wrapped in its own ModelSource constructed with the shared X, y, and any additional kwargs (e.g. random_state, feature_names, class_names).

required
X array-like

Feature matrix shared by all models. Accepted types match ModelSource.__init__.

required
y array-like

Target shared by all models. Required by most derived-data methods (same constraints as ModelSource).

None
**kwargs Any

Keyword arguments forwarded verbatim to each ModelSource constructor (e.g. random_state, feature_names, class_names, sample_weight).

{}

Returns:

Type Description
ComparedModelSource

Multi-model wrapper whose derived-data methods return long-form DataFrames with an extra model: Utf8 column.

Examples:

>>> import ferrum as fm
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> cms = fm.ModelSource.compare(
...     {
...         "logreg": LogisticRegression().fit(X, y),
...         "forest": RandomForestClassifier().fit(X, y),
...     },
...     X, y, random_state=0,
... )
>>> fm.roc_chart(cms)          # overlay both ROC curves
>>> cms.model_names
['logreg', 'forest']