Model Sources¶

ModelSource / ComparedModelSource — fitted-model adapters that feed the diagnostics.

Ferrum — a statistical visualization library with a Rust core.

ComparedModelSource ¶

Multi-model wrapper exposing the same surface as ModelSource.

Every derived-data method is proxied through each underlying ModelSource and the per-model outputs are concatenated with a model: Utf8 column stamped on each frame, so downstream chart builders can route color="model" to render one curve per model.

_X, _y, _feature_names, and _class_names resolve to the first source's values (every wrapped source shares X / y by construction in ModelSource.compare, so any one will do); accessing _model raises since there is no single estimator. model_names reports the configured ordering.
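The concatenation step can be pictured with a small stand-in that uses plain lists of dicts in place of polars DataFrames. This is only a sketch of the idea: `stamp_and_concat` and the toy rows are hypothetical and are not part of Ferrum.

```python
from typing import Any

Row = dict[str, Any]


def stamp_and_concat(per_model: dict[str, list[Row]]) -> list[Row]:
    """Add a `model` key to every row and concatenate in insertion order."""
    combined: list[Row] = []
    for model_name, rows in per_model.items():
        for row in rows:
            combined.append({**row, "model": model_name})
    return combined


# Hypothetical per-model outputs standing in for two derived-data frames.
frames = {
    "ridge": [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 1.0, "tpr": 1.0}],
    "lasso": [{"fpr": 0.0, "tpr": 0.0}, {"fpr": 1.0, "tpr": 1.0}],
}
long_form = stamp_and_concat(frames)

# The distinct values of the `model` key follow the dict's insertion order.
model_order = list(dict.fromkeys(row["model"] for row in long_form))
```

Because every row carries its model name, a chart layer can group or color by that single key without knowing how many models were compared.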

Parameters:

Name	Type	Description	Default
`sources`	`dict[str, ModelSource]`	Mapping from model name (used for the `model` column) to the underlying `ModelSource`. Must contain at least one entry — passing an empty dict raises `ValueError`.	required

Examples:

>>> import ferrum as fm
>>> cms = fm.ModelSource.compare({"ridge": ridge, "lasso": lasso}, X, y)
>>> fm.roc_chart(cms)                  # overlay both curves
>>> cms.model_names
['ridge', 'lasso']
>>> cms.roc_curve()                    # long-form frame with `model` column

model_names `property` ¶

model_names: list[str]

Ordered list of model display names.

Returns the keys of the sources dict supplied at construction time, in insertion order. Each name corresponds to the value written into the model column on every derived-data DataFrame.

Returns:

Type	Description
`list[str]`	Model names in the order they were registered.

ModelSource ¶

Bases: PredictionsMixin, ClassificationCurvesMixin, FeatureImportanceMixin, ModelSelectionMixin, ClusteringMixin, RankingMixin, BaseSource

Wrap a fitted estimator + dataset and expose model-diagnostic derived data as polars DataFrames.

Constructing a ModelSource is sklearn-free — only attribute introspection runs at __init__ time. Derived-data methods that need sklearn / shap / umap lazy-import on call, so import ferrum never pulls those packages into the user's process unless they actually compute a diagnostic that requires them.
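The lazy-import pattern described above can be sketched with the standard library alone. `LazySource` is a hypothetical toy class, and `statistics` stands in for a heavy optional dependency such as sklearn; Ferrum's internals may be organized differently.

```python
class LazySource:
    """Toy stand-in: construction only stores data and imports nothing."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self) -> float:
        # The import runs the first time this method is called, not at
        # construction time and not when the defining module is imported.
        import statistics

        return statistics.fmean(self.values)


source = LazySource([1.0, 2.0, 3.0])  # no optional dependency needed yet
average = source.mean()               # the import happens here
```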

Each derived-data method returns a long-form polars DataFrame whose schema is documented in ferrum._diagnostics.schemas — chart builders and Visualizers consume the same frames.

Parameters:

Name	Type	Description	Default
`model`	`Any`	A fitted estimator. Must expose at least `predict`; some methods require additional protocol attributes (`predict_proba`, `coef_`, `feature_importances_`, `cluster_centers_`, `explained_variance_ratio_`, …) and raise `AttributeError` with the missing attribute name when called against an incompatible model.	required
`X`	`DataFrame \| DataFrame \| Table \| ndarray`	Feature matrix. Coerced internally to a polars DataFrame; any `narwhals`-compatible input also works.	required
`y`	`array-like`	Target. Required by methods that depend on ground truth (every method except `probabilities` and the unsupervised `silhouette` / `pca_variance` / `embeddings` / `intercluster_distance` / `rank1d(algorithm != "covariance")` / `rank2d` family).	`None`
`feature_names`	`sequence of str`	Column labels. Defaults to `X.columns` when `X` is a DataFrame, or `["f0", "f1", ...]` otherwise.	`None`
`class_names`	`sequence of str`	Per-class display labels for classification diagnostics. Defaults to `model.classes_` when available, else the unique values of `y`.	`None`
`sample_weight`	`array-like`	Per-row weights forwarded to sklearn scorers that accept them.	`None`
`random_state`	`int`	Seed propagated to every derived-data method whose underlying compute consumes randomness (importances permutation, SHAP background sampling, UMAP / t-SNE / MDS embeddings, cross-validation curves, partial-dependence sampling). Deterministic methods ignore the value.	`None`

Examples:

>>> import ferrum as fm
>>> source = fm.ModelSource(model, X, y, random_state=0)
>>> fm.roc_chart(source)              # use directly with a figure function
>>> source.predictions()              # access derived data as a DataFrame
>>> source.confusion_matrix(normalize="true")

X `property` ¶

X: DataFrame

Feature matrix coerced to a polars DataFrame.

Returns the value supplied to __init__ (after coercion). Use this for read-only access from chart builders and external callers — source._X is an internal alias preserved for back-compat.

y `property` ¶

y: 'pl.Series | None'

Target series, or None when no y was supplied.

Returns the polars Series the constructor coerced from the y argument. None means unsupervised — methods that need ground truth raise on call.

model `property` ¶

model: Any

The wrapped fitted estimator.

Returns the model object supplied at construction time unchanged. Chart builders use it for occasional native introspection (e.g. model.classes_, model.n_clusters); prefer the public derived-data methods when one exists.

feature_names `property` ¶

feature_names: list[str]

Column labels for the feature matrix.

Returns the names supplied at construction time, or the DataFrame column names when X was a DataFrame, or ["f0", "f1", ...] for unlabeled array inputs.

Returns:

Type	Description
`list[str]`	Feature names in the same order as the columns of `X`.
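The fallback order for feature names can be written as a short resolver. This is a sketch under the assumption that the three sources are tried in the order listed above; `resolve_feature_names` is a hypothetical helper, not a Ferrum function.

```python
from typing import Optional, Sequence


def resolve_feature_names(
    n_columns: int,
    explicit: Optional[Sequence[str]] = None,
    frame_columns: Optional[Sequence[str]] = None,
) -> list[str]:
    """Pick explicit names, then DataFrame columns, then f0, f1, ..."""
    if explicit is not None:
        return list(explicit)
    if frame_columns is not None:
        return list(frame_columns)
    return [f"f{i}" for i in range(n_columns)]


unlabeled = resolve_feature_names(3)
from_frame = resolve_feature_names(2, frame_columns=["age", "income"])
overridden = resolve_feature_names(2, explicit=["a", "b"], frame_columns=["age", "income"])
```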

capabilities `property` ¶

capabilities: frozenset[str]

Protocol attributes present on the wrapped estimator.

A frozen subset of _PROTOCOL_ATTRS ("predict", "predict_proba", "coef_", "feature_importances_", …) detected at construction time via hasattr. Derived-data methods gate on this set to pick the appropriate code path and raise AttributeError with a clear message when a required attribute is absent.

Returns:

Type	Description
`frozenset[str]`	Attribute names that are present on the wrapped model.
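A minimal sketch of hasattr-based capability detection follows. `PROTOCOL_ATTRS` here is a hypothetical four-entry subset, and `detect_capabilities`, `require`, and `ToyLinearModel` are illustrative names rather than Ferrum's own.

```python
# Hypothetical subset of the protocol attributes; the real tuple lives in Ferrum.
PROTOCOL_ATTRS = ("predict", "predict_proba", "coef_", "feature_importances_")


def detect_capabilities(model: object) -> frozenset[str]:
    """Return the protocol attribute names that `model` exposes."""
    return frozenset(name for name in PROTOCOL_ATTRS if hasattr(model, name))


def require(capabilities: frozenset[str], attr: str) -> None:
    """Raise AttributeError naming the missing attribute."""
    if attr not in capabilities:
        raise AttributeError(f"wrapped model has no attribute {attr!r}")


class ToyLinearModel:
    """Stand-in for a fitted linear estimator."""

    coef_ = [0.5, -1.0]

    def predict(self, X):
        return [0.0 for _ in X]


caps = detect_capabilities(ToyLinearModel())
require(caps, "coef_")  # passes silently because the attribute is present
```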

rank1d ¶

rank1d(*, algorithm: str = 'shapiro') -> pl.DataFrame

Univariate feature ranking.

algorithm in {"shapiro", "variance", "covariance"}. The Shapiro-Wilk and variance algorithms operate on X alone; "covariance" ranks features by absolute sample covariance with y and therefore requires y to be present.

Output schema (SCHEMA_RANK1D): feature: Utf8, score: Float64, rank: Int64. Rows are pre-sorted by descending score so rank=1 is always the top feature.
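The shape of the output can be illustrated with the "variance" algorithm in plain Python. This sketch assumes the sample (n - 1) variance; whether Ferrum uses the sample or population form is not stated here, and `rank_by_variance` is a hypothetical helper.

```python
import statistics


def rank_by_variance(columns: dict[str, list[float]]) -> list[dict]:
    """Score each feature by sample variance and rank by descending score."""
    scored = [
        {"feature": name, "score": statistics.variance(values)}
        for name, values in columns.items()
    ]
    scored.sort(key=lambda row: row["score"], reverse=True)
    for position, row in enumerate(scored, start=1):
        row["rank"] = position  # rank 1 is the highest score
    return scored


toy_X = {
    "age": [20.0, 30.0, 40.0],
    "height": [1.0, 2.0, 3.0],
    "income": [0.0, 100.0, 200.0],
}
ranking = rank_by_variance(toy_X)
```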

rank2d ¶

rank2d(*, algorithm: str = 'pearson') -> pl.DataFrame

Pairwise feature ranking — long-form correlation matrix.

algorithm in {"pearson", "spearman", "kendall", "covariance"}. All four algorithms run in Rust; Kendall uses Knight's O(n log n) algorithm.

Output schema (SCHEMA_RANK2D): feature_x: Utf8, feature_y: Utf8, correlation: Float64 — one row per ordered pair of features, p × p rows total.
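The long-form layout (one row per ordered pair, p × p rows) can be sketched for the Pearson case in plain Python. This is an illustration of the output shape only, not the Rust kernel; `pearson` and `pairwise_pearson` are hypothetical helpers.

```python
import itertools
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pairwise_pearson(columns: dict[str, list[float]]) -> list[dict]:
    """One row per ordered feature pair, including each feature with itself."""
    return [
        {"feature_x": fx, "feature_y": fy, "correlation": pearson(columns[fx], columns[fy])}
        for fx, fy in itertools.product(columns, repeat=2)
    ]


toy_X = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
pairs = pairwise_pearson(toy_X)
```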

silhouette ¶

silhouette(*, k: int | None = None) -> pl.DataFrame

Per-sample silhouette values, sorted within cluster descending.

Returns one row per sample with columns sample_id (original X index), y_position (sequential 0..n-1 stack order — used by mark_silhouette to render bars in a tightly-packed Rousseeuw layout), cluster, and silhouette_value.

k is optional and does not change how the silhouette values are computed; if provided, the result is filtered to clusters in range(k).
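The stacking order can be sketched as a sort followed by sequential numbering. The example takes precomputed silhouette values as input and assumes clusters are stacked in ascending label order, which is an assumption rather than documented behavior; `stack_order` is a hypothetical helper.

```python
def stack_order(clusters: list[int], values: list[float]) -> list[dict]:
    """Sort by cluster, then by descending silhouette value, and number the rows."""
    order = sorted(range(len(values)), key=lambda i: (clusters[i], -values[i]))
    return [
        {
            "sample_id": i,            # position of the sample in the original X
            "y_position": position,    # 0..n-1 stack order for tightly packed bars
            "cluster": clusters[i],
            "silhouette_value": values[i],
        }
        for position, i in enumerate(order)
    ]


rows = stack_order(clusters=[1, 0, 1, 0], values=[0.2, 0.9, 0.7, 0.4])
```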

pca_variance ¶

pca_variance(*, n_components: int | None = None) -> pl.DataFrame

Explained-variance ratio per principal component plus the cumulative running sum.

If the wrapped model exposes explained_variance_ratio_ (e.g. sklearn.decomposition.PCA), the method reads it directly, which preserves backward compatibility. Otherwise it computes the ratios from raw X via a Rust SVD.
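The cumulative column is a running sum over the per-component ratios. A minimal sketch, with hypothetical column names (`component`, `explained_variance_ratio`, `cumulative`) and a hypothetical `variance_table` helper:

```python
from itertools import accumulate
from typing import Optional


def variance_table(ratios: list[float], n_components: Optional[int] = None) -> list[dict]:
    """Pair each explained-variance ratio with its running total."""
    kept = ratios[:n_components]  # a None stop keeps every component
    return [
        {"component": index, "explained_variance_ratio": ratio, "cumulative": total}
        for index, (ratio, total) in enumerate(zip(kept, accumulate(kept)))
    ]


table = variance_table([0.5, 0.25, 0.125, 0.125], n_components=3)
```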

embeddings ¶

embeddings(*, method: str = 'umap', n_components: int = 2, **method_kwargs: Any) -> pl.DataFrame

Low-dimensional embedding of X via UMAP / t-SNE / PCA.

Returns dim_0 … dim_{n_components-1} plus a label column (y when provided, else zeros — used to color the scatter). random_state is taken from the source's random_state.

intercluster_distance ¶

intercluster_distance(k: int, *, method: str = 'mds') -> pl.DataFrame

2D embedding of cluster centers + cluster size.

Returns one row per cluster with cluster (Int64), x / y (Float64, the 2D embedded coordinate), and size (Int64, sample count). Requires the wrapped model to expose cluster_centers_.

learning_curve ¶

learning_curve(*, cv: int = 5, scoring: Any = None, train_sizes: Any = None) -> pl.DataFrame

Learning curve: score per (train_size, fold, split).

Returns long-form rows — one per (train_size, fold, split). Each row carries the per-fold score plus the per-(train_size, split) aggregates mean_score, std_score, lower, upper (95% CI on the mean). Chart builders dedupe by (train_size, split) to render a ribbon + line; the per-fold rows enable per-fold strip overlays if a future caller wants them.
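How the per-fold rows and the repeated aggregates fit together can be sketched in plain Python. The interval below uses a normal approximation (mean ± 1.96 · std / √n), which is an assumption: the exact formula Ferrum uses for lower / upper is not given here. `add_aggregates` and `dedupe` are hypothetical helpers.

```python
import math
import statistics
from collections import defaultdict

Z_95 = 1.96  # assumption: normal-approximation 95% interval on the mean


def add_aggregates(fold_rows: list[dict]) -> list[dict]:
    """Attach per-(train_size, split) aggregates to every per-fold row."""
    groups = defaultdict(list)
    for row in fold_rows:
        groups[(row["train_size"], row["split"])].append(row["score"])
    out = []
    for row in fold_rows:
        scores = groups[(row["train_size"], row["split"])]
        mean = statistics.fmean(scores)
        std = statistics.stdev(scores) if len(scores) > 1 else 0.0
        half_width = Z_95 * std / math.sqrt(len(scores))
        out.append({**row, "mean_score": mean, "std_score": std,
                    "lower": mean - half_width, "upper": mean + half_width})
    return out


def dedupe(rows: list[dict]) -> list[dict]:
    """Keep the first row per (train_size, split), as a ribbon + line layer would."""
    first = {}
    for row in rows:
        first.setdefault((row["train_size"], row["split"]), row)
    return list(first.values())


fold_rows = [
    {"train_size": 10, "fold": 0, "split": "train", "score": 0.9},
    {"train_size": 10, "fold": 1, "split": "train", "score": 0.8},
    {"train_size": 10, "fold": 0, "split": "test", "score": 0.5},
    {"train_size": 10, "fold": 1, "split": "test", "score": 0.7},
]
long_rows = add_aggregates(fold_rows)
ribbon = dedupe(long_rows)
```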

validation_curve ¶

validation_curve(param: str, values: Any, *, cv: int = 5, scoring: Any = None) -> pl.DataFrame

Validation curve: score per (param_value, fold, split).

Same shape as learning_curve but parameterized by an estimator hyperparameter sweep. param is the kwarg name on the wrapped estimator (e.g. "alpha" for Ridge).

cv_scores ¶

cv_scores(*, cv: int = 5, scoring: Any = None) -> pl.DataFrame

Per-fold cross-validation scores.

Returns one row per (fold, split) — train and test scores for each cross-validation fold. Chart builders use this for boxplot / bar / strip distributions across folds.

alpha_selection ¶

alpha_selection(alphas: Any, *, cv: int = 5, scoring: Any = None) -> pl.DataFrame

Regularization-strength sweep for linear models.

Returns one row per (alpha, fold) — the per-fold test score on the held-out split — plus per-alpha mean_score / std_score aggregates. Chart builders dedupe by alpha to render a single line, and use argmax(mean_score) to mark the best alpha.
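Marking the best alpha reduces to an argmax over the per-alpha means. A minimal sketch on hypothetical per-fold rows (`best_alpha` is an illustrative helper, not a Ferrum function):

```python
import statistics
from collections import defaultdict


def best_alpha(fold_rows: list[dict]) -> tuple[float, dict[float, float]]:
    """Return the alpha with the highest mean test score and all per-alpha means."""
    by_alpha = defaultdict(list)
    for row in fold_rows:
        by_alpha[row["alpha"]].append(row["score"])
    means = {alpha: statistics.fmean(scores) for alpha, scores in by_alpha.items()}
    return max(means, key=means.get), means


fold_rows = [
    {"alpha": 0.1, "fold": 0, "score": 0.6}, {"alpha": 0.1, "fold": 1, "score": 0.7},
    {"alpha": 1.0, "fold": 0, "score": 0.8}, {"alpha": 1.0, "fold": 1, "score": 0.9},
    {"alpha": 10.0, "fold": 0, "score": 0.5}, {"alpha": 10.0, "fold": 1, "score": 0.4},
]
winner, mean_scores = best_alpha(fold_rows)
```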

importances ¶

importances(*, method: str = 'builtin', n_repeats: int = 30, scoring: Any = None, random_state: int | None = None) -> pl.DataFrame

Feature importance per feature, sorted by descending |importance|.

method="builtin" reads the wrapped model's feature_importances_ (tree-based estimators) or coef_ (linear estimators; when coef_ has one row per class or output, the absolute values are averaged across those rows). std is zero in this path because sklearn's built-in attributes expose no per-feature variance.

method="permutation" calls sklearn's permutation_importance with n_repeats/scoring and populates std with the per-feature standard deviation across repeats.

shap_values ¶

shap_values(*, background: Any = None, max_evals: int = 500) -> pl.DataFrame

Long-form SHAP values per (sample, feature, class).

Returns a DataFrame with sample_id, feature, shap_value, feature_value, feature_value_normalized, class_label.

Regression: class_label is the constant "target" on every row.
Binary classifiers: class_label is the positive-class name on every row; SHAP values are for the positive class.
Multi-class classifiers: one row per (sample, feature, class); class_label carries the class name. The result has n_samples * n_features * n_classes rows total.

Explainer is auto-picked by model capability:

coef_: shap.LinearExplainer (deterministic, fast).
feature_importances_: shap.TreeExplainer (deterministic for tree ensembles).
otherwise: shap.KernelExplainer (model-agnostic; uses the first min(50, N) rows of X as the background unless an explicit background array is passed).
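The selection rule above can be sketched as a small dispatch on the capability set. The sketch returns explainer names as strings so that it runs without shap installed, and it assumes the listed order is also the precedence order when a model exposes both attributes. `pick_explainer` and `default_background` are hypothetical helpers.

```python
def pick_explainer(capabilities: frozenset[str]) -> str:
    """Map the wrapped model's capabilities to a shap explainer class name."""
    if "coef_" in capabilities:
        return "LinearExplainer"
    if "feature_importances_" in capabilities:
        return "TreeExplainer"
    return "KernelExplainer"


def default_background(X_rows: list, limit: int = 50) -> list:
    """First min(limit, N) rows, used when no explicit background is passed."""
    return X_rows[: min(limit, len(X_rows))]


linear = pick_explainer(frozenset({"predict", "coef_"}))
tree = pick_explainer(frozenset({"predict", "feature_importances_"}))
fallback = pick_explainer(frozenset({"predict", "predict_proba"}))
background = default_background(list(range(120)))
```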

partial_dependence ¶

partial_dependence(features: list[str | int], *, grid_resolution: int = 100, kind: str = 'average') -> pl.DataFrame

Partial dependence per feature.

kind="average" (default) returns the marginal PD curve per feature with sample_id = -1 (one row per grid point per feature).

kind="individual" returns per-sample ICE curves: one row per (feature, sample_id, grid_point) triple with sample_id in [0, n_samples). Chart builders pair this with the detail encoding channel on sample_id to render one polyline per sample.

kind="both" returns the union of the two: ICE rows plus average rows (sample_id = -1), so a downstream chart can overlay both layers on the same DataFrame.
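The three kinds can be illustrated with a toy model in plain Python. The `predict` function, the `grid_value` / `value` column names, and `partial_dependence_rows` are all hypothetical; only the sample_id = -1 convention for average rows comes from the description above.

```python
def predict(row: list[float]) -> float:
    """Hypothetical toy model: 2 * x0 + x1."""
    return 2.0 * row[0] + row[1]


def partial_dependence_rows(X, feature: int, grid, kind: str = "average") -> list[dict]:
    # ICE: for each sample, overwrite one feature with each grid value and predict.
    ice = []
    for sample_id, row in enumerate(X):
        for g in grid:
            modified = list(row)
            modified[feature] = g
            ice.append({"feature": feature, "sample_id": sample_id,
                        "grid_value": g, "value": predict(modified)})
    # Average: mean of the ICE values at each grid point, flagged with sample_id = -1.
    average = []
    for g in grid:
        values = [r["value"] for r in ice if r["grid_value"] == g]
        average.append({"feature": feature, "sample_id": -1,
                        "grid_value": g, "value": sum(values) / len(values)})
    if kind == "average":
        return average
    if kind == "individual":
        return ice
    if kind == "both":
        return ice + average
    raise ValueError(f"unknown kind: {kind!r}")


X = [[0.0, 1.0], [1.0, 3.0]]
both = partial_dependence_rows(X, feature=0, grid=[0.0, 1.0], kind="both")
avg_only = partial_dependence_rows(X, feature=0, grid=[0.0, 1.0])
```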

roc_curve ¶

roc_curve(*, average: str | None = None, drop_intermediate: bool = True) -> pl.DataFrame

ROC curve(s). One row per (class, threshold); the auc value is repeated on every row of its class.

For binary classifiers with average=None (default), returns a single curve on the positive (second) class. For multiclass, returns one-vs-rest curves per class; pass average in {"micro", "macro", "weighted"} to additionally include a summary curve under class="<average>".
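For the binary case, the curve and its area can be sketched in plain Python. This is an illustration of the standard construction (one point per distinct score threshold, trapezoidal area), not Ferrum's or sklearn's code; `roc_points` and `auc` are hypothetical helpers.

```python
def roc_points(y_true: list[int], y_score: list[float]) -> list[tuple[float, float]]:
    """Return (fpr, tpr) points, one per distinct score threshold, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(y_score, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under the piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
area = auc(curve)
```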

pr_curve ¶

pr_curve(*, average: str | None = None) -> pl.DataFrame

Precision-recall curve(s). One row per (class, threshold).

For binary classifiers, returns a single curve on the positive (second) class — average is accepted for API symmetry with the multiclass path but has no effect because binary classifiers have only one curve to draw. For multiclass:

average=None (default) — returns one-vs-rest curves per class.
average in {"micro", "macro", "weighted"} — returns a single summary curve with class="<average>" and no per-class rows. Macro / weighted variants interpolate per-class precision over a shared recall grid (100 points); micro ravels the binarized labels into one curve. threshold is NaN on every row of macro / weighted summaries (recall-grid interpolation discards thresholds) and follows sklearn's padding convention for micro.

threshold is NaN at the final (recall=0) point of every per-class curve per sklearn's convention.

calibration_curve ¶

calibration_curve(*, n_bins: int = 10, strategy: str = 'uniform') -> pl.DataFrame

Calibration (reliability) curve for binary classifiers.

Returns one row per non-empty bin with mean_predicted, fraction_positive, and count. Delegates to the calibration_kernel Rust kernel.
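A plain-Python sketch of uniform-width binning shows how the three columns arise. The bin-edge convention (left-closed bins, with probability 1.0 placed in the last bin) is an assumption, and `calibration_rows` is a hypothetical stand-in for the Rust kernel.

```python
def calibration_rows(y_true: list[int], y_prob: list[float], n_bins: int = 10) -> list[dict]:
    """Uniform-width bins over [0, 1]; empty bins are dropped from the output."""
    prob_sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the last bin
        prob_sums[index] += prob
        positives[index] += label
        counts[index] += 1
    return [
        {
            "mean_predicted": prob_sums[b] / counts[b],
            "fraction_positive": positives[b] / counts[b],
            "count": counts[b],
        }
        for b in range(n_bins)
        if counts[b] > 0
    ]


bins = calibration_rows([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], n_bins=2)
```

A well-calibrated model has mean_predicted close to fraction_positive in every bin; the toy data above is deliberately over-separated.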

cumulative_gain ¶

cumulative_gain() -> pl.DataFrame

Cumulative-gain curve per class. Appends a 2-row class='baseline' diagonal for plotting reference.
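For a single positive class, the gain curve and its baseline can be sketched as follows. The column names (`fraction_samples`, `gain`) and the "positive" class label are hypothetical; only the 2-row class='baseline' diagonal comes from the description above.

```python
def cumulative_gain_rows(y_true: list[int], y_score: list[float]) -> list[dict]:
    """Fraction of positives captured after ranking samples by descending score."""
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("at least one positive sample is required")
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    rows = [{"class": "positive", "fraction_samples": 0.0, "gain": 0.0}]
    found = 0
    for rank, i in enumerate(order, start=1):
        found += y_true[i]
        rows.append({"class": "positive",
                     "fraction_samples": rank / len(y_true),
                     "gain": found / total_positives})
    # Two-row diagonal: the expected gain of a random ranking.
    rows.append({"class": "baseline", "fraction_samples": 0.0, "gain": 0.0})
    rows.append({"class": "baseline", "fraction_samples": 1.0, "gain": 1.0})
    return rows


gain_rows = cumulative_gain_rows([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```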

lift_curve ¶

lift_curve() -> pl.DataFrame

Lift curve per class. Appends a 2-row class='baseline' line at lift=1.0.

discrimination_threshold ¶

discrimination_threshold(*, n_thresholds: int = 50, cv: Any = None) -> pl.DataFrame

Discrimination threshold sweep — binary classifiers only.

Sweeps n_thresholds evenly-spaced thresholds in [0, 1] and reports precision, recall, F1, and queue_rate at each. queue_rate is the hand-computed fraction (y_score >= t).mean().

When cv is an int, runs the same sweep on each fold's held-out scores from a freshly-cloned + re-fit estimator and averages per-threshold metrics across folds. Pass a splitter object with a .split() method to override.
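The non-cross-validated sweep can be sketched in plain Python. Precision is set to 0.0 when no sample is predicted positive, which is an assumption about the convention; `threshold_sweep` is a hypothetical helper and the sketch omits the cv path.

```python
def threshold_sweep(y_true: list[int], y_score: list[float], n_thresholds: int = 50) -> list[dict]:
    """Precision, recall, F1 and queue rate at evenly spaced thresholds in [0, 1]."""
    if n_thresholds < 2:
        raise ValueError("n_thresholds must be at least 2")
    rows = []
    for k in range(n_thresholds):
        t = k / (n_thresholds - 1)
        predicted = [score >= t for score in y_score]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        fn = sum(1 for p, y in zip(predicted, y_true) if not p and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        queue_rate = sum(predicted) / len(predicted)  # (y_score >= t).mean()
        rows.append({"threshold": t, "precision": precision, "recall": recall,
                     "f1": f1, "queue_rate": queue_rate})
    return rows


sweep = threshold_sweep([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], n_thresholds=3)
```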

confusion_matrix ¶

confusion_matrix(*, normalize: str | None = None) -> pl.DataFrame

Confusion matrix in long form: one row per (actual, predicted) cell.

normalize: None for raw counts, "true"/"pred"/"all" for sklearn-style normalization. value is the (possibly normalized) count; value_fmt is a stringified label suitable for mark_text overlay (integer counts when unnormalized, two-decimal fractions when normalized).
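The long-form layout and the three normalization modes can be sketched with a Counter. `confusion_rows` is a hypothetical helper; the denominators follow the usual sklearn meaning of "true" (row totals), "pred" (column totals), and "all" (grand total).

```python
from collections import Counter


def confusion_rows(y_true, y_pred, labels, normalize=None) -> list[dict]:
    """One row per (actual, predicted) cell, with a display string in value_fmt."""
    if normalize not in (None, "true", "pred", "all"):
        raise ValueError(f"unknown normalize: {normalize!r}")
    cell_counts = Counter(zip(y_true, y_pred))
    row_totals, col_totals = Counter(y_true), Counter(y_pred)
    rows = []
    for actual in labels:
        for predicted in labels:
            count = cell_counts[(actual, predicted)]
            if normalize is None:
                value, value_fmt = count, str(count)
            else:
                denominator = {"true": row_totals[actual],
                               "pred": col_totals[predicted],
                               "all": len(y_true)}[normalize]
                value = count / denominator if denominator else 0.0
                value_fmt = f"{value:.2f}"
            rows.append({"actual": actual, "predicted": predicted,
                         "value": value, "value_fmt": value_fmt})
    return rows


y_true = ["cat", "cat", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog"]
raw = confusion_rows(y_true, y_pred, labels=["cat", "dog"])
by_true = confusion_rows(y_true, y_pred, labels=["cat", "dog"], normalize="true")
```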

predictions ¶

predictions() -> pl.DataFrame

Return y_true, y_pred, residual, studentized_residual, cooks_distance, leverage.

leverage is the diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ for linear estimators (those exposing coef_); NaN otherwise. Used by the residuals-vs-leverage panel of multi-panel residuals charts.
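The hat-matrix diagonal can be worked by hand for a two-column design matrix. The sketch assumes X has an intercept column plus a single feature, so XᵀX is 2 × 2 and can be inverted in closed form; whether Ferrum adds an intercept column is not stated here, and `leverage_simple` is a hypothetical helper.

```python
def leverage_simple(x: list[float]) -> list[float]:
    """Diagonal of H = X (XᵀX)⁻¹ Xᵀ for the design matrix X = [1, x]."""
    n = len(x)
    s1 = sum(x)
    s2 = sum(v * v for v in x)
    # XᵀX = [[n, s1], [s1, s2]], so its inverse is (1 / det) * [[s2, -s1], [-s1, n]].
    det = n * s2 - s1 * s1
    if det == 0:
        raise ValueError("x must contain at least two distinct values")
    # h_i = [1, x_i] (XᵀX)⁻¹ [1, x_i]ᵀ = (s2 - 2 * s1 * x_i + n * x_i²) / det
    return [(s2 - 2 * s1 * v + n * v * v) / det for v in x]


h = leverage_simple([0.0, 1.0, 2.0, 3.0])
```

The leverages sum to the number of columns in the design matrix (here 2), and the points farthest from the mean of x carry the most leverage.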

probabilities ¶

probabilities() -> pl.DataFrame

Return y_true + one column per class with predicted probability.

compare `classmethod` ¶

compare(models: dict[str, Any], X: Any, y: Any = None, **kwargs: Any) -> 'ComparedModelSource'

Build a ComparedModelSource over one ModelSource per model.

Each value in models is wrapped in its own ModelSource with the shared X and y. The returned ComparedModelSource proxies every derived-data method through all wrapped sources and stamps the model name as a model column on the concatenated output, so downstream chart builders can route color="model".

Parameters:

Name	Type	Description	Default
`models`	`dict[str, Any]`	Mapping from display name to fitted estimator. Each estimator is wrapped in its own `ModelSource` constructed with the shared `X`, `y`, and any additional `kwargs` (e.g. `random_state`, `feature_names`, `class_names`).	required
`X`	`array-like`	Feature matrix shared by all models. Accepted types match `ModelSource.__init__`.	required
`y`	`array-like`	Target shared by all models. Required by most derived-data methods (same constraints as `ModelSource`).	`None`
`**kwargs`	`Any`	Keyword arguments forwarded verbatim to each `ModelSource` constructor (e.g. `random_state`, `feature_names`, `class_names`, `sample_weight`).	`{}`

Returns:

Type	Description
`ComparedModelSource`	Multi-model wrapper whose derived-data methods return long-form DataFrames with an extra `model: Utf8` column.

Examples:

>>> import ferrum as fm
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> cms = fm.ModelSource.compare(
...     {
...         "logreg": LogisticRegression().fit(X, y),
...         "forest": RandomForestClassifier(random_state=0).fit(X, y),
...     },
...     X, y, random_state=0,
... )
>>> fm.roc_chart(cms)          # overlay both ROC curves
>>> cms.model_names
['logreg', 'forest']
