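Transforms

Ferrum is a statistical visualization library with a Rust core. This page documents its data transform functions: `transform_filter`, `transform_aggregate`, `transform_calculate`, and the rest. Each function returns a `dict` transform specification for the Rust engine.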

transform_aggregate

transform_aggregate(*aggregates: dict, groupby: Sequence[str] | None = None) -> dict

Group-by aggregation (collapses rows).

Parameters:

Name	Type	Description	Default
`*aggregates`	`dict`	Aggregation specs, each a dict with keys `field`, `fn`, `as` (e.g. `{"field": "price", "fn": "mean", "as": "avg_price"}`).	`()`
`groupby`	`list of str`	Columns to group by.	`None`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_aggregate(
...     {"field": "price", "fn": "mean", "as": "avg_price"},
...     groupby=["category"],
... )
>>> t["type"]
'data_aggregate'
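
To make "collapses rows" concrete, here is a minimal plain-Python sketch of a group-by mean. It is not Ferrum's implementation (the real work happens in the Rust engine); `aggregate_mean` and the sample rows are hypothetical and only illustrate the expected shape of the result.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, not taken from the Ferrum docs.
rows = [
    {"category": "a", "price": 10.0},
    {"category": "a", "price": 20.0},
    {"category": "b", "price": 5.0},
]


def aggregate_mean(rows, field, groupby, as_):
    """Collapse rows to one per group, storing the mean of `field` under `as_`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in groupby)
        groups[key].append(row[field])
    return [
        {**dict(zip(groupby, key)), as_: mean(values)}
        for key, values in groups.items()
    ]


result = aggregate_mean(rows, "price", ["category"], "avg_price")
print(result)
```

Three input rows become two output rows, one per distinct `category`.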

transform_bin

transform_bin(field: str, *, as_: str | None = None, maxbins: int | None = None, step: float | None = None, nice: bool = True) -> dict

Bin a continuous field (adds a bin column without collapsing rows).

Parameters:

Name	Type	Description	Default
`field`	`str`	Column to bin.	required
`as_`	`str`	Output column name. Defaults to `"{field}_bin"`.	`None`
`maxbins`	`int`	Maximum number of bins.	`None`
`step`	`float`	Explicit bin width (overrides `maxbins`).	`None`
`nice`	`bool`	Whether to "nice" the bin boundaries.	`True`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_bin("horsepower", maxbins=10)
>>> t["type"]
'data_bin'
>>> t["field"]
'horsepower'
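
The following plain-Python sketch shows what "adds a bin column without collapsing rows" can look like for a fixed `step`. It assumes the bin column holds the lower edge of each bin, which the documentation above does not state; `bin_column` is a hypothetical helper, and `maxbins` and `nice` are not modelled.

```python
import math


def bin_column(rows, field, step, as_=None):
    """Append a bin column (lower bin edge, by assumption) to every row."""
    as_ = as_ or f"{field}_bin"  # default name documented above
    return [
        {**row, as_: math.floor(row[field] / step) * step}
        for row in rows
    ]


# Hypothetical sample rows.
rows = [{"horsepower": 95}, {"horsepower": 130}, {"horsepower": 165}]
binned = bin_column(rows, "horsepower", step=50)
print(binned)
```

The row count is unchanged; each row only gains a `horsepower_bin` value.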

transform_calculate

transform_calculate(as_: str, expr: str) -> dict

Add a derived column via an expression.

Parameters:

Name	Type	Description	Default
`as_`	`str`	Name of the output column.	required
`expr`	`str`	Expression string (e.g. `"datum.x * 2 + datum.y"`).	required

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_calculate("ratio", "datum.x / datum.y")
>>> t["type"]
'calculate'
>>> t["as_field"]
'ratio'

transform_density

transform_density(field: str, *, bandwidth: float | None = None, groupby: Sequence[str] | None = None, extent: tuple[float, float] | None = None, steps: int | None = None, cumulative: bool = False, as_: tuple[str, str] = ('value', 'density')) -> dict

Kernel density estimation as a data transform.

Parameters:

Name	Type	Description	Default
`field`	`str`	Column to estimate density for.	required
`bandwidth`	`float`	KDE bandwidth. If `None`, estimated automatically.	`None`
`groupby`	`list of str`	Compute separate densities per group.	`None`
`extent`	`tuple of (float, float)`	Domain extent for the density grid.	`None`
`steps`	`int`	Number of grid steps.	`None`
`cumulative`	`bool`	If `True`, compute cumulative density.	`False`
`as_`	`tuple of (str, str)`	Output column names.	`("value", "density")`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_density("weight", bandwidth=0.5)
>>> t["type"]
'density_data'
>>> t["as_"]
['value', 'density']
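
For readers unfamiliar with kernel density estimation, the sketch below evaluates a Gaussian KDE on a regular grid in plain Python. It assumes a Gaussian kernel and treats `steps` as the number of grid points; neither is confirmed by the documentation above, and `gaussian_kde` is a hypothetical helper rather than part of Ferrum.

```python
import math


def gaussian_kde(values, bandwidth, extent, steps):
    """Evaluate a Gaussian kernel density estimate on `steps` grid points."""
    lo, hi = extent
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        {
            "value": x,
            "density": norm
            * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values),
        }
        for x in grid
    ]


# Hypothetical sample values.
curve = gaussian_kde([1.0, 2.0, 3.0], bandwidth=0.5, extent=(-2.0, 6.0), steps=81)
print(curve[40])  # grid point at the centre of the sample
```

The output has one row per grid point, with the two columns named as in the default `as_` pair.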

transform_filter

transform_filter(predicate: 'str | dict | Parameter') -> dict

Filter rows by a predicate expression or a reactive parameter.

Parameters:

Name	Type	Description	Default
`predicate`	`str, dict, or Parameter`	Vega-style expression string (e.g. `"datum.x > 5"`), a dict filter specification, or a `Parameter` (a selection or variable parameter). When given a `Parameter`, the transform emits a pass-through `"true"` predicate plus a `param` marker, so the static render keeps all rows while the WASM runtime crossfilters live against the linked parameter.	required

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_filter("datum.age >= 18")
>>> t["type"]
'filter'

>>> brush = fm.selection_interval(name="brush")
>>> fm.transform_filter(brush)
{'type': 'filter', 'predicate': 'true', 'param': 'brush'}

transform_flatten

transform_flatten(fields: Sequence[str], *, as_: Sequence[str] | None = None) -> dict

Flatten list/array columns into separate rows.

Parameters:

Name	Type	Description	Default
`fields`	`list of str`	Column names to flatten.	required
`as_`	`list of str`	Output names for flattened columns.	`None`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_flatten(["tags"])
>>> t["type"]
'flatten'
>>> t["fields"]
['tags']
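
As an illustration of flattening, the plain-Python sketch below expands a list column into one row per element. `flatten` is a hypothetical helper, and padding shorter lists with `None` when several fields are flattened together is an assumption, not documented Ferrum behaviour.

```python
from itertools import zip_longest


def flatten(rows, fields):
    """Emit one output row per element of the listed array columns."""
    out = []
    for row in rows:
        columns = [row[f] for f in fields]
        # Assumption: shorter lists are padded with None.
        for values in zip_longest(*columns):
            out.append({**row, **dict(zip(fields, values))})
    return out


# Hypothetical sample rows.
rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": ["c"]}]
flat = flatten(rows, ["tags"])
print(flat)
```

Non-flattened columns such as `id` are repeated on every generated row.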

transform_fold

transform_fold(fields: Sequence[str], *, as_: tuple[str, str] = ('key', 'value')) -> dict

Fold (melt) columns from wide to long format.

Parameters:

Name	Type	Description	Default
`fields`	`list of str`	Column names to fold.	required
`as_`	`tuple of (str, str)`	Output column names for the key and value columns.	`("key", "value")`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_fold(["col_a", "col_b"])
>>> t["type"]
'fold'
>>> t["as_"]
['key', 'value']
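
The wide-to-long reshaping can be sketched in plain Python as follows. `fold` is a hypothetical helper; dropping the folded source columns from the output (as a melt typically does) is an assumption, since the documentation above does not say whether Ferrum keeps them.

```python
def fold(rows, fields, as_=("key", "value")):
    """Turn each listed column into its own (key, value) row."""
    key_name, value_name = as_
    out = []
    for row in rows:
        # Assumption: folded columns are removed from the output rows.
        rest = {k: v for k, v in row.items() if k not in fields}
        for f in fields:
            out.append({**rest, key_name: f, value_name: row[f]})
    return out


# Hypothetical sample row.
rows = [{"id": 1, "col_a": 10, "col_b": 20}]
long_rows = fold(rows, ["col_a", "col_b"])
print(long_rows)
```

One wide row with two folded columns yields two long rows.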

transform_impute

transform_impute(field: str, *, method: str = 'value', value: float | None = None, groupby: Sequence[str] | None = None, key: str | None = None) -> dict

Impute missing values in a column.

Parameters:

Name	Type	Description	Default
`field`	`str`	Column to impute.	required
`method`	`str`	Imputation method: `"value"`, `"mean"`, `"median"`, `"min"`, or `"max"`.	`"value"`
`value`	`float`	Constant fill value, used when `method="value"`.	`None`
`groupby`	`list of str`	Impute within groups.	`None`
`key`	`str`	Key column for sequence generation.	`None`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_impute("sales", method="mean")
>>> t["type"]
'impute'
>>> t["method"]
'mean'
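
A minimal plain-Python sketch of `method="mean"` imputation follows. It assumes missing values are represented as `None` and ignores `groupby` and `key`; `impute_mean` is a hypothetical helper, not Ferrum's implementation.

```python
from statistics import mean


def impute_mean(rows, field):
    """Replace missing (None) entries of `field` with the mean of the rest."""
    known = [row[field] for row in rows if row[field] is not None]
    fill = mean(known)
    return [
        {**row, field: fill if row[field] is None else row[field]}
        for row in rows
    ]


# Hypothetical sample rows with one missing value.
rows = [{"sales": 10.0}, {"sales": None}, {"sales": 30.0}]
imputed = impute_mean(rows, "sales")
print(imputed)
```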

transform_join_aggregate

transform_join_aggregate(*aggregates: dict, groupby: Sequence[str] | None = None) -> dict

Add aggregate columns without collapsing rows (window-join pattern).

Parameters:

Name	Type	Description	Default
`*aggregates`	`dict`	Aggregation specs, each a dict with keys `field`, `fn`, `as` (e.g. `{"field": "price", "fn": "mean", "as": "avg_price"}`).	`()`
`groupby`	`list of str`	Columns to group by.	`None`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_join_aggregate(
...     {"field": "sales", "fn": "sum", "as": "total_sales"},
...     groupby=["region"],
... )
>>> t["type"]
'join_aggregate'
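
The difference from `transform_aggregate` is that every input row survives. The plain-Python sketch below shows the idea for a group sum; `join_aggregate_sum` and the sample rows are hypothetical and do not reflect how the Rust engine computes the result.

```python
from collections import defaultdict


def join_aggregate_sum(rows, field, groupby, as_):
    """Attach the per-group sum of `field` to every row, keeping all rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[g] for g in groupby)] += row[field]
    return [
        {**row, as_: totals[tuple(row[g] for g in groupby)]}
        for row in rows
    ]


# Hypothetical sample rows.
rows = [
    {"region": "east", "sales": 3},
    {"region": "east", "sales": 4},
    {"region": "west", "sales": 10},
]
joined = join_aggregate_sum(rows, "sales", ["region"], "total_sales")
print(joined)
```

Both `east` rows carry the same `total_sales`, which makes per-row shares (for example `sales / total_sales`) easy to derive afterwards.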

transform_loess

transform_loess(x: str, y: str, *, bandwidth: float = 0.3, groupby: Sequence[str] | None = None, as_: tuple[str, str] = ('x', 'y')) -> dict

LOESS/LOWESS smoothing as a data transform.

Parameters:

Name	Type	Description	Default
`x`	`str`	Independent variable column.	required
`y`	`str`	Dependent variable column.	required
`bandwidth`	`float`	Smoothing bandwidth (fraction of data).	`0.3`
`groupby`	`list of str`	Fit separate curves per group.	`None`
`as_`	`tuple of (str, str)`	Output column names.	`("x", "y")`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_loess("x", "y", bandwidth=0.5)
>>> t["type"]
'loess_data'
>>> t["bandwidth"]
0.5
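
To show what a bandwidth expressed as a fraction of the data means, here is a compact plain-Python sketch of LOWESS: a local linear fit with tricube weights at each point, without robustness iterations. Whether Ferrum uses exactly this variant is an assumption, and `lowess` is a hypothetical helper.

```python
import math


def lowess(xs, ys, bandwidth=0.3):
    """Local linear fit with tricube weights at each x (no robustness passes)."""
    n = len(xs)
    k = max(3, math.ceil(bandwidth * n))  # neighbours used per local fit
    fitted = []
    for x0 in xs:
        # Span: distance to the k-th nearest point (the point itself counts).
        h = sorted(abs(x - x0) for x in xs)[min(k, n) - 1]
        sw = swu = swy = swuu = swuy = 0.0
        for x, y in zip(xs, ys):
            u = x - x0  # centre on x0 so the intercept is the fitted value
            d = abs(u) / h if h > 0 else 0.0
            w = (1 - d ** 3) ** 3 if d < 1 else 0.0
            sw += w
            swu += w * u
            swy += w * y
            swuu += w * u * u
            swuy += w * u * y
        denom = sw * swuu - swu ** 2
        slope = (sw * swuy - swu * swy) / denom if denom > 1e-12 else 0.0
        fitted.append((swy - slope * swu) / sw)
    return fitted


# Hypothetical data lying exactly on y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
smooth = lowess(xs, ys, bandwidth=0.5)
print(smooth)
```

On exactly linear data the local fits reproduce the line; on noisy data a larger `bandwidth` pulls in more neighbours and gives a smoother curve.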

transform_pivot

transform_pivot(field: str, value: str, *, groupby: Sequence[str] | None = None, limit: int | None = None, op: str = 'sum') -> dict

Pivot from long to wide format.

Parameters:

Name	Type	Description	Default
`field`	`str`	Column whose unique values become new column headers.	required
`value`	`str`	Column whose values fill the pivoted cells.	required
`groupby`	`list of str`	Columns to group by.	`None`
`limit`	`int`	Maximum number of pivot columns to create.	`None`
`op`	`str`	Aggregation operation when multiple values map to the same cell.	`"sum"`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_pivot("category", "amount", groupby=["date"])
>>> t["type"]
'pivot'
>>> t["field"]
'category'
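
The long-to-wide reshaping with the default `op="sum"` can be sketched in plain Python as below. `pivot_sum` is a hypothetical helper; `limit` and other operations are not modelled, and leaving absent cells out of the row (rather than filling them) is an assumption.

```python
from collections import defaultdict


def pivot_sum(rows, field, value, groupby):
    """Spread distinct `field` values into columns, summing `value` per cell."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[g] for g in groupby)
        table[key][row[field]] += row[value]
    return [
        {**dict(zip(groupby, key)), **cells}
        for key, cells in table.items()
    ]


# Hypothetical sample rows.
rows = [
    {"date": "d1", "category": "food", "amount": 5},
    {"date": "d1", "category": "food", "amount": 7},
    {"date": "d1", "category": "rent", "amount": 100},
    {"date": "d2", "category": "food", "amount": 3},
]
wide = pivot_sum(rows, "category", "amount", ["date"])
print(wide)
```

The two `food` rows for `d1` land in the same cell and are combined by the aggregation operation.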

transform_regression

transform_regression(x: str, y: str, *, method: str = 'linear', order: int = 1, groupby: Sequence[str] | None = None, as_: tuple[str, str] = ('x', 'y')) -> dict

Regression fit as a data transform.

Parameters:

Name	Type	Description	Default
`x`	`str`	Independent variable column.	required
`y`	`str`	Dependent variable column.	required
`method`	`str`	Regression method (e.g. `"linear"`, `"poly"`, `"exp"`, `"log"`, `"pow"`).	`"linear"`
`order`	`int`	Polynomial order (used when `method="poly"`).	`1`
`groupby`	`list of str`	Fit separate regressions per group.	`None`
`as_`	`tuple of (str, str)`	Output column names.	`("x", "y")`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_regression("x", "y", method="poly", order=2)
>>> t["type"]
'regression_data'
>>> t["method"]
'poly'
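
For the default `method="linear"`, the fit is an ordinary least-squares line. The plain-Python sketch below computes it and returns the fitted line at the smallest and largest `x`. Emitting exactly two endpoint rows is an assumption about the output shape, and `linear_regression` is a hypothetical helper.

```python
def linear_regression(rows, x, y, as_=("x", "y")):
    """Ordinary least-squares line, evaluated at the min and max of x."""
    xs = [row[x] for row in rows]
    ys = [row[y] for row in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    x_name, y_name = as_
    # Assumption: the fitted line is reported at the two ends of the x range.
    return [
        {x_name: v, y_name: intercept + slope * v}
        for v in (min(xs), max(xs))
    ]


# Hypothetical points lying exactly on y = 2x + 1.
rows = [{"x": 0.0, "y": 1.0}, {"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}]
line = linear_regression(rows, "x", "y")
print(line)
```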

transform_sample

transform_sample(n: int, *, seed: int = 42) -> dict

Random sample of rows.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of rows to sample.	required
`seed`	`int`	RNG seed for deterministic sampling.	`42`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_sample(100, seed=7)
>>> t["type"]
'sample'
>>> t["n"]
100
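
The role of `seed` can be illustrated with Python's standard library: the same seed always selects the same rows. This is only an analogy; the Rust engine's random number generator will not pick the same rows as Python's, and `sample_rows` is a hypothetical helper.

```python
import random


def sample_rows(rows, n, seed=42):
    """Pick up to `n` rows without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, global state untouched
    return rng.sample(rows, min(n, len(rows)))


# Hypothetical sample rows.
rows = [{"id": i} for i in range(10)]
first = sample_rows(rows, 3, seed=7)
second = sample_rows(rows, 3, seed=7)
print(first == second)  # same seed, same sample
```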

transform_stack

transform_stack(field: str, *, groupby: Sequence[str], sort: Sequence[str] | None = None, as_: tuple[str, str] = ('y0', 'y1'), offset: str = 'zero') -> dict

Compute stacked (cumulative) positions for bar/area charts.

Parameters:

Name	Type	Description	Default
`field`	`str`	Field to stack.	required
`groupby`	`list of str`	Columns defining each stack group.	required
`sort`	`list of str`	Sort order within each stack.	`None`
`as_`	`tuple of (str, str)`	Output column names for stack start and end.	`("y0", "y1")`
`offset`	`str`	Offset mode: `"zero"`, `"normalize"`, or `"center"`.	`"zero"`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Raises:

Type	Description
`ValueError`	If `offset` is not one of `"zero"`, `"normalize"`, `"center"`.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_stack("sales", groupby=["region", "quarter"])
>>> t["type"]
'data_stack'
>>> t["as_"]
['y0', 'y1']
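
The start and end columns are running totals within each stack. The plain-Python sketch below shows `offset="zero"` only, stacking rows in input order; `stack_zero` and the sample rows are hypothetical, and `sort` is not modelled.

```python
from collections import defaultdict


def stack_zero(rows, field, groupby, as_=("y0", "y1")):
    """Add cumulative start/end positions of `field` within each stack."""
    start_name, end_name = as_
    running = defaultdict(float)  # current top of each stack
    out = []
    for row in rows:
        key = tuple(row[g] for g in groupby)
        y0 = running[key]
        running[key] = y0 + row[field]
        out.append({**row, start_name: y0, end_name: running[key]})
    return out


# Hypothetical sample rows: one stack per quarter.
rows = [
    {"quarter": "Q1", "region": "east", "sales": 3.0},
    {"quarter": "Q1", "region": "west", "sales": 4.0},
    {"quarter": "Q2", "region": "east", "sales": 5.0},
]
stacked = stack_zero(rows, "sales", ["quarter"])
print(stacked)
```

Within `Q1` the second segment starts where the first one ends, which is what a stacked bar or area mark needs.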

transform_timeunit

transform_timeunit(field: str, unit: str, *, utc: bool = False, as_: str | None = None) -> dict

Extract a temporal unit from a datetime field.

Parameters:

Name	Type	Description	Default
`field`	`str`	Datetime column.	required
`unit`	`str`	Unit to extract: `"year"`, `"month"`, `"day"`, `"hour"`, `"minute"`, `"second"`, `"day_of_week"`, `"week"`, or `"quarter"`.	required
`utc`	`bool`	Whether to interpret timestamps as UTC.	`False`
`as_`	`str`	Output column name. Defaults to `"{unit}_{field}"`.	`None`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_timeunit("date", "month")
>>> t["type"]
'time_unit'
>>> t["unit"]
'month'
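
As a rough picture of the extraction, the plain-Python sketch below pulls a unit out of `datetime` values and names the output column with the documented `"{unit}_{field}"` default. It covers only a subset of units, assumes 1-based months and quarters, and ignores `utc`; `timeunit` is a hypothetical helper.

```python
from datetime import datetime

# Subset of the documented units; numbering conventions are assumptions.
EXTRACTORS = {
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "hour": lambda d: d.hour,
    "minute": lambda d: d.minute,
    "second": lambda d: d.second,
    "quarter": lambda d: (d.month - 1) // 3 + 1,
}


def timeunit(rows, field, unit, as_=None):
    """Add a column holding `unit` extracted from the datetime in `field`."""
    as_ = as_ or f"{unit}_{field}"  # default name documented above
    extract = EXTRACTORS[unit]
    return [{**row, as_: extract(row[field])} for row in rows]


# Hypothetical sample row.
rows = [{"date": datetime(2024, 5, 17, 9, 30)}]
by_month = timeunit(rows, "date", "month")
by_quarter = timeunit(rows, "date", "quarter")
print(by_month[0]["month_date"], by_quarter[0]["quarter_date"])
```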

transform_top_k

transform_top_k(n: int, *, field: str, op: str = 'sum', sort: str = 'descending') -> dict

Keep top-k groups by an aggregate value.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of top groups to keep.	required
`field`	`str`	Field to aggregate for ranking.	required
`op`	`str`	Aggregation operation: `"sum"`, `"mean"`, `"count"`, `"min"`, or `"max"`.	`"sum"`
`sort`	`str`	Sort direction: `"descending"` or `"ascending"`.	`"descending"`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_top_k(5, field="revenue", op="sum")
>>> t["type"]
'top_k'
>>> t["n"]
5

transform_window

transform_window(*window_transforms: dict, sort: Sequence[str] | None = None, groupby: Sequence[str] | None = None, frame: tuple[int | None, int | None] | None = None) -> dict

Window transform (ranking, lag/lead, rolling aggregates).

Parameters:

Name	Type	Description	Default
`*window_transforms`	`dict`	Window operation specs, each a dict with keys `op`, `as` and optionally `field` and `param` (e.g. `{"op": "row_number", "as": "rank"}`).	`()`
`sort`	`list of str`	Sort fields for the window.	`None`
`groupby`	`list of str`	Partition-by columns.	`None`
`frame`	`tuple of (int or None, int or None)`	Window frame bounds: `(preceding, following)`. `None` means unbounded.	`None`

Returns:

Type	Description
`dict`	Transform specification for the Rust engine.

Examples:

>>> import ferrum as fm
>>> t = fm.transform_window(
...     {"op": "row_number", "as": "rank"},
...     sort=["score"],
... )
>>> t["type"]
'data_window'
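
The plain-Python sketch below illustrates two window operations: `row_number` over sorted rows, and a sum over a `(preceding, following)` frame. It reads the frame as row counts before and after the current row, which is an interpretation of the parameter description rather than confirmed behaviour; `row_number` and `rolling_sum` are hypothetical helpers, and `groupby` is not modelled.

```python
def row_number(rows, sort, as_):
    """Number rows 1..n after sorting by the given fields (ascending)."""
    ordered = sorted(rows, key=lambda r: tuple(r[s] for s in sort))
    return [{**row, as_: i} for i, row in enumerate(ordered, start=1)]


def rolling_sum(rows, field, frame, as_):
    """Sum `field` over a frame of rows around each row; None is unbounded."""
    preceding, following = frame
    n = len(rows)
    out = []
    for i, row in enumerate(rows):
        lo = 0 if preceding is None else max(0, i - preceding)
        hi = n if following is None else min(n, i + following + 1)
        out.append({**row, as_: sum(r[field] for r in rows[lo:hi])})
    return out


# Hypothetical sample rows.
rows = [
    {"name": "a", "score": 30},
    {"name": "b", "score": 10},
    {"name": "c", "score": 20},
]
ranked = row_number(rows, sort=["score"], as_="rank")
rolling = rolling_sum(ranked, "score", frame=(1, 0), as_="sum2")
running = rolling_sum(ranked, "score", frame=(None, 0), as_="cum")
print([r["rank"] for r in ranked], [r["cum"] for r in running])
```

A frame of `(None, 0)` gives a running total, while `(1, 0)` sums the current row and the one before it.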