
ferrum.transforms

Data transforms reshape input data before statistical transforms and rendering.

This module provides Python constructors for the Phase 12 Rust data transforms.

Each function returns a plain dict matching the Rust TransformSpec serde wire format (#[serde(tag = "type", rename_all = "snake_case")]). The dict is passed through the transforms_json path at render time.
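The serde attribute above controls two things: how enum variant names become tag strings, and where the tag sits in the JSON. The sketch below mimics both in plain Python. It applies serde's snake_case rule for variant names and uses the internally tagged layout, where "type" is an ordinary field next to the variant's own fields. The Rust variant names used here (TopK, DataAggregate, TimeUnit) are assumptions, inferred backwards from the tag strings in the examples on this page. The spec is trimmed to two fields for brevity.

```python
import json


def serde_snake_case(variant: str) -> str:
    """Mimic serde's rename_all = "snake_case" rule for enum variant names."""
    out = []
    for i, ch in enumerate(variant):
        # An underscore goes before every uppercase letter except the first one.
        if i > 0 and ch.isupper():
            out.append("_")
        out.append(ch.lower())
    return "".join(out)


# Hypothetical Rust variant names; the resulting tags match this page's examples.
for variant in ("TopK", "DataAggregate", "TimeUnit"):
    print(variant, "->", serde_snake_case(variant))

# Internally tagged layout: one flat JSON object with the tag as a sibling field.
spec = {"type": serde_snake_case("TopK"), "n": 5}
wire = json.dumps(spec)
print(wire)  # {"type": "top_k", "n": 5}
```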

transform_filter

transform_filter(predicate: 'str | dict') -> dict

Filter rows by a predicate expression.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predicate | str or dict | Vega-style expression string (e.g. "datum.x > 5") or a dict filter specification. | required |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_filter("datum.age >= 18")
>>> t["type"]
'filter'

transform_calculate

transform_calculate(as_: str, expr: str) -> dict

Add a derived column via an expression.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| as_ | str | Name of the output column. | required |
| expr | str | Expression string (e.g. "datum.x * 2 + datum.y"). | required |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_calculate("ratio", "datum.x / datum.y")
>>> t["type"]
'calculate'
>>> t["as_field"]
'ratio'

transform_aggregate

transform_aggregate(*aggregates: dict, groupby: Sequence[str] | None = None) -> dict

Group-by aggregation (collapses rows).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| *aggregates | dict | Aggregation specs, each a dict with the keys field, fn, and as (e.g. {"field": "price", "fn": "mean", "as": "avg_price"}). | () |
| groupby | list of str | Columns to group by. | None |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_aggregate(
...     {"field": "price", "fn": "mean", "as": "avg_price"},
...     groupby=["category"],
... )
>>> t["type"]
'data_aggregate'
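The sketch below shows what "collapses rows" means. It runs the group-by semantics in pure Python over a list of row dicts and produces one output row per distinct groupby key. It is not the Rust engine's implementation. The helper name `aggregate` is hypothetical, and only a few fn values (sum, mean, count, min, max) are wired up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from fn names to Python reducers.
AGG_FNS = {"sum": sum, "mean": mean, "count": len, "min": min, "max": max}


def aggregate(rows, aggregates, groupby=()):
    """Collapse rows to one output row per distinct groupby key."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in groupby)].append(row)

    out = []
    for key, members in groups.items():
        result = dict(zip(groupby, key))  # carry the group columns through
        for spec in aggregates:
            values = [m[spec["field"]] for m in members]
            result[spec["as"]] = AGG_FNS[spec["fn"]](values)
        out.append(result)
    return out


rows = [
    {"category": "a", "price": 10.0},
    {"category": "a", "price": 20.0},
    {"category": "b", "price": 5.0},
]
result = aggregate(
    rows,
    [{"field": "price", "fn": "mean", "as": "avg_price"}],
    groupby=["category"],
)
print(result)  # [{'category': 'a', 'avg_price': 15.0}, {'category': 'b', 'avg_price': 5.0}]
```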

transform_bin

transform_bin(field: str, *, as_: str | None = None, maxbins: int | None = None, step: float | None = None, nice: bool = True) -> dict

Bin a continuous field (adds a bin column without collapsing rows).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| field | str | Column to bin. | required |
| as_ | str | Output column name. Defaults to "{field}_bin". | None |
| maxbins | int | Maximum number of bins. | None |
| step | float | Explicit bin width (overrides maxbins). | None |
| nice | bool | Whether to round the bin boundaries to "nice" (human-friendly) values. | True |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_bin("horsepower", maxbins=10)
>>> t["type"]
'data_bin'
>>> t["field"]
'horsepower'
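The sketch below covers only the explicit step case and shows that binning does not collapse rows. Every row keeps its columns and gains a "{field}_bin" column. Two choices here are assumptions, not documented engine behaviour: the bin column stores the lower bin edge, and that edge is found by floor division. The maxbins and nice logic is omitted, and `bin_rows` is a hypothetical helper name.

```python
import math


def bin_rows(rows, field, step, as_=None):
    """Add a bin column holding each value's lower bin edge; the row count is unchanged."""
    out_col = as_ or f"{field}_bin"
    out = []
    for row in rows:
        lower = math.floor(row[field] / step) * step
        out.append({**row, out_col: lower})
    return out


rows = [{"horsepower": 95}, {"horsepower": 130}, {"horsepower": 149}]
binned = bin_rows(rows, "horsepower", step=50)
print(binned)
```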

transform_fold

transform_fold(fields: Sequence[str], *, as_: tuple[str, str] = ('key', 'value')) -> dict

Fold (melt) columns from wide to long format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fields | list of str | Column names to fold. | required |
| as_ | tuple of (str, str) | Output column names for the key and value columns. | ("key", "value") |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_fold(["col_a", "col_b"])
>>> t["type"]
'fold'
>>> t["as_"]
['key', 'value']
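The wide-to-long reshaping can be sketched in pure Python on row dicts. This sketch drops the folded columns from each output row, as pandas.melt does. Whether the Rust engine also keeps the original wide columns is not stated on this page, so treat that as an assumption. The helper name `fold` is hypothetical.

```python
def fold(rows, fields, as_=("key", "value")):
    """Wide to long: emit one output row per (input row, folded field) pair."""
    key_col, value_col = as_
    out = []
    for row in rows:
        # Columns that are not folded are copied onto every output row.
        kept = {k: v for k, v in row.items() if k not in fields}
        for f in fields:
            out.append({**kept, key_col: f, value_col: row[f]})
    return out


rows = [{"id": 1, "col_a": 10, "col_b": 20}]
long_rows = fold(rows, ["col_a", "col_b"])
print(long_rows)
# [{'id': 1, 'key': 'col_a', 'value': 10}, {'id': 1, 'key': 'col_b', 'value': 20}]
```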

transform_pivot

transform_pivot(field: str, value: str, *, groupby: Sequence[str] | None = None, limit: int | None = None, op: str = 'sum') -> dict

Pivot from long to wide format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| field | str | Column whose unique values become new column headers. | required |
| value | str | Column whose values fill the pivoted cells. | required |
| groupby | list of str | Columns to group by. | None |
| limit | int | Maximum number of pivot columns to create. | None |
| op | str | Aggregation operation applied when multiple values map to the same cell. | "sum" |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_pivot("category", "amount", groupby=["date"])
>>> t["type"]
'pivot'
>>> t["field"]
'category'
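A pure-Python sketch of long-to-wide pivoting with the default op="sum" follows. It produces one output row per groupby key and one new column per distinct value of `field`. Several behaviours here are assumptions: cells with no input rows are filled with None, new columns follow first-appearance order, and limit is not modelled. The helper name `pivot` is hypothetical.

```python
from collections import defaultdict


def pivot(rows, field, value, groupby=(), op="sum"):
    """Long to wide: one row per groupby key, one column per distinct `field` value."""
    if op != "sum":
        raise NotImplementedError("this sketch only implements op='sum'")
    cells = defaultdict(dict)  # groupby key -> {new column -> accumulated value}
    new_cols = []
    for row in rows:
        key = tuple(row[c] for c in groupby)
        col = row[field]
        if col not in new_cols:
            new_cols.append(col)
        # Rows that land in the same cell are combined with `op` (here: sum).
        cells[key][col] = cells[key].get(col, 0) + row[value]

    out = []
    for key, filled in cells.items():
        wide = dict(zip(groupby, key))
        for col in new_cols:
            wide[col] = filled.get(col)  # None where the group has no value
        out.append(wide)
    return out


rows = [
    {"date": "d1", "category": "x", "amount": 1},
    {"date": "d1", "category": "x", "amount": 2},
    {"date": "d1", "category": "y", "amount": 5},
    {"date": "d2", "category": "y", "amount": 7},
]
wide = pivot(rows, "category", "amount", groupby=["date"])
print(wide)
# [{'date': 'd1', 'x': 3, 'y': 5}, {'date': 'd2', 'x': None, 'y': 7}]
```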

transform_join_aggregate

transform_join_aggregate(*aggregates: dict, groupby: Sequence[str] | None = None) -> dict

Add aggregate columns without collapsing rows (window-join pattern).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| *aggregates | dict | Aggregation specs, each a dict with the keys field, fn, and as (e.g. {"field": "price", "fn": "mean", "as": "avg_price"}). | () |
| groupby | list of str | Columns to group by. | None |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_join_aggregate(
...     {"field": "sales", "fn": "sum", "as": "total_sales"},
...     groupby=["region"],
... )
>>> t["type"]
'join_aggregate'
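The contrast with transform_aggregate is that every input row survives. Each row simply gains its group's aggregate. The sketch below shows this in pure Python. For brevity it handles a single sum aggregate passed as plain arguments, and the helper name `join_aggregate` is hypothetical.

```python
from collections import defaultdict


def join_aggregate(rows, field, as_, groupby=()):
    """Attach each group's sum of `field` to every row; the row count is unchanged."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[c] for c in groupby)] += row[field]
    return [{**row, as_: totals[tuple(row[c] for c in groupby)]} for row in rows]


rows = [
    {"region": "north", "sales": 10},
    {"region": "north", "sales": 30},
    {"region": "south", "sales": 5},
]
joined = join_aggregate(rows, "sales", "total_sales", groupby=["region"])
print(joined)
```

A typical follow-up is a calculate step such as "datum.sales / datum.total_sales", which gives each row's share of its region's total.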

transform_window

transform_window(*window_transforms: dict, sort: Sequence[str] | None = None, groupby: Sequence[str] | None = None, frame: tuple[int | None, int | None] | None = None) -> dict

Window transform (ranking, lag/lead, rolling aggregates).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| *window_transforms | dict | Window operation specs, each a dict with the keys op and as, and optionally field and param (e.g. {"op": "row_number", "as": "rank"}). | () |
| sort | list of str | Sort fields for the window. | None |
| groupby | list of str | Partition-by columns. | None |
| frame | tuple of (int or None, int or None) | Window frame bounds as (preceding, following); None means unbounded. | None |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_window(
...     {"op": "row_number", "as": "rank"},
...     sort=["score"],
... )
>>> t["type"]
'data_window'
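The frame bounds are easiest to read through a rolling aggregate. The sketch below computes a framed sum over a single partition that is already sorted, so sort and groupby are assumed to have been applied beforehand. A bound of None extends the frame to the start or end of the partition. The helper name `framed_sum` is hypothetical.

```python
def framed_sum(values, frame=(None, None)):
    """Sum over a sliding frame of (preceding, following) rows; None means unbounded."""
    preceding, following = frame
    n = len(values)
    out = []
    for i in range(n):
        lo = 0 if preceding is None else max(0, i - preceding)
        hi = n - 1 if following is None else min(n - 1, i + following)
        out.append(sum(values[lo:hi + 1]))
    return out


values = [1, 2, 3, 4]  # one partition, already sorted
print(framed_sum(values, (1, 0)))        # [1, 3, 5, 7]
print(framed_sum(values, (None, 0)))     # [1, 3, 6, 10]
print(framed_sum(values, (None, None)))  # [10, 10, 10, 10]
```

With (None, 0), each row's frame runs from the partition start to the current row, which yields a running total.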

transform_density

transform_density(field: str, *, bandwidth: float | None = None, groupby: Sequence[str] | None = None, extent: tuple[float, float] | None = None, steps: int | None = None, cumulative: bool = False, as_: tuple[str, str] = ('value', 'density')) -> dict

Kernel density estimation as a data transform.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| field | str | Column whose density is estimated. | required |
| bandwidth | float | KDE bandwidth. If None, it is estimated automatically. | None |
| groupby | list of str | Compute separate densities per group. | None |
| extent | tuple of (float, float) | Domain extent for the density grid. | None |
| steps | int | Number of grid steps. | None |
| cumulative | bool | If True, compute the cumulative density. | False |
| as_ | tuple of (str, str) | Output column names. | ("value", "density") |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_density("weight", bandwidth=0.5)
>>> t["type"]
'density_data'
>>> t["as_"]
['value', 'density']
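The sketch below evaluates a kernel density estimate on a grid, under stated assumptions. It uses a Gaussian kernel, although the engine's kernel is not documented here. It also assumes that extent and steps define an evenly spaced grid. For cumulative=True, the normal CDF replaces the PDF. Automatic bandwidth selection is not modelled, and the helper names `gaussian_kde` and `linspace` are hypothetical.

```python
import math


def gaussian_kde(data, bandwidth, grid, cumulative=False):
    """Evaluate a Gaussian KDE (or its CDF) at each grid point."""
    n = len(data)
    out = []
    for x in grid:
        total = 0.0
        for xi in data:
            z = (x - xi) / bandwidth
            if cumulative:
                # Standard normal CDF via the error function.
                total += 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            else:
                # Standard normal PDF, rescaled by the bandwidth.
                total += math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * bandwidth)
        out.append(total / n)
    return out


def linspace(lo, hi, steps):
    """Return `steps` evenly spaced points from lo to hi inclusive (steps >= 2)."""
    return [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]


grid = linspace(-1.0, 1.0, 5)  # extent=(-1.0, 1.0), steps=5
dens = gaussian_kde([0.0], 0.5, grid)
cdf = gaussian_kde([0.0], 0.5, grid, cumulative=True)
# Output rows named with the default as_ = ("value", "density").
rows = [{"value": x, "density": d} for x, d in zip(grid, dens)]
print(rows)
```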

transform_regression

transform_regression(x: str, y: str, *, method: str = 'linear', order: int = 1, groupby: Sequence[str] | None = None, as_: tuple[str, str] = ('x', 'y')) -> dict

Regression fit as a data transform.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | str | Independent variable column. | required |
| y | str | Dependent variable column. | required |
| method | str | Regression method (e.g. "linear", "poly", "exp", "log", "pow"). | "linear" |
| order | int | Polynomial order (used when method="poly"). | 1 |
| groupby | list of str | Fit separate regressions per group. | None |
| as_ | tuple of (str, str) | Output column names. | ("x", "y") |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_regression("x", "y", method="poly", order=2)
>>> t["type"]
'regression_data'
>>> t["method"]
'poly'
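For method="linear", the fit is ordinary least squares, sketched below with the closed-form slope and intercept. Two details are assumptions about the engine's output: the fitted curve is evaluated at the input x values, and the points are written to the default as_ columns. An evenly spaced grid is another common choice. The other methods ("poly", "exp", "log", "pow") are not shown, and `linear_fit` is a hypothetical helper.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.0]
a, b = linear_fit(xs, ys)
# Fitted points, named with the default as_ = ("x", "y").
fitted = [{"x": x, "y": a + b * x} for x in xs]
print(a, b)
```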

transform_loess

transform_loess(x: str, y: str, *, bandwidth: float = 0.3, groupby: Sequence[str] | None = None, as_: tuple[str, str] = ('x', 'y')) -> dict

LOESS/LOWESS smoothing as a data transform.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | str | Independent variable column. | required |
| y | str | Dependent variable column. | required |
| bandwidth | float | Smoothing bandwidth (fraction of data). | 0.3 |
| groupby | list of str | Fit separate curves per group. | None |
| as_ | tuple of (str, str) | Output column names. | ("x", "y") |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_loess("x", "y", bandwidth=0.5)
>>> t["type"]
'loess_data'
>>> t["bandwidth"]
0.5
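The sketch below shows how a bandwidth given as a fraction of the data drives the smoothing, using a basic local linear LOESS. For each x, it fits a weighted least-squares line to the nearest ceil(bandwidth * n) points, with tricube weights. Robustness iterations (the "W" in LOWESS) are omitted. The minimum of 3 neighbours and the evaluation at the input x values are assumptions, not documented engine behaviour, and `loess` is a hypothetical helper.

```python
import math


def loess(xs, ys, bandwidth=0.3):
    """Local linear LOESS with tricube weights (no robustness iterations)."""
    n = len(xs)
    # Each local fit uses the k nearest neighbours; k comes from the bandwidth fraction.
    k = min(n, max(3, math.ceil(bandwidth * n)))
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        d_max = dists[k - 1] or 1.0  # avoid dividing by zero if all k neighbours share x0
        # Tricube weights; points at or beyond d_max get weight 0.
        w = [max(0.0, 1.0 - (abs(x - x0) / d_max) ** 3) ** 3 for x in xs]
        sw = sum(w)
        x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
        y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(y_bar + slope * (x0 - x_bar))
    return fitted


xs = [float(i) for i in range(10)]
ys = [3.0 * x - 1.0 for x in xs]
smoothed = loess(xs, ys, bandwidth=0.5)
rows = [{"x": x, "y": y} for x, y in zip(xs, smoothed)]
print(rows)
```

Because each local fit is linear, data that already lie on a straight line come back unchanged, which is a handy sanity check for any implementation.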

transform_impute

transform_impute(field: str, *, method: str = 'value', value: float | None = None, groupby: Sequence[str] | None = None, key: str | None = None) -> dict

Impute missing values in a column.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| field | str | Column to impute. | required |
| method | str | Imputation method: "value", "mean", "median", "min", or "max". | "value" |
| value | float | Constant fill value, used when method="value". | None |
| groupby | list of str | Impute within groups. | None |
| key | str | Key column for sequence generation. | None |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_impute("sales", method="mean")
>>> t["type"]
'impute'
>>> t["method"]
'mean'
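The sketch below shows per-group imputation in pure Python. Missing values are represented as None, which is an assumption about how nulls look on the Python side. The key-based sequence generation is not modelled, and the helper name `impute` is hypothetical. Note that the statistic methods need at least one observed value in the group; statistics.mean raises on an empty list.

```python
from collections import defaultdict
from statistics import mean, median

METHODS = {"mean": mean, "median": median, "min": min, "max": max}


def impute(rows, field, method="value", value=None, groupby=()):
    """Replace None in `field` with a constant or a per-group statistic."""
    def key_of(row):
        return tuple(row[c] for c in groupby)

    # Collect the observed (non-missing) values of each group.
    observed = defaultdict(list)
    for row in rows:
        if row[field] is not None:
            observed[key_of(row)].append(row[field])

    out = []
    for row in rows:
        if row[field] is None:
            fill = value if method == "value" else METHODS[method](observed[key_of(row)])
            row = {**row, field: fill}
        out.append(row)
    return out


rows = [
    {"store": "a", "sales": 10},
    {"store": "a", "sales": None},
    {"store": "a", "sales": 20},
    {"store": "b", "sales": None},
    {"store": "b", "sales": 4},
]
filled = impute(rows, "sales", method="mean", groupby=["store"])
print([r["sales"] for r in filled])  # [10, 15, 20, 4, 4]
```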

transform_flatten

transform_flatten(fields: Sequence[str], *, as_: Sequence[str] | None = None) -> dict

Flatten list/array columns into separate rows.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fields | list of str | Column names to flatten. | required |
| as_ | list of str | Output names for flattened columns. | None |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_flatten(["tags"])
>>> t["type"]
'flatten'
>>> t["fields"]
['tags']
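The sketch below shows flattening in pure Python on row dicts. When several fields are flattened, their arrays are zipped element by element. Some behaviours here are assumptions, not documented engine behaviour: shorter arrays are padded with None, and a row whose arrays are all empty produces no output. The helper name `flatten` is hypothetical.

```python
from itertools import zip_longest


def flatten(rows, fields, as_=None):
    """Emit one output row per array element; arrays in `fields` are zipped together."""
    names = list(as_) if as_ is not None else list(fields)
    out = []
    for row in rows:
        base = {k: v for k, v in row.items() if k not in fields}
        # Shorter arrays are padded with None, so the longest one sets the row count.
        for items in zip_longest(*(row[f] for f in fields)):
            out.append({**base, **dict(zip(names, items))})
    return out


rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]
flat = flatten(rows, ["tags"])
print(flat)  # [{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}]
```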

transform_sample

transform_sample(n: int, *, seed: int = 42) -> dict

Random sample of rows.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of rows to sample. | required |
| seed | int | RNG seed for deterministic sampling. | 42 |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_sample(100, seed=7)
>>> t["type"]
'sample'
>>> t["n"]
100
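The sketch below shows why the seed matters: sampling with a seeded generator is reproducible across runs. It uses Python's `random.Random`, not the Rust engine's RNG, so the specific rows it picks will not match what the engine selects for the same seed. Two behaviours are assumptions: the original row order is preserved, and all rows are returned when n exceeds the row count. The helper name `sample_rows` is hypothetical.

```python
import random


def sample_rows(rows, n, seed=42):
    """Draw n rows without replacement; the same seed always picks the same rows."""
    if n >= len(rows):
        return list(rows)
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(rows)), n))  # keep the original row order
    return [rows[i] for i in picked]


rows = [{"i": i} for i in range(1000)]
a = sample_rows(rows, 100, seed=7)
b = sample_rows(rows, 100, seed=7)
print(a == b, len(a))  # True 100
```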

transform_top_k

transform_top_k(n: int, *, field: str, op: str = 'sum', sort: str = 'descending') -> dict

Keep top-k groups by an aggregate value.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of top groups to keep. | required |
| field | str | Field to aggregate for ranking. | required |
| op | str | Aggregation operation: "sum", "mean", "count", "min", or "max". | "sum" |
| sort | str | Sort direction: "descending" or "ascending". | "descending" |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_top_k(5, field="revenue", op="sum")
>>> t["type"]
'top_k'
>>> t["n"]
5

transform_stack

transform_stack(field: str, *, groupby: Sequence[str], sort: Sequence[str] | None = None, as_: tuple[str, str] = ('y0', 'y1'), offset: str = 'zero') -> dict

Compute stacked (cumulative) positions for bar/area charts.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| field | str | Field to stack. | required |
| groupby | list of str | Columns defining each stack group. | required |
| sort | list of str | Sort order within each stack. | None |
| as_ | tuple of (str, str) | Output column names for stack start and end. | ("y0", "y1") |
| offset | str | Offset mode: "zero", "normalize", or "center". | "zero" |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_stack("sales", groupby=["region", "quarter"])
>>> t["type"]
'data_stack'
>>> t["as_"]
['y0', 'y1']
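The sketch below shows how the start and end columns and the three offset modes relate, in pure Python. Rows sharing a groupby key form one stack and are stacked in input order; the sort option is not modelled. With "zero", stacks start at 0. With "normalize", each stack is rescaled to span 0 to 1. With "center", each stack is shifted so that it is centred on 0. The helper name `stack` is hypothetical.

```python
from collections import defaultdict


def stack(rows, field, groupby, as_=("y0", "y1"), offset="zero"):
    """Add start/end columns that stack `field` values within each groupby stack."""
    start_col, end_col = as_
    stacks = defaultdict(list)  # groupby key -> indices of the rows in that stack
    for idx, row in enumerate(rows):
        stacks[tuple(row[c] for c in groupby)].append(idx)

    out = [dict(r) for r in rows]
    for members in stacks.values():
        total = sum(rows[i][field] for i in members)
        if offset == "zero":
            base, scale = 0.0, 1.0
        elif offset == "normalize":
            base, scale = 0.0, (1.0 / total if total else 0.0)
        elif offset == "center":
            base, scale = -total / 2.0, 1.0
        else:
            raise ValueError(f"unknown offset: {offset}")
        running = base
        for i in members:  # input order within the stack
            height = rows[i][field] * scale
            out[i][start_col] = running
            out[i][end_col] = running + height
            running += height
    return out


rows = [
    {"quarter": "Q1", "sales": 3},
    {"quarter": "Q1", "sales": 2},
    {"quarter": "Q2", "sales": 4},
]
stacked = stack(rows, "sales", groupby=["quarter"])
print([(r["y0"], r["y1"]) for r in stacked])  # [(0.0, 3.0), (3.0, 5.0), (0.0, 4.0)]
```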

transform_timeunit

transform_timeunit(field: str, unit: str, *, utc: bool = False, as_: str | None = None) -> dict

Extract a temporal unit from a datetime field.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| field | str | Datetime column. | required |
| unit | str | Unit to extract: "year", "month", "day", "hour", "minute", "second", "day_of_week", "week", or "quarter". | required |
| utc | bool | Whether to interpret timestamps as UTC. | False |
| as_ | str | Output column name. Defaults to "{unit}_{field}". | None |

Returns:

| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |

Examples:

>>> import ferrum as fm
>>> t = fm.transform_timeunit("date", "month")
>>> t["type"]
'time_unit'
>>> t["unit"]
'month'
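The sketch below shows unit extraction and why the utc flag matters, using Python datetime objects. Representing the column as timezone-aware datetimes is an assumption. The sketch covers only a subset of units; day_of_week and week are left out because their conventions (first day of the week, ISO week numbering) are not stated on this page. The helper name `time_unit` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def time_unit(rows, field, unit, utc=False, as_=None):
    """Add a column holding one calendar unit extracted from a datetime column."""
    out_col = as_ or f"{unit}_{field}"
    extract = {
        "year": lambda d: d.year,
        "month": lambda d: d.month,
        "day": lambda d: d.day,
        "hour": lambda d: d.hour,
        "quarter": lambda d: (d.month - 1) // 3 + 1,
    }[unit]
    out = []
    for row in rows:
        dt = row[field]
        if utc and dt.tzinfo is not None:
            dt = dt.astimezone(timezone.utc)  # convert aware datetimes to UTC first
        out.append({**row, out_col: extract(dt)})
    return out


utc_minus_5 = timezone(timedelta(hours=-5))  # a fixed UTC-5 offset
rows = [{"date": datetime(2024, 3, 15, 23, 30, tzinfo=utc_minus_5)}]
local = time_unit(rows, "date", "day")
as_utc = time_unit(rows, "date", "day", utc=True)
quarters = time_unit(rows, "date", "quarter")
print(local[0]["day_date"], as_utc[0]["day_date"])  # 15 16
```

Late-evening local timestamps can fall on the next calendar day in UTC, so the same row can land in different day (or month, or year) buckets depending on utc.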