ferrum.transforms¶
Data transforms reshape input data before statistical transforms and rendering.
This module provides Python constructors for the Phase 12 Rust data transforms.
Each function returns a plain dict matching the Rust TransformSpec serde
wire format (#[serde(tag = "type", rename_all = "snake_case")]). The dict
is passed through the transforms_json path at render time.
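As a rough illustration of the internally tagged layout that `#[serde(tag = "type", rename_all = "snake_case")]` implies, the sketch below builds such a dict by hand and round-trips it through JSON. The variant name `bin`, the keys `field` and `maxbins`, and the use of a JSON array are assumptions made for illustration; the actual keys are defined by the Rust TransformSpec.

```python
import json

# Hypothetical spec: with serde's internal tagging, the snake_case variant
# name sits under the "type" key next to the variant's own fields.
spec = {"type": "bin", "field": "price", "maxbins": 20}

# Assumed here: the specs are serialized together as a JSON array.
transforms_json = json.dumps([spec])
decoded = json.loads(transforms_json)
print(transforms_json)
```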
transform_filter ¶
transform_calculate ¶
transform_aggregate ¶
Group-by aggregation (collapses rows).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*aggregates` | dict | Aggregation specs, each a dict with keys … | `()` |
| `groupby` | list of str | Columns to group by. | `None` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
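The keys of the aggregation specs are not listed on this page, so no call is shown. As a sketch of what "collapses rows" means, the plain-Python stand-in below (not ferrum's implementation; `aggregate_sum` and its arguments are hypothetical) sums one field per group and returns one row per group.

```python
from collections import defaultdict

def aggregate_sum(rows, groupby, field, as_):
    """Sum `field` per group and return one output row per group."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in groupby)
        totals[key] += row[field]
    return [dict(zip(groupby, key), **{as_: total}) for key, total in totals.items()]

rows = [
    {"region": "east", "sales": 10},
    {"region": "east", "sales": 5},
    {"region": "west", "sales": 7},
]
result = aggregate_sum(rows, ["region"], "sales", "total_sales")
```

Three input rows become two output rows, one per distinct `region`.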
transform_bin ¶
transform_bin(field: str, *, as_: str | None = None, maxbins: int | None = None, step: float | None = None, nice: bool = True) -> dict
Bin a continuous field (adds a bin column without collapsing rows).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `field` | str | Column to bin. | required |
| `as_` | str | Output column name. Defaults to … | `None` |
| `maxbins` | int | Maximum number of bins. | `None` |
| `step` | float | Explicit bin width (overrides maxbins). | `None` |
| `nice` | bool | Whether to "nice" the bin boundaries. | `True` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
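To illustrate "adds a bin column without collapsing rows", here is a minimal plain-Python sketch of step-based binning. It is not the engine's algorithm: labeling each row with the lower edge of its bin is an assumption, and `bin_rows` is a hypothetical helper.

```python
import math

def bin_rows(rows, field, step, as_):
    """Add column `as_` holding the lower edge of the bin that contains `field`."""
    return [{**row, as_: math.floor(row[field] / step) * step} for row in rows]

rows = [{"price": 3.2}, {"price": 7.9}, {"price": 12.0}]
binned = bin_rows(rows, "price", step=5, as_="price_bin")
```

Every input row survives; each simply gains a `price_bin` value.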
transform_fold ¶
Fold (melt) columns from wide to long format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `fields` | list of str | Column names to fold. | required |
| `as_` | tuple of (str, str) | Output column names for the key and value columns. | `("key", "value")` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
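The wide-to-long reshaping can be sketched in plain Python as follows. This is an illustration rather than ferrum's implementation; in particular, whether the engine keeps the folded source columns is not stated here, and this hypothetical `fold` helper drops them.

```python
def fold(rows, fields, as_=("key", "value")):
    """Emit one output row per (input row, folded field) pair."""
    key_name, value_name = as_
    out = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in fields}
        for name in fields:
            out.append({**rest, key_name: name, value_name: row[name]})
    return out

wide = [{"city": "Oslo", "jan": -4, "jul": 17}]
long_rows = fold(wide, ["jan", "jul"], as_=("month", "temp"))
```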
transform_pivot ¶
transform_pivot(field: str, value: str, *, groupby: Sequence[str] | None = None, limit: int | None = None, op: str = 'sum') -> dict
Pivot from long to wide format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `field` | str | Column whose unique values become new column headers. | required |
| `value` | str | Column whose values fill the pivoted cells. | required |
| `groupby` | list of str | Columns to group by. | `None` |
| `limit` | int | Maximum number of pivot columns to create. | `None` |
| `op` | str | Aggregation operation when multiple values map to the same cell. | `"sum"` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
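A minimal plain-Python sketch of the long-to-wide reshaping, using the default `"sum"` behavior when several values land in one cell. The `pivot` helper is hypothetical, and leaving cells with no data absent (rather than filling them) is an assumption of this sketch.

```python
def pivot(rows, field, value, groupby):
    """Turn distinct values of `field` into columns, summing `value` per cell."""
    out = {}
    for row in rows:
        key = tuple(row[c] for c in groupby)
        wide = out.setdefault(key, dict(zip(groupby, key)))
        wide[row[field]] = wide.get(row[field], 0) + row[value]
    return list(out.values())

long_rows = [
    {"city": "Oslo", "month": "jan", "temp": -4},
    {"city": "Oslo", "month": "jul", "temp": 17},
    {"city": "Rome", "month": "jan", "temp": 8},
]
wide = pivot(long_rows, "month", "temp", ["city"])
```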
transform_join_aggregate ¶
Add aggregate columns without collapsing rows (window-join pattern).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*aggregates` | dict | Aggregation specs, each a dict with keys … | `()` |
| `groupby` | list of str | Columns to group by. | `None` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
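In contrast to a collapsing aggregation, a join aggregate keeps every row and attaches the group-level value to it. The sketch below shows that idea in plain Python; `join_aggregate_sum` is a hypothetical stand-in, not ferrum's implementation.

```python
from collections import defaultdict

def join_aggregate_sum(rows, groupby, field, as_):
    """Attach the per-group sum of `field` to every row, keeping all rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[c] for c in groupby)] += row[field]
    return [{**row, as_: totals[tuple(row[c] for c in groupby)]} for row in rows]

rows = [
    {"region": "east", "sales": 10},
    {"region": "east", "sales": 5},
    {"region": "west", "sales": 7},
]
joined = join_aggregate_sum(rows, ["region"], "sales", "region_total")
```

A typical use is computing each row's share of its group, for example `sales / region_total`.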
transform_window ¶
transform_window(*window_transforms: dict, sort: Sequence[str] | None = None, groupby: Sequence[str] | None = None, frame: tuple[int | None, int | None] | None = None) -> dict
Window transform (ranking, lag/lead, rolling aggregates).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*window_transforms` | dict | Window operation specs, each a dict with keys … | `()` |
| `sort` | list of str | Sort fields for the window. | `None` |
| `groupby` | list of str | Partition-by columns. | `None` |
| `frame` | tuple of (int or None, int or None) | Window frame bounds: (preceding, following). None means unbounded. | `None` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
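The effect of the `frame` bounds can be sketched with a rolling sum in plain Python. This assumes that `preceding` and `following` are non-negative row counts relative to the current row, which this page does not confirm; `rolling_sum` is a hypothetical helper.

```python
def rolling_sum(values, frame):
    """Sum the values inside a (preceding, following) frame around each row."""
    preceding, following = frame
    n = len(values)
    out = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        stop = n if following is None else min(n, i + following + 1)
        out.append(sum(values[start:stop]))
    return out

values = [1, 2, 3, 4]
running = rolling_sum(values, (None, 0))  # unbounded start: cumulative sum
centered = rolling_sum(values, (1, 1))    # one row on each side
```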
transform_density ¶
transform_density(field: str, *, bandwidth: float | None = None, groupby: Sequence[str] | None = None, extent: tuple[float, float] | None = None, steps: int | None = None, cumulative: bool = False, as_: tuple[str, str] = ('value', 'density')) -> dict
Kernel density estimation as a data transform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `field` | str | Column to estimate density for. | required |
| `bandwidth` | float | KDE bandwidth. If None, estimated automatically. | `None` |
| `groupby` | list of str | Compute separate densities per group. | `None` |
| `extent` | tuple of (float, float) | Domain extent for the density grid. | `None` |
| `steps` | int | Number of grid steps. | `None` |
| `cumulative` | bool | If True, compute cumulative density. | `False` |
| `as_` | tuple of (str, str) | Output column names. | `("value", "density")` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
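For readers unfamiliar with kernel density estimation, the sketch below evaluates a Gaussian KDE on an evenly spaced grid and emits rows named after the default `as_` columns. The Gaussian kernel and the reading of `steps` as the number of grid points are assumptions; the engine's kernel and grid rules are not documented here.

```python
import math

def gaussian_kde(samples, bandwidth, extent, steps):
    """Evaluate a Gaussian kernel density estimate on `steps` grid points."""
    lo, hi = extent
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return [
        {
            "value": x,
            "density": norm
            * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples),
        }
        for x in grid
    ]

curve = gaussian_kde([0.0, 1.0, 2.0], bandwidth=0.5, extent=(-3.0, 5.0), steps=81)
```

Because the extent comfortably covers the samples, the densities multiplied by the grid spacing sum to approximately 1.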
transform_regression ¶
transform_regression(x: str, y: str, *, method: str = 'linear', order: int = 1, groupby: Sequence[str] | None = None, as_: tuple[str, str] = ('x', 'y')) -> dict
Regression fit as a data transform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | str | Independent variable column. | required |
| `y` | str | Dependent variable column. | required |
| `method` | str | Regression method (e.g. "linear", "poly", "exp", "log", "pow"). | `"linear"` |
| `order` | int | Polynomial order (for method="poly"). | `1` |
| `groupby` | list of str | Fit separate regressions per group. | `None` |
| `as_` | tuple of (str, str) | Output column names. | `("x", "y")` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
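As a sketch of what the `"linear"` method computes, the code below fits an ordinary least-squares line in plain Python and emits fitted points under the default `as_` names. Emitting only the two endpoints of the x range is an assumption about the output shape, and `linear_fit` is a hypothetical helper.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = linear_fit(xs, ys)

# Fitted line evaluated at both ends of the observed x range.
fitted = [{"x": x, "y": intercept + slope * x} for x in (min(xs), max(xs))]
```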
transform_loess ¶
transform_loess(x: str, y: str, *, bandwidth: float = 0.3, groupby: Sequence[str] | None = None, as_: tuple[str, str] = ('x', 'y')) -> dict
LOESS/LOWESS smoothing as a data transform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | str | Independent variable column. | required |
| `y` | str | Dependent variable column. | required |
| `bandwidth` | float | Smoothing bandwidth (fraction of data). | `0.3` |
| `groupby` | list of str | Fit separate curves per group. | `None` |
| `as_` | tuple of (str, str) | Output column names. | `("x", "y")` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
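The role of `bandwidth` as a fraction of the data can be seen in a simplified LOESS sketch: each point is smoothed by a local linear fit over its nearest neighbours, weighted with a tricube kernel. This is a textbook simplification with no robustness iterations, not the engine's implementation, and the `loess` helper is hypothetical.

```python
def loess(xs, ys, bandwidth=0.3):
    """Smooth ys at each x with a tricube-weighted local linear fit."""
    n = len(xs)
    k = max(2, round(bandwidth * n))  # number of neighbours per local fit
    smoothed = []
    for x0 in xs:
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        max_d = max(abs(xs[i] - x0) for i in nearest) or 1.0
        w = {i: (1 - (abs(xs[i] - x0) / max_d) ** 3) ** 3 for i in nearest}
        sw = sum(w.values())
        mx = sum(w[i] * xs[i] for i in nearest) / sw
        my = sum(w[i] * ys[i] for i in nearest) / sw
        sxx = sum(w[i] * (xs[i] - mx) ** 2 for i in nearest)
        sxy = sum(w[i] * (xs[i] - mx) * (ys[i] - my) for i in nearest)
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to weighted mean
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
smooth = loess(xs, ys, bandwidth=0.5)
```

On exactly linear data the local fits reproduce the input, which makes a convenient sanity check.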
transform_impute ¶
transform_impute(field: str, *, method: str = 'value', value: float | None = None, groupby: Sequence[str] | None = None, key: str | None = None) -> dict
Impute missing values in a column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `field` | str | Column to impute. | required |
| `method` | str | Imputation method: "value", "mean", "median", "min", "max". | `"value"` |
| `value` | float | Constant value for method="value". | `None` |
| `groupby` | list of str | Impute within groups. | `None` |
| `key` | str | Key column for sequence generation. | `None` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
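The listed methods can be sketched in plain Python as below. Representing missing values as `None` is an assumption of this sketch, grouping and the `key` sequence are left out, and the `impute` helper here is a hypothetical stand-in rather than ferrum's implementation.

```python
from statistics import mean, median

def impute(rows, field, method="value", value=None):
    """Replace None entries of `field` using the chosen method."""
    present = [row[field] for row in rows if row[field] is not None]
    if method == "value":
        fill = value
    else:
        fill = {"mean": mean, "median": median, "min": min, "max": max}[method](present)
    return [{**row, field: fill if row[field] is None else row[field]} for row in rows]

rows = [{"temp": 1.0}, {"temp": None}, {"temp": 3.0}]
by_mean = impute(rows, "temp", method="mean")
by_value = impute(rows, "temp", method="value", value=0.0)
```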
transform_flatten ¶
Flatten list/array columns into separate rows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `fields` | list of str | Column names to flatten. | required |
| `as_` | list of str | Output names for flattened columns. | `None` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
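A minimal plain-Python sketch of flattening: each element of a list column becomes its own row, with the other columns repeated. It assumes that lists flattened together have equal length (how the engine handles unequal lengths is not stated here), and `flatten` is a hypothetical helper.

```python
def flatten(rows, fields, as_=None):
    """Expand list-valued `fields` so each element gets its own row."""
    names = as_ or fields
    out = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in fields}
        for values in zip(*(row[f] for f in fields)):
            out.append({**rest, **dict(zip(names, values))})
    return out

rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": ["c"]}]
flat = flatten(rows, ["tags"], as_=["tag"])
```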
transform_sample ¶
transform_top_k ¶
Keep top-k groups by an aggregate value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | int | Number of top groups to keep. | required |
| `field` | str | Field to aggregate for ranking. | required |
| `op` | str | Aggregation operation: "sum", "mean", "count", "min", "max". | `"sum"` |
| `sort` | str | Sort direction: "descending" or "ascending". | `"descending"` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
transform_stack ¶
transform_stack(field: str, *, groupby: Sequence[str], sort: Sequence[str] | None = None, as_: tuple[str, str] = ('y0', 'y1'), offset: str = 'zero') -> dict
Compute stacked (cumulative) positions for bar/area charts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `field` | str | Field to stack. | required |
| `groupby` | list of str | Columns defining each stack group. | required |
| `sort` | list of str | Sort order within each stack. | `None` |
| `as_` | tuple of (str, str) | Output column names for stack start and end. | `("y0", "y1")` |
| `offset` | str | Offset mode: "zero", "normalize", or "center". | `"zero"` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
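The start and end positions for the `"zero"` offset can be sketched as a running total within each stack group. The code below is a plain-Python illustration, not ferrum's implementation; it stacks rows in input order and ignores `sort`, and the `stack` helper is hypothetical.

```python
def stack(rows, field, groupby, as_=("y0", "y1")):
    """Add cumulative start/end positions of `field` within each group."""
    start_name, end_name = as_
    running = {}  # current top of each stack, keyed by group
    out = []
    for row in rows:
        key = tuple(row[c] for c in groupby)
        y0 = running.get(key, 0)
        y1 = y0 + row[field]
        running[key] = y1
        out.append({**row, start_name: y0, end_name: y1})
    return out

rows = [
    {"year": 2020, "kind": "a", "n": 3},
    {"year": 2020, "kind": "b", "n": 2},
    {"year": 2021, "kind": "a", "n": 4},
]
stacked = stack(rows, "n", groupby=["year"])
```

Each bar segment then spans from `y0` to `y1`, so segments in the same `year` sit on top of one another.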
transform_timeunit ¶
Extract a temporal unit from a datetime field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `field` | str | Datetime column. | required |
| `unit` | str | Unit to extract: "year", "month", "day", "hour", "minute", "second", "day_of_week", "week", "quarter". | required |
| `utc` | bool | Whether to interpret timestamps as UTC. | `False` |
| `as_` | str | Output column name. Defaults to … | `None` |

Returns:
| Type | Description |
|---|---|
| dict | Transform specification for the Rust engine. |
Examples:
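What each unit might yield can be sketched with the standard library's `datetime`. The numbering conventions used here (Monday as 0 for `day_of_week`, ISO week numbers for `week`, integer results overall) are assumptions; the engine's conventions are not documented on this page, and `time_unit` is a hypothetical helper.

```python
from datetime import datetime

def time_unit(ts, unit):
    """Extract one temporal unit from a datetime as an integer."""
    extractors = {
        "year": lambda t: t.year,
        "month": lambda t: t.month,
        "day": lambda t: t.day,
        "hour": lambda t: t.hour,
        "minute": lambda t: t.minute,
        "second": lambda t: t.second,
        "day_of_week": lambda t: t.weekday(),      # assumed: Monday = 0
        "week": lambda t: t.isocalendar()[1],      # assumed: ISO week number
        "quarter": lambda t: (t.month - 1) // 3 + 1,
    }
    return extractors[unit](ts)

ts = datetime(2024, 8, 15, 13, 45, 30)
parts = {unit: time_unit(ts, unit) for unit in ("year", "month", "quarter", "day_of_week")}
```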