Marks & encodings

Two primitives carry every Ferrum chart: marks (the geometric shapes that visualize your data) and encodings (the typed mappings from data fields to visual variables). Choosing the right combination of mark and encoding is most of the work of authoring a chart.

This page is the reference for both. It covers the encoding channels, the mark families, the shorthand syntax that compresses common cases, and when to reach for what.

How a chart is assembled

A Ferrum chart is built by attaching a mark to a data source and declaring which columns drive which visual variables. Every chart follows the same shape:

import ferrum as fm
import polars as pl
from sklearn.datasets import load_iris

raw = load_iris()
iris = pl.DataFrame(raw.data, schema=["sepal_length", "sepal_width", "petal_length", "petal_width"]).with_columns(
    species=pl.Series([raw.target_names[t] for t in raw.target])
)
chart = (
    fm.Chart(iris)
    .mark_point()
    .encode(x="sepal_length", y="petal_length", color="species:N")
)
assert chart.show_svg().startswith("<svg")

Basic scatter

The three pieces — data, mark, encoding — compose freely. You can change the mark without touching the encoding (mark_line() instead of mark_point()), change the encoding without touching the mark, or compose multiple marks against the same encoding (see Composition).

Encoding channels

An encoding channel declares: this field drives this visual variable. Channels are typed by the engine: a quantitative field gets a continuous scale, a nominal field gets a categorical color palette, a temporal field gets a time scale. You can be explicit by passing an encoding object (fm.X("col", type="Q")) or use the shorthand syntax (described below).
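To make the idea of type inference concrete, here is a minimal sketch that guesses a type code from sample Python values. The function name `infer_encoding_type` and its rules are assumptions for illustration only: Ferrum infers from the column dtype inside the engine, and its actual rules may differ (for example, this sketch never infers an ordinal type).

```python
from datetime import date, datetime
from numbers import Number


def infer_encoding_type(values):
    """Guess a type code (Q, N, or T) from sample Python values.

    Hypothetical rule set for illustration; not Ferrum's implementation.
    """
    sample = [v for v in values if v is not None]
    if not sample:
        return "N"  # assumption: fall back to nominal when nothing is known
    if all(isinstance(v, (date, datetime)) for v in sample):
        return "T"  # dates and datetimes get a time scale
    # bool counts as a Number in Python, so exclude it explicitly
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in sample):
        return "Q"  # numeric values get a continuous scale
    return "N"  # strings, booleans, and mixed values are treated as categories


numeric_type = infer_encoding_type([5.1, 4.9, 4.7])
label_type = infer_encoding_type(["setosa", "virginica"])
date_type = infer_encoding_type([date(2024, 1, 1), date(2024, 2, 1)])
```

When inference would pick the wrong type (integer codes that are really categories, for instance), declaring the type explicitly avoids the guess.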

Positional channels

These channels place marks in space:

| Channel | Purpose |
| --- | --- |
| x, y | Primary horizontal / vertical position. |
| x2, y2 | Secondary position. Used for bands, segments, intervals, error extents. |
| xerror, yerror, xerror2, yerror2 | Error extents around the primary position. |
| theta, radius | Polar coordinates. Used with CoordPolar. |

Most charts only declare x and y. The rest unlock band marks (mark_area, mark_errorband), intervals (mark_rect, mark_rule), and polar plots.

Appearance channels

These channels modulate how marks look:

| Channel | Purpose |
| --- | --- |
| color | Mark color. Continuous fields get a perceptually uniform palette; categorical fields get a discrete palette. |
| fill, stroke | Override color separately for the fill and stroke. color sets both. |
| opacity, fill_opacity, stroke_opacity | Mark opacity. |
| stroke_width, stroke_dash | Stroke styling. |
| size | Mark size. |
| shape | Mark glyph (for mark_point). |
| angle | Rotation. |

Appearance channels can take either a field name (data-driven) or a literal value (constant for all marks). Setting color="red" colors every mark red; setting color="species:N" colors marks by the species column.

Text and metadata channels

These channels carry information that does not directly map to position or appearance:

| Channel | Purpose |
| --- | --- |
| text | Text content for mark_text. |
| detail | Additional grouping that does not affect appearance. Useful for keeping series separate without coloring them differently. |
| tooltip, tooltip_field | Field shown on hover. In interactive mode, renders as a tooltip overlay; in static output, becomes accessibility metadata. |
| href | URL the mark links to. |
| description | Accessibility description. |
| key | Stable identity for interactive selections. |

Faceting channels

These channels split the chart into small multiples:

| Channel | Purpose |
| --- | --- |
| facet | Single faceting variable, wrapped into a grid. |
| facet_row, facet_col | Row / column facets for a 2-D small-multiples grid. |

Faceting is structural: it produces multiple panels rather than overlaying marks. To layer marks against the same axes, use Composition.

The shorthand string syntax

Encodings accept a compact string syntax that handles the most common cases without explicit channel objects:

| Shorthand | Meaning |
| --- | --- |
| "field" | Field with inferred type (engine picks Q / N / O / T based on dtype). |
| "field:Q" | Explicitly quantitative. |
| "field:N" | Nominal (unordered categorical). |
| "field:O" | Ordinal (ordered categorical). |
| "field:T" | Temporal. |
| "agg(field):Q" | Aggregation. Examples: "mean(price):Q", "count():Q", "sum(qty):Q", "median(value):Q". |

The shorthand is purely syntactic sugar over the explicit form. fm.X("price", type="Q") and "price:Q" produce identical specs. The shorthand keeps simple cases compact; the explicit form unlocks advanced channel options.
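As an illustration of how the shorthand forms in the table decompose, the following sketch parses a shorthand string into its field, aggregate, and type parts. The `parse_shorthand` function and its regular expression are hypothetical and written only for this example; they are not Ferrum's parser, and the sketch does not handle field names that contain colons or parentheses.

```python
import re

# Matches "field", "field:Q", "agg(field):Q", and "count():Q".
_SHORTHAND = re.compile(
    r"^(?:(?P<agg>[A-Za-z_]\w*)\((?P<arg>[^()]*)\)|(?P<field>[^:()]+))"
    r"(?::(?P<type>[QNOT]))?$"
)


def parse_shorthand(spec):
    """Split a shorthand string into field, aggregate, and type parts."""
    match = _SHORTHAND.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized shorthand: {spec!r}")
    if match.group("agg") is None:
        field = match.group("field")
    else:
        field = match.group("arg")
    return {
        "field": field or None,           # None for zero-argument aggregates such as count()
        "aggregate": match.group("agg"),  # None when no aggregation is requested
        "type": match.group("type"),      # None means the type is left to inference
    }


parsed = parse_shorthand("mean(price):Q")
# parsed == {"field": "price", "aggregate": "mean", "type": "Q"}
```

A missing type suffix comes back as `None`, which corresponds to the "inferred type" row of the table.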

When in doubt, use the explicit form:

import ferrum as fm
import polars as pl
from sklearn.datasets import load_iris

raw = load_iris()
iris = pl.DataFrame(raw.data, schema=["sepal_length", "sepal_width", "petal_length", "petal_width"]).with_columns(
    species=pl.Series([raw.target_names[t] for t in raw.target])
)
chart = (
    fm.Chart(iris)
    .mark_point()
    .encode(
        x=fm.X("sepal_length", type="Q", title="Sepal length"),
        y=fm.Y("petal_length", type="Q", title="Petal length"),
        color=fm.Color("species", type="N", title="Species"),
    )
)
assert chart.show_svg().startswith("<svg")

Explicit encoding

Mark families

Ferrum ships 54 mark methods on Chart. They group into families by what they're for.

Primitive marks

The geometric building blocks. Use these when you want direct control over what gets drawn.

| Method | Geometry |
| --- | --- |
| mark_point() | Discrete points. The default scatter mark. |
| mark_line() | Polyline connecting points in order. |
| mark_bar() | Vertical or horizontal bars. |
| mark_area() | Filled area, optionally banded with y2. |
| mark_rule() | Reference lines (often horizontal or vertical). |
| mark_tick() | Short ticks, often used for rug plots. |
| mark_rect() | Rectangular cells. Used for heatmaps and intervals. |
| mark_text() | Text labels (paired with the text encoding). |
| mark_label() | Positioned text labels with collision avoidance (avoid_overlap=True). |
| mark_image() | Image tiles from URL fields. |

Example — basic scatter:

import ferrum as fm
import polars as pl
from sklearn.datasets import load_iris

raw = load_iris()
iris = pl.DataFrame(raw.data, schema=["sepal_length", "sepal_width", "petal_length", "petal_width"]).with_columns(
    species=pl.Series([raw.target_names[t] for t in raw.target])
)
chart = (
    fm.Chart(iris)
    .mark_point()
    .encode(
        x="sepal_length",
        y="petal_length",
        color="species:N",
        size="sepal_width",
    )
)
assert chart.show_svg().startswith("<svg")

Scatter with size

Statistical marks

These marks compute a transform on your data before rendering: KDE, binning, smoothing, contours, quantile-quantile reference, or arbitrary functions. The transform is declared in the chart spec and executed in Rust.

| Method | Transform |
| --- | --- |
| mark_density() | 1-D kernel density estimate. |
| mark_histogram() | Binned counts or densities. |
| mark_smooth() | LOESS / GLM / logistic regression overlay. |
| mark_contour() | 2-D density contours. |
| mark_qq() | Quantile-quantile plot against a reference distribution. |
| mark_violin() | Symmetric KDE per group. |
| mark_function() | Plot an arbitrary f(x) over a domain. |

Example — 1-D kernel density estimate:

import ferrum as fm
import polars as pl
from sklearn.datasets import load_iris

raw = load_iris()
iris = pl.DataFrame(raw.data, schema=["sepal_length", "sepal_width", "petal_length", "petal_width"]).with_columns(
    species=pl.Series([raw.target_names[t] for t in raw.target])
)
chart = (
    fm.Chart(iris)
    .mark_density(bandwidth="scott")
    .encode(x="sepal_length")
)
assert chart.show_svg().startswith("<svg")

KDE density

Stat marks are described in detail in Stats in the rendering pipeline.
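For readers who want to see what a 1-D kernel density transform computes, here is a plain-Python sketch of a Gaussian KDE with Scott's rule for the bandwidth. It follows the convention used by SciPy (sample standard deviation times n^(-1/5)); whether Ferrum's Rust implementation uses the same kernel and the same constant is an assumption, and the function names are illustrative.

```python
import math
import statistics


def scott_bandwidth(values):
    """Scott's rule for 1-D data: sample std * n ** (-1/5)."""
    return statistics.stdev(values) * len(values) ** (-1 / 5)


def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    h = scott_bandwidth(values) if bandwidth is None else bandwidth
    norm = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]


sample = [4.6, 4.9, 5.0, 5.1, 5.4, 5.8, 6.3, 6.4, 6.9, 7.1]
grid = [3.0 + 0.05 * i for i in range(121)]  # evenly spaced points from 3.0 to 9.0
density = gaussian_kde(sample, grid)
```

The result is one density value per grid point, which is the kind of derived series a density mark would draw as a curve.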

Grouped transforms with groupby

Statistical marks compute their transform over the entire dataset by default. To compute independently per group — one LOESS line per species, one KDE per category — pass groupby=:

import ferrum as fm
import polars as pl
from sklearn.datasets import load_iris

raw = load_iris()
iris = pl.DataFrame(raw.data, schema=["sepal_length", "sepal_width", "petal_length", "petal_width"]).with_columns(
    species=pl.Series([raw.target_names[t] for t in raw.target])
)
chart = (
    fm.Chart(iris)
    .mark_smooth(method="loess", groupby="species")
    .encode(x="sepal_length", y="petal_length", color="species:N")
)
assert chart.show_svg().startswith("<svg")

Grouped LOESS

The groupby parameter is available on mark_smooth, mark_density, and mark_histogram. The group column is preserved in the transform output so downstream color= encoding maps each group to a distinct visual.

This is especially important when layering a statistical mark with a scatter via + — without groupby, the transform runs over all data combined and produces a single aggregate line.
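The difference between a pooled and a grouped transform can be sketched without Ferrum. The example below swaps LOESS for an ordinary least-squares line to keep the code short; the `smooth` function and its `groupby` argument are hypothetical stand-ins that only mirror the behavior described above.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def smooth(rows, x, y, groupby=None):
    """Fit one line over all rows, or one line per group when groupby is set."""
    if groupby is None:
        slope, intercept = fit_line([(r[x], r[y]) for r in rows])
        return [{"slope": slope, "intercept": intercept}]
    groups = defaultdict(list)
    for r in rows:
        groups[r[groupby]].append((r[x], r[y]))
    result = []
    for key, pts in groups.items():
        slope, intercept = fit_line(pts)
        # Keep the group key in the output so a color encoding can use it.
        result.append({groupby: key, "slope": slope, "intercept": intercept})
    return result


rows = [
    {"species": "a", "x": 1.0, "y": 1.0},
    {"species": "a", "x": 2.0, "y": 2.0},
    {"species": "a", "x": 3.0, "y": 3.0},
    {"species": "b", "x": 1.0, "y": 9.0},
    {"species": "b", "x": 2.0, "y": 8.0},
    {"species": "b", "x": 3.0, "y": 7.0},
]
pooled = smooth(rows, "x", "y")                        # one flat line through everything
per_group = smooth(rows, "x", "y", groupby="species")  # one rising line, one falling line
```

In this toy data the two groups trend in opposite directions, so the pooled fit is flat and hides both trends, while the grouped fit recovers them and carries the `species` key along.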

Distribution-summary marks

For comparing categorical distributions at a glance:

| Method | Geometry |
| --- | --- |
| mark_boxplot() | Tukey boxplot: quartiles, whiskers, outliers. |
| mark_boxen() | Letter-value boxplot: more quantiles for larger samples. |
| mark_swarm() | Beeswarm jitter (categorical scatter without overlap). |

Example — boxplot by species:

import ferrum as fm
import polars as pl
from sklearn.datasets import load_iris

raw = load_iris()
iris = pl.DataFrame(raw.data, schema=["sepal_length", "sepal_width", "petal_length", "petal_width"]).with_columns(
    species=pl.Series([raw.target_names[t] for t in raw.target])
)
chart = (
    fm.Chart(iris)
    .mark_boxplot()
    .encode(x="species:N", y="sepal_length")
)
assert chart.show_svg().startswith("<svg")

Boxplot by species
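The statistics behind a Tukey boxplot can be sketched in a few lines of standard-library Python. This is an illustration of the general technique (quartiles, whiskers at the most extreme points within 1.5 × IQR of the box, everything beyond as outliers), not Ferrum's implementation; the quantile interpolation method in particular is an assumption and varies between libraries.

```python
import statistics


def tukey_summary(values):
    """Quartiles, 1.5 * IQR whiskers, and outliers for one group."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

For this sample the value 30 falls above the upper fence, so it is reported as an outlier and the upper whisker stops at 9.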

Uncertainty marks

For showing the spread around an estimate:

| Method | Geometry |
| --- | --- |
| mark_errorbar() | Error bars with optional terminal ticks. |
| mark_errorband() | Filled band between y and y2. |
| mark_ribbon() | Continuous band, typically over a line. |
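
A band between y and y2 needs two extent columns in the data. The sketch below shows one way such columns might be prepared from repeated measurements, using mean ± one standard deviation. The `band_rows` helper and the `lower` / `upper` column names are hypothetical; which statistic defines the band is a modeling choice, not something Ferrum prescribes here.

```python
import statistics
from collections import defaultdict


def band_rows(rows, x, value, width=1.0):
    """Collapse repeated measurements into one row per x with band edges."""
    by_x = defaultdict(list)
    for r in rows:
        by_x[r[x]].append(r[value])
    out = []
    for key in sorted(by_x):
        vals = by_x[key]
        center = statistics.fmean(vals)
        # A single observation has no spread, so the band collapses to a point.
        spread = width * statistics.stdev(vals) if len(vals) > 1 else 0.0
        out.append(
            {x: key, "mean": center, "lower": center - spread, "upper": center + spread}
        )
    return out


rows = [
    {"day": 1, "value": 10.0}, {"day": 1, "value": 12.0}, {"day": 1, "value": 14.0},
    {"day": 2, "value": 20.0}, {"day": 2, "value": 20.0}, {"day": 2, "value": 20.0},
]
band = band_rows(rows, "day", "value")
```

The `lower` and `upper` values are the kind of columns that the y and y2 channels would carry for a band mark, with `mean` available for a line drawn over it.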

Scale-aware marks

For large data where vector marks would overwhelm the renderer:

| Method | Geometry |
| --- | --- |
| mark_raster() | Pre-aggregated rectangular grid. |
| mark_hex() | Hexagonal binning. |

Both fall back to explicit raster output regardless of the renderer in use. See Performance & scale for when to reach for them.
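
To show why pre-aggregation helps at scale, here is a minimal sketch of rectangular binning: points are counted per grid cell, so the number of drawn elements is bounded by the grid size rather than the row count. The `bin_2d` function is illustrative only and covers the rectangular case; hexagonal binning and Ferrum's actual aggregation are not shown.

```python
from collections import Counter


def bin_2d(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid over the given ranges."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the grid
        # Clamp so points on the upper edge land in the last cell.
        col = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        row = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        counts[(col, row)] += 1
    return counts


points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 1.0), (0.6, 0.2)]
cells = bin_2d(points, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)
```

Five points collapse into three occupied cells here; with millions of points the output would still contain at most nx × ny cells.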

Model-diagnostic marks

Phase 10 introduced a family of model-evaluation marks that work with ModelSource to produce ROC curves, calibration plots, residuals, SHAP summaries, and similar diagnostics; see the Model diagnostics page. These marks live alongside the rest because, structurally, a ROC curve is a chart. A selection of them: mark_residuals, mark_prediction_error, mark_roc, mark_pr, mark_calibration, mark_gain, mark_lift, mark_confusion, mark_importance, mark_shap_beeswarm, mark_shap_bar, mark_pdp, mark_learning_curve.

For most diagnostic use cases you should reach for the figure-level helpers (roc_chart, calibration_chart, etc.) covered in Figure-level helpers rather than calling the marks directly. The marks exist for cases where you want to drop into the grammar and compose custom diagnostic views.

Picking a mark

A quick decision guide for the common cases:

  • One variable, looking at distribution shape? mark_density() or mark_histogram().
  • One variable across groups? mark_boxplot() (or mark_violin() for symmetric KDEs, mark_swarm() to show every point).
  • Two variables, looking at relationship? mark_point() for small datasets, mark_hex() for large ones, or mark_smooth() layered over the points to add a trend line.
  • Two variables over time? mark_line(), optionally with mark_ribbon() or mark_errorband() for uncertainty.
  • Counts by category? mark_bar(). Add encode(color="...") and a stacking position adjustment for stacked bars.
  • A discrete grid of values? mark_rect(). The same primitive serves heatmaps and binned 2-D histograms.
  • Model diagnostic? Use the helpers described in Figure-level helpers and Model diagnostics rather than calling diagnostic marks directly.

A complete example

Combining multiple marks against one data source with shared encodings (full composition is covered on the Composition page):

import ferrum as fm
import polars as pl
from sklearn.datasets import load_iris

raw = load_iris()
iris = pl.DataFrame(raw.data, schema=["sepal_length", "sepal_width", "petal_length", "petal_width"]).with_columns(
    species=pl.Series([raw.target_names[t] for t in raw.target])
)
points = (
    fm.Chart(iris)
    .mark_point(opacity=0.6)
    .encode(x="sepal_length", y="petal_length", color="species:N")
)
trend = (
    fm.Chart(iris)
    .mark_smooth(method="loess", groupby="species")
    .encode(x="sepal_length", y="petal_length", color="species:N")
)
combined = points + trend
assert combined.show_svg().startswith("<svg")

Points + LOESS trend

This puts a per-species LOESS overlay on top of a scatter. Same encoding, two marks, one layered chart — the + operator on Chart produces a layered view that renders both marks against the same axes.

Where to go next

  • Composition for how to combine multiple marks and charts into compound views (Layer, HConcat, VConcat, JointChart, etc.).
  • Themes for changing how marks look without changing the chart spec.
  • Figure-level helpers for one-line entry points to common chart patterns.
  • Stats in the rendering pipeline for the design rationale behind statistical marks.
  • The API Reference for the full method signatures of every mark and encoding channel.