LGMFormula.jl

Tier-2 formula sugar for LatentGaussianModels.jl. Exposes a single macro @lgm and its function form lgmformula that lower a formula expression bound to a Tables.jl-compatible source into an explicit LatentGaussianModel(...) constructor call.

The macro is strictly source-to-source per ADR-008 and plans/macro-policy.md. Every expansion produces a constructor call that the user could have written by hand. Run @macroexpand @lgm(...) to inspect.

When to use this

Migrating models written against R-INLA's inla(formula, …) API.
Writing standard R-INLA-style models with named index columns in a concise notation.

For a guided introduction, see the migration guide.

Quick example

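using GMRFs, LatentGaussianModels, LGMFormula

# Placeholder data for illustration; substitute your own columns.
n = 50
x = randn(n)
y = rand(0:5, n)

df = (y = y, x = x, region = collect(1:n))
model = @lgm y ~ 1 + x + f(region, IID(n)) data=df family=PoissonLikelihood()
res = inla(model, df.y)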

Status

v0.2.0. Phase N PRs 1–6 closed:

PR-1: core parser + single-likelihood single-f expansion.
PR-2: component coverage roundtrips.
PR-3: multi-f roundtrip coverage.
PR-4: tuple-LHS multi-likelihood (wide-format).
PR-5: replicate / group term routing.
PR-6: migration guide + vignette parity (this page).

PR-4b (Copy augmentation) and PR-7 (SPDE-friendly coordinate forms) ship in a follow-up.

API

LGMFormula.LGMFormula — Module

LGMFormula

Tier-2 formula sugar (@lgm) for LatentGaussianModels.jl.

The macro is a source-to-source transform: every expansion of @lgm(...) produces a LatentGaussianModel(...) constructor call that the user could have written by hand. Run @macroexpand @lgm(...) to inspect.

Public API

@lgm — formula macro.
lgmformula — function form; the macro lowers to this.

See plans/macro-policy.md and ADR-008.

LGMFormula._build_design_matrix — Method

_build_design_matrix(data, lhs, has_intercept, covariates, randoms)
    -> SparseMatrixCSC{Float64, Int}

Assemble the linear projector matrix A for the formula. Columns are ordered as [intercept | covariates | random-effect indicators...].

Each random-effect term contributes one block:

Plain f(col, Comp): length(comp) columns; row i is 1 in column idx[i] (within the block). Input column must contain integers in 1:length(comp).
f(col, Comp; replicate = id_col): R · length(comp) columns, laid out [x⁽¹⁾; x⁽²⁾; …; x⁽ᴿ⁾]. Row i is 1 in column (id[i] - 1) · length(comp) + col[i].
f(col, Factory; group = grp_col): Σ_g s_g columns where s_g is the per-group size (number of rows with grp == g). Row i is 1 in column offset[grp[i]] + col[i]. col[i] is required to lie in 1:s_{grp[i]}.
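
The three column layouts above can be sketched with the SparseArrays standard library alone. This is an illustration of the indexing rules, not the package's implementation: the helper names `plain_block`, `replicate_block`, and `group_block` are hypothetical, and the sketch assumes replicate and group labels are integers in 1:R and 1:G.

```julia
using SparseArrays

# Plain f(col, Comp): row i has a 1 in column idx[i]; m plays the role of length(comp).
function plain_block(idx, m)
    n = length(idx)
    return sparse(collect(1:n), idx, ones(n), n, m)
end

# f(col, Comp; replicate = id): R stacked copies, column (id[i] - 1) * m + col[i].
function replicate_block(col, id, m)
    n = length(col)
    R = maximum(id)
    cols = (id .- 1) .* m .+ col
    return sparse(collect(1:n), cols, ones(n), n, R * m)
end

# f(col, Factory; group = grp): column offset[grp[i]] + col[i],
# where s_g is the number of rows with grp == g.
function group_block(col, grp)
    n = length(col)
    G = maximum(grp)
    sizes = [count(==(g), grp) for g in 1:G]
    offsets = cumsum(sizes) .- sizes      # number of columns before each group's block
    cols = offsets[grp] .+ col
    return sparse(collect(1:n), cols, ones(n), n, sum(sizes))
end

A = plain_block([1, 3, 2, 3], 3)
B = replicate_block([1, 2, 1, 2], [1, 1, 2, 2], 2)
C = group_block([1, 2, 1, 2, 3], [1, 1, 2, 2, 2])
```

Each row of every block holds exactly one nonzero, so multiplying a block by the stacked latent vector picks out the latent value attached to that observation.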

LGMFormula._build_expansion — Method

_build_expansion(lhs::Vector{Symbol}, has_intercept, covariates,
                 randoms, data_expr, family_expr) -> Expr

Return the Expr that @lgm expands to. Module references use absolute interpolation ($LatentGaussianModels.Intercept(), $LGMFormula._build_design_matrix(...)) so the expansion resolves regardless of the caller's using imports.

A single-LHS formula expands to a SparseMatrixCSC that is equivalent to a LinearProjector (LGM wraps it automatically). A multi-LHS formula expands to a StackedMapping with one LinearProjector(A) block per likelihood, all sharing the A built from the RHS.

PR-5: f(col, Comp; replicate = id_col) and f(col, Factory; group = grp_col) lower to _wrap_term(...) runtime calls in the components tuple — the wrapper resolves id_col/grp_col against data to construct Replicate(comp, R) / Group(factory, grp). The macro itself does no I/O; the AST is still data-free.

LGMFormula._build_multi_likelihood_mapping — Method

_build_multi_likelihood_mapping(data, lhs, has_intercept, covariates, randoms)
    -> StackedMapping

Build a row-partitioned StackedMapping for tuple-LHS multi-likelihood models. The shared RHS produces a single sparse A; each likelihood block wraps the same LinearProjector(A). Observation rows are partitioned contiguously: block k owns rows ((k-1)·n + 1):(k·n).

All columns in lhs must have equal length n (wide-format only — long-format with a type column is left for a follow-up).
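
The contiguous row partition can be checked with a few lines of plain Julia. The helper `block_rows` is a hypothetical name, and stacking the outcome columns with `vcat` is an assumption made here only to show which rows each block would own.

```julia
# Rows owned by likelihood block k when every outcome column has length n.
block_rows(k, n) = ((k - 1) * n + 1):(k * n)

n, K = 4, 2
ranges = [block_rows(k, n) for k in 1:K]

# Two wide-format outcome columns of equal length, stacked end to end.
y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [0, 1, 0, 2]
y = vcat(y1, y2)
```

With this layout, `y[block_rows(2, n)]` recovers the second outcome column.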

LGMFormula._build_spatial_block — Method

_build_spatial_block(component, data_cols, coord_cols::Tuple, n_obs::Int)
    -> SparseMatrixCSC{Float64, Int}

Build the design-matrix block for a tuple-coordinate f((cols...), component) term. The default method throws — concrete implementations live in package extensions. LGMFormulaINLASPDEExt overloads this for SPDE2 to build a barycentric MeshProjector.

LGMFormula._check_columns — Method

_check_columns(data, lhs, covariates, randoms)

Validate that data is a Tables.jl source and that every referenced column is present. Errors refer to user-visible names, not table internals.
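
A minimal sketch of the column check, restricted to a NamedTuple of vectors (which is a valid Tables.jl column table). The name `check_columns` and the error wording are hypothetical; a general implementation would presumably query the schema through Tables.jl rather than `haskey`.

```julia
# Report every referenced column that the data source lacks, by its user-facing name.
function check_columns(data::NamedTuple, needed)
    absent = [c for c in needed if !haskey(data, c)]
    isempty(absent) ||
        throw(ArgumentError("formula refers to column(s) not found in `data`: " *
                            join(absent, ", ")))
    return nothing
end

df = (y = [1, 0, 2], x = [0.1, 0.4, 0.3])
check_columns(df, [:y, :x])            # passes silently

msg = try
    check_columns(df, [:y, :region])   # :region is absent
    ""
catch err
    sprint(showerror, err)
end
```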

LGMFormula._flatten_plus — Method

_flatten_plus(expr) -> Vector{Any}

Flatten an a + b + c chain into [a, b, c]. Any other expression is returned as a single-element list.
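
The flattening step can be sketched as a short recursion over `Expr` nodes. The name `flatten_plus` is illustrative and the body is a guess at the behaviour described above, not the package source.

```julia
# Recursively collect the summands of a `+` call; anything else is a single term.
function flatten_plus(ex)
    if ex isa Expr && ex.head === :call && ex.args[1] === :+
        terms = Any[]
        for arg in ex.args[2:end]
            append!(terms, flatten_plus(arg))
        end
        return terms
    end
    return Any[ex]
end

flat = flatten_plus(:(1 + x + f(region, IID(n))))
nested = flatten_plus(:((a + b) + c))
single = flatten_plus(:x)
```

Julia already parses `a + b + c` as one n-ary call, so the recursion matters mainly for explicitly parenthesised sums such as `(a + b) + c`.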

LGMFormula._parse_args — Method

_parse_args(args) -> (formula_expr, opts::Dict{Symbol,Any})

Split macro args into the formula expression and key = value options. Accepts both bare-form (@lgm y ~ 1 data=df) and parenthesised-form (@lgm(y ~ 1, data=df)).
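
A sketch of the argument split, working on already-parsed macro arguments. The name `parse_args` and the error messages are hypothetical. The sketch accepts both `=` and `kw` expression heads as a precaution, since it makes no assumption about which head the parser produces for each calling form.

```julia
# Separate the single formula expression from `key = value` options.
function parse_args(args)
    formula = nothing
    opts = Dict{Symbol,Any}()
    for a in args
        if a isa Expr && a.head in (:(=), :kw) && a.args[1] isa Symbol
            opts[a.args[1]] = a.args[2]
        elseif formula === nothing
            formula = a
        else
            error("@lgm: expected a single formula, got extra argument `$a`")
        end
    end
    formula === nothing && error("@lgm: missing formula")
    return formula, opts
end

formula, opts = parse_args((:(y ~ 1), :(data = df), :(family = PoissonLikelihood())))
formula_kw, opts_kw = parse_args((:(y ~ 1), Expr(:kw, :data, :df)))
```

The option values stay unevaluated expressions (`:df`, `:(PoissonLikelihood())`), which keeps the macro free of any data access.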

LGMFormula._parse_f_term — Method

_parse_f_term(s::Expr) -> NamedTuple

Pull the column symbol, component expression, and replicate/group keyword arguments out of an f(...) call. Accepts both f(col, comp; replicate=id) (semicolon-style) and f(col, comp, replicate=id) (trailing-kw style); rejects unsupported keywords with a user-visible error.

The first positional argument is either a bare column name (a Symbol), used for index-typed f(col, Comp) random effects, or a tuple of column names: (s_col, t_col) of length 2 for spatial SPDE terms, or (s_col, t_col, time_col) of length 3, reserved for KroneckerComponent in PR-7c.
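
The two keyword styles differ only in where the parser puts the keyword: semicolon-style keywords arrive in a leading `parameters` node, trailing keywords as `kw` nodes among the positional arguments. The sketch below handles both; `parse_f_term` and its messages are illustrative rather than the package's own code.

```julia
# Extract column, component expression, and replicate/group keywords from f(...).
function parse_f_term(s::Expr)
    (s.head === :call && s.args[1] === :f) || error("expected an f(...) term")
    positional = Any[]
    kws = Dict{Symbol,Any}()
    for a in s.args[2:end]
        if a isa Expr && a.head === :parameters        # f(col, comp; k = v)
            for kw in a.args
                (kw isa Expr && kw.head === :kw) || error("malformed keyword in f(...)")
                kws[kw.args[1]] = kw.args[2]
            end
        elseif a isa Expr && a.head === :kw            # f(col, comp, k = v)
            kws[a.args[1]] = a.args[2]
        else
            push!(positional, a)
        end
    end
    length(positional) == 2 || error("f(...) takes a column and a component")
    for k in keys(kws)
        k in (:replicate, :group) || error("unsupported keyword `$k` in f(...)")
    end
    (haskey(kws, :replicate) && haskey(kws, :group)) &&
        error("`replicate` and `group` cannot be combined in one f(...) term")
    return (col = positional[1], comp_expr = positional[2],
            replicate = get(kws, :replicate, nothing),
            group = get(kws, :group, nothing))
end

t1 = parse_f_term(:(f(t, AR1(n); replicate = id)))
t2 = parse_f_term(:(f(t, AR1, group = grp)))
t3 = parse_f_term(:(f(region, IID(n))))
```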

LGMFormula._parse_formula — Method

_parse_formula(expr) -> (lhs::Vector{Symbol}, rhs)

Split lhs ~ rhs into LHS column names and RHS expression. The LHS is always returned as a Vector{Symbol}: length 1 for a single-likelihood formula, length k > 1 for a multi-likelihood tuple-LHS formula (y1, y2, ...) ~ rhs (ADR-033).
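
Because Julia parses `lhs ~ rhs` as a call to `~`, the split amounts to pattern-matching on one `Expr`. A sketch under that observation, with `parse_formula` and its error messages as hypothetical stand-ins:

```julia
# Split `lhs ~ rhs`; the LHS is normalised to a Vector{Symbol}.
function parse_formula(ex)
    (ex isa Expr && ex.head === :call && length(ex.args) == 3 && ex.args[1] === :~) ||
        error("expected a formula of the form `lhs ~ rhs`")
    lhs, rhs = ex.args[2], ex.args[3]
    if lhs isa Symbol
        return [lhs], rhs
    elseif lhs isa Expr && lhs.head === :tuple && all(a -> a isa Symbol, lhs.args)
        return Symbol[lhs.args...], rhs
    end
    error("left-hand side must be a column name or a tuple of column names")
end

single_lhs, single_rhs = parse_formula(:(y ~ 1 + x))
multi_lhs, multi_rhs = parse_formula(:((y1, y2) ~ 1 + f(idx, IID(n))))
```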

LGMFormula._split_rhs — Method

_split_rhs(rhs) -> (has_intercept::Bool,
                   covariates::Vector{Symbol},
                   randoms::Vector{<:NamedTuple})

Walk the RHS, splitting at +. Each summand is one of:

1 — explicit intercept marker (default if no marker present).
0 or -1 — explicit "no intercept".
bare Symbol — fixed-effects covariate column.
f(col, Component(...)) — random-effect term; col is a column symbol, the second arg is the (un-evaluated) component expression.
f(col, Component(...); replicate = id_col) — R-INLA-style replicated component (PR-5). The macro emits a runtime call that wraps the inner component as Replicate(comp, R) where R = maximum(id_col).
f(col, Component; group = grp_col) — R-INLA-style grouped component (PR-5, factory form). The second positional argument is the factory (a Symbol or callable, not an instance); the macro emits a runtime Group(factory, grp_col) wrap.

Each f(...) term lowers to a NamedTuple{(:col, :comp_expr, :replicate, :group)} where replicate / group carry the keyword-argument column symbols (or nothing).

Other forms (transformations, interactions, etc.) raise an error referring to user concepts.
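
The classification rules above can be sketched over a list of summands that has already been split at +. The function name `classify_summands` is hypothetical, and the sketch stores each f(...) call unparsed in a one-field NamedTuple rather than the four-field form the package documents.

```julia
# Sort RHS summands into intercept flag, covariate names, and random-effect terms.
function classify_summands(summands)
    has_intercept = true                  # default when no marker is present
    covariates = Symbol[]
    randoms = NamedTuple[]
    for s in summands
        if s === 1
            has_intercept = true
        elseif s === 0 || s === -1
            has_intercept = false
        elseif s isa Symbol
            push!(covariates, s)
        elseif s isa Expr && s.head === :call && s.args[1] === :f
            push!(randoms, (term = s,))
        else
            error("unsupported formula term `$s`: fixed effects must be bare column names")
        end
    end
    return has_intercept, covariates, randoms
end

has_int, covs, rnd = classify_summands(Any[1, :x, :(f(region, IID(n)))])
no_int, _, _ = classify_summands(Any[0, :x])
```

A transformation such as `log(x)` falls through to the final branch and raises an error phrased in terms of the formula, not the expression tree.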

LGMFormula._wrap_term — Method

_wrap_term(comp_or_factory, data, replicate_col, group_col) ->
    AbstractLatentComponent

PR-5 runtime helper. Wraps an f(col, Comp; replicate = id) or f(col, Factory; group = grp) term against the actual data table:

replicate_col::Symbol: returns Replicate(comp, R) where R = maximum(data.$replicate_col). comp_or_factory must be an AbstractLatentComponent instance.
group_col::Symbol: returns Group(factory, data.$group_col). comp_or_factory is the factory (e.g. IID, AR1); the per-group inner components are constructed by the LGM core Group(factory, group_id) constructor — see packages/LatentGaussianModels.jl/src/components/group.jl.
both nothing: returns comp_or_factory unchanged (must be a component instance — caller's responsibility).
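
The three cases can be sketched with stand-in wrapper types. `ToyReplicate` and `ToyGroup` are hypothetical placeholders for the real Replicate and Group, `wrap_term` is an illustrative name, and column lookup uses `getproperty` on a NamedTuple where a real implementation would presumably go through Tables.jl.

```julia
# Stand-ins for the LGM core wrappers (hypothetical, for illustration only).
struct ToyReplicate{C}
    comp::C
    R::Int
end

struct ToyGroup{F,V}
    factory::F
    labels::V
end

# Resolve the replicate/group column against the data at runtime.
function wrap_term(comp_or_factory, data, replicate_col, group_col)
    if replicate_col !== nothing && group_col !== nothing
        error("`replicate` and `group` are mutually exclusive")
    elseif replicate_col !== nothing
        ids = getproperty(data, replicate_col)
        return ToyReplicate(comp_or_factory, maximum(ids))   # R = maximum(id column)
    elseif group_col !== nothing
        return ToyGroup(comp_or_factory, getproperty(data, group_col))
    end
    return comp_or_factory                                    # plain term: unchanged
end

data = (t = [1, 2, 1, 2], id = [1, 1, 2, 2])
rep = wrap_term(:ar1_instance, data, :id, nothing)
grp = wrap_term(identity, data, nothing, :id)
plain = wrap_term(:iid_instance, data, nothing, nothing)
```

Deferring this step to runtime is what lets the macro expansion stay data-free: only column names appear in the AST, and the number of replicates or the group labels are read when the expansion runs.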

LGMFormula.lgmformula — Method

lgmformula(data; lhs, intercept = true, covariates = Symbol[],
           randoms = [], family) -> LatentGaussianModel

Function form of @lgm. Accepts a structured description of the formula and returns the same LatentGaussianModel the macro would produce.

Arguments

data: a Tables.jl-compatible source.
lhs::Union{Symbol, AbstractVector{Symbol}}: outcome column name(s). A vector triggers tuple-LHS multi-likelihood; family must then be a tuple of likelihoods of matching length.
intercept::Bool = true: whether to include Intercept().
covariates::Vector{Symbol} = Symbol[]: scalar fixed-effect column names. Becomes FixedEffects(length(covariates)) if non-empty.
randoms::AbstractVector = []: list of f-term specifications. Each entry may be:
- (col::Symbol, comp::AbstractLatentComponent) — plain f-term.
- (col, comp, replicate::Symbol, group::Nothing) — replicated component; runtime wraps as Replicate(comp, R).
- (col, factory, replicate::Nothing, group::Symbol) — grouped component; runtime wraps as Group(factory, grp_col_values).
- A NamedTuple{(:col, :comp_expr, :replicate, :group)} — internal form emitted by the macro.
family: observation likelihood (single-LHS) or tuple of likelihoods (multi-LHS).

LGMFormula.@lgm — Macro

@lgm formula data=df family=Likelihood()

Build a LatentGaussianModel from a formula expression bound to a Tables.jl-compatible source.

Supported

@lgm y ~ 1 data=df family=GaussianLikelihood() — intercept only.
@lgm y ~ 1 + x1 + x2 data=df family=GaussianLikelihood() — intercept + scalar covariates.
@lgm y ~ 0 + x data=df family=GaussianLikelihood() — no intercept (-1 also accepted).
@lgm y ~ 1 + f(idx, IID(n)) + f(t, RW1(T)) data=df family=PoissonLikelihood() — intercept + multiple random effects.
@lgm (y1, y2) ~ 1 + f(idx, IID(n)) data=df family=(GaussianLikelihood(), PoissonLikelihood()) — multi-likelihood tuple-LHS with shared RHS (wide-format).
@lgm y ~ 1 + f(t, AR1(n); replicate = id) data=df family=GaussianLikelihood() — replicated component (R-INLA's replicate=id); runtime wraps as Replicate(comp, R) with R = maximum(data.id).
@lgm y ~ 1 + f(t, AR1; group = grp) data=df family=GaussianLikelihood() — grouped component (R-INLA's group=grp + factory form); runtime wraps as Group(factory, data.grp) with one inner component per group label.

Restrictions

Fixed-effects terms must be bare column symbols. Transformations (log(x), x1*x2, factor expansions) are not yet supported.
The col of an f(col, Component) term must be a column of integers in 1:length(Component).
Tuple-LHS columns must all have the same length (wide-format only; long-format with a type column is left for a follow-up).
replicate and group are mutually exclusive within a single f(...) term.
Copy(...) augmentation (f(...; copy = :name)) is not yet supported; it ships in PR-4b.

Expansion

The macro expands to an explicit LatentGaussianModel(...) call with an lgmformula-built design matrix; run @macroexpand to inspect. The components tuple and likelihood appear literally in the expansion; only the design matrix construction is deferred to runtime (it depends on the data).
