LGMFormula.jl
Tier-2 formula sugar for LatentGaussianModels.jl. Exposes a single macro @lgm and its function form lgmformula that lower a formula expression bound to a Tables.jl-compatible source into an explicit LatentGaussianModel(...) constructor call.
The macro is strictly source-to-source per ADR-008 and plans/macro-policy.md. Every expansion produces a constructor call that the user could have written by hand. Run @macroexpand @lgm(...) to inspect.
When to use this
- Migrating models written against R-INLA's inla(formula, …) API.
- Wanting concise notation for standard R-INLA-style models with named index columns.
For a guided introduction, see the migration guide.
Quick example
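```julia
using GMRFs, LatentGaussianModels, LGMFormula
# Synthetic placeholder data: n regions, one covariate, Poisson counts.
n = 50
x = randn(n)
y = rand(0:10, n)
df = (y = y, x = x, region = collect(1:n))
model = @lgm y ~ 1 + x + f(region, IID(n)) data=df family=PoissonLikelihood()
res = inla(model, df.y)
```
Status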
v0.2.0. Phase N PRs 1–6 closed:
- PR-1: core parser + single-likelihood, single-f expansion.
- PR-2: component coverage roundtrips.
- PR-3: multi-f roundtrip coverage.
- PR-4: tuple-LHS multi-likelihood (wide format).
- PR-5: replicate/group term routing.
- PR-6: migration guide + vignette parity (this page).
PR-4b (Copy augmentation) and PR-7 (SPDE-friendly coordinate forms) ship in a follow-up.
API
LGMFormula.LGMFormula — Module
LGMFormula
Tier-2 formula sugar (@lgm) for LatentGaussianModels.jl.
The macro is a source-to-source transform: every expansion of @lgm(...) produces a LatentGaussianModel(...) constructor call that the user could have written by hand. Run @macroexpand @lgm(...) to inspect.
Public API
- @lgm: formula macro.
- lgmformula: function form; the macro lowers to this.
See plans/macro-policy.md and ADR-008.
LGMFormula._build_design_matrix — Method
_build_design_matrix(data, lhs, has_intercept, covariates, randoms)
    -> SparseMatrixCSC{Float64, Int}
Assemble the linear projector matrix A for the formula. Columns are ordered as [intercept | covariates | random-effect indicators...].
Each random-effect term contributes one block:
- Plain f(col, Comp): length(comp) columns; row i is 1 in column col[i] (within the block). The input column must contain integers in 1:length(comp).
- f(col, Comp; replicate = id_col): R · length(comp) columns, laid out as [x⁽¹⁾; x⁽²⁾; …; x⁽ᴿ⁾]. Row i is 1 in column (id[i] - 1) · length(comp) + col[i].
- f(col, Factory; group = grp_col): Σ_g s_g columns, where s_g is the per-group size (the number of rows with grp == g). Row i is 1 in column offset[grp[i]] + col[i], and col[i] must lie in 1:s_{grp[i]}.
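The three block layouts can be reproduced with the SparseArrays standard library alone. The sketch below is a hypothetical re-implementation for illustration only: `plain_block`, `replicate_block`, `group_block` and `design_matrix` are names invented here, not LGMFormula internals. It also assumes that groups are laid out in `sort(unique(grp))` order.

```julia
using SparseArrays

# Hypothetical helpers mirroring the documented block layouts;
# these are NOT LGMFormula's internal functions.

# Plain f(col, Comp): n × m block, row i has a 1 in column col[i].
function plain_block(col::AbstractVector{<:Integer}, m::Integer)
    all(1 .<= col .<= m) || throw(ArgumentError("index column must lie in 1:$m"))
    n = length(col)
    return sparse(collect(1:n), collect(col), ones(n), n, m)
end

# f(col, Comp; replicate = id): R stacked copies of the m columns;
# row i has a 1 in column (id[i] - 1) * m + col[i].
function replicate_block(col::AbstractVector{<:Integer}, id::AbstractVector{<:Integer}, m::Integer)
    R = maximum(id)
    n = length(col)
    return sparse(collect(1:n), (id .- 1) .* m .+ col, ones(n), n, R * m)
end

# f(col, Factory; group = grp): one sub-block of width s_g per group
# (s_g = rows in group g); groups ordered by sorted label (assumption).
function group_block(col::AbstractVector{<:Integer}, grp::AbstractVector)
    labels = sort(unique(grp))
    sizes = Dict(g => count(==(g), grp) for g in labels)
    offset = Dict{eltype(grp),Int}()
    total = 0
    for g in labels
        offset[g] = total
        total += sizes[g]
    end
    n = length(col)
    all(1 <= col[i] <= sizes[grp[i]] for i in 1:n) ||
        throw(ArgumentError("col[i] must lie in 1:s_g for its group"))
    cols = [offset[grp[i]] + col[i] for i in 1:n]
    return sparse(collect(1:n), cols, ones(n), n, total)
end

# A = [intercept | covariates | random-effect blocks...]
function design_matrix(covariates::Vector{<:AbstractVector}, blocks::Vector{<:SparseMatrixCSC})
    n = size(first(blocks), 1)
    X = isempty(covariates) ? spzeros(n, 0) : sparse(hcat(covariates...))
    return hcat(sparse(ones(n, 1)), X, blocks...)
end
```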
LGMFormula._build_expansion — Method
_build_expansion(lhs::Vector{Symbol}, has_intercept, covariates,
                 randoms, data_expr, family_expr) -> Expr
Return the Expr that @lgm expands to. Module references use absolute interpolation ($LatentGaussianModels.Intercept(), $LGMFormula._build_design_matrix(...)), so the expansion resolves regardless of the caller's using imports.
A single-LHS formula expands to a LinearProjector-equivalent SparseMatrixCSC, which the LatentGaussianModel constructor wraps automatically. A multi-LHS formula expands to a StackedMapping with one LinearProjector(A) block per likelihood, all sharing the RHS-built A.
PR-5: f(col, Comp; replicate = id_col) and f(col, Factory; group = grp_col) terms lower to _wrap_term(...) runtime calls in the components tuple. The wrapper resolves id_col/grp_col against data to construct Replicate(comp, R) / Group(factory, grp). The macro itself does no I/O, so the AST stays data-free.
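Absolute interpolation means splicing the module object itself, not its name, into the generated Expr. The sketch below illustrates the idea with a made-up module `ToyLGM` and a made-up builder `build_expansion`. It builds an expression without evaluating it, so the data appears only as a symbol until the code runs.

```julia
# ToyLGM is a hypothetical stand-in for a package module.
module ToyLGM
struct Intercept end
# Runtime helper: reads the data only when the expanded code is evaluated.
nrows(data, col::Symbol) = length(getproperty(data, col))
end

# Build (but do not evaluate) an expansion. `$ToyLGM` splices the module
# object into the AST, so resolution does not depend on the caller's `using`.
function build_expansion(col::Symbol, data_expr)
    return :(($ToyLGM.Intercept(), $ToyLGM.nrows($data_expr, $(QuoteNode(col)))))
end

ex = build_expansion(:y, :df)   # data-free: `df` is only a symbol here
```

Evaluating `ex` once `df` exists yields the intercept and the row count.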
LGMFormula._build_multi_likelihood_mapping — Method
_build_multi_likelihood_mapping(data, lhs, has_intercept, covariates, randoms)
    -> StackedMapping
Build a row-partitioned StackedMapping for tuple-LHS multi-likelihood models. The shared RHS produces a single sparse A; each likelihood block wraps the same LinearProjector(A). Observation rows are partitioned contiguously: block k owns rows ((k-1)·n + 1):(k·n).
All columns in lhs must have equal length n (wide-format only — long-format with a type column is left for a follow-up).
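For wide-format data, the contiguous partition is just index arithmetic. The sketch below uses hypothetical helper names (`block_rows`, `stacked_rows`, `stack_outcomes`). It shows how K equal-length outcome columns map onto stacked row ranges, and it rejects ragged input in the same spirit as the equal-length requirement.

```julia
# Hypothetical helpers: contiguous row ownership for K stacked likelihood blocks.
block_rows(k::Integer, n::Integer) = ((k - 1) * n + 1):(k * n)

function stacked_rows(lhs_cols::AbstractVector{<:AbstractVector})
    n = length(first(lhs_cols))
    all(length(c) == n for c in lhs_cols) ||
        throw(DimensionMismatch("all LHS columns must have length $n (wide format)"))
    return [block_rows(k, n) for k in eachindex(lhs_cols)]
end

# The stacked observation vector lines up with those row ranges.
stack_outcomes(lhs_cols) = reduce(vcat, lhs_cols)
```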
LGMFormula._build_spatial_block — Method
_build_spatial_block(component, data_cols, coord_cols::Tuple, n_obs::Int)
    -> SparseMatrixCSC{Float64, Int}
Build the design-matrix block for a tuple-coordinate f((cols...), component) term. The default method throws; concrete implementations live in package extensions. LGMFormulaINLASPDEExt overloads this for SPDE2 to build a barycentric MeshProjector.
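The "default throws, extension overloads" pattern is ordinary multiple dispatch. This sketch uses invented types (`ToyComponent`, `ToyIID`, `ToyMesh1D`) and a hypothetical `spatial_block`. It substitutes a 1-D linear-interpolation projector (the 1-D analogue of barycentric weights) for the real triangulated-mesh projector, and it reads only the first coordinate column. It is not the LGMFormulaINLASPDEExt code.

```julia
using SparseArrays

abstract type ToyComponent end
struct ToyIID <: ToyComponent
    n::Int
end
struct ToyMesh1D <: ToyComponent
    knots::Vector{Float64}   # sorted mesh vertices
end

# Fallback: component types without a spatial method are rejected.
function spatial_block(c::ToyComponent, coords::Tuple, n_obs::Int)
    throw(ArgumentError("tuple-coordinate f(...) terms are not supported for $(typeof(c))"))
end

# What an extension might add: linear weights on the two mesh vertices
# surrounding each observation's first coordinate (points assumed inside the mesh).
function spatial_block(c::ToyMesh1D, coords::Tuple, n_obs::Int)
    s, k = coords[1], c.knots
    rows, cols, vals = Int[], Int[], Float64[]
    for i in 1:n_obs
        j = clamp(searchsortedlast(k, s[i]), 1, length(k) - 1)
        w = (s[i] - k[j]) / (k[j + 1] - k[j])
        append!(rows, (i, i)); append!(cols, (j, j + 1)); append!(vals, (1 - w, w))
    end
    return sparse(rows, cols, vals, n_obs, length(k))
end
```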
LGMFormula._check_columns — Method
_check_columns(data, lhs, covariates, randoms)
Validate that data is a Tables.jl source and that every referenced column is present. Errors refer to user-visible names, not table internals.
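A column check that reports user-facing roles can be sketched in a few lines. Assumption: this hypothetical `check_columns` accepts only a NamedTuple of vectors as a stand-in for a general Tables.jl source, and it takes the f-term index columns as a plain `Vector{Symbol}`.

```julia
# Hypothetical validator over a NamedTuple of columns (stand-in for a Tables.jl source).
function check_columns(data::NamedTuple, lhs::Vector{Symbol},
                       covariates::Vector{Symbol}, f_cols::Vector{Symbol})
    available = propertynames(data)
    for (role, cols) in (("outcome", lhs), ("covariate", covariates), ("f(...) index", f_cols))
        for c in cols
            # Name the role the column plays in the formula, not a table internal.
            c in available || throw(ArgumentError(
                "$role column `$c` not found in data; available columns: $(join(available, ", "))"))
        end
    end
    return nothing
end
```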
LGMFormula._flatten_plus — Method
_flatten_plus(expr) -> Vector{Any}
Flatten an a + b + c chain into [a, b, c]. Anything else is returned as a single-element list.
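Julia parses `a + b + c` as one n-ary `:call` to `+`, while explicit parentheses produce nested calls. The recursive sketch below handles both cases. `flatten_plus` is an illustrative name, not the package function.

```julia
# Hypothetical sketch of `+`-chain flattening on Julia ASTs.
function flatten_plus(ex)
    if ex isa Expr && ex.head === :call && ex.args[1] === :+
        # Recurse so that parenthesised sub-sums are flattened too.
        return reduce(vcat, (flatten_plus(a) for a in ex.args[2:end]); init = Any[])
    end
    return Any[ex]   # any other expression is a single summand
end
```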
LGMFormula._parse_args — Method
_parse_args(args) -> (formula_expr, opts::Dict{Symbol,Any})
Split macro args into the formula expression and key = value options. Accepts both the bare form (@lgm y ~ 1 data=df) and the parenthesised form (@lgm(y ~ 1, data=df)).
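Option arguments arrive as `key = value` expressions. To be safe, the sketch below accepts both `:(=)` and `:kw` heads, since it is written without relying on which head each call form produces. `parse_args` is a hypothetical re-implementation, not the package's `_parse_args`.

```julia
# Hypothetical sketch: separate the formula from key = value options.
function parse_args(args)
    formula = nothing
    opts = Dict{Symbol,Any}()
    for a in args
        if a isa Expr && a.head in (:(=), :kw) && a.args[1] isa Symbol
            opts[a.args[1]] = a.args[2]          # e.g. data = df
        elseif formula === nothing
            formula = a                          # first non-option argument
        else
            throw(ArgumentError("unexpected extra argument: $a"))
        end
    end
    formula === nothing && throw(ArgumentError("missing formula"))
    return formula, opts
end
```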
LGMFormula._parse_f_term — Method
_parse_f_term(s::Expr) -> NamedTuple
Pull the column symbol, component expression, and replicate/group keyword arguments out of an f(...) call. Accepts both f(col, comp; replicate=id) (semicolon style) and f(col, comp, replicate=id) (trailing-keyword style); rejects unsupported keywords with a user-visible error.
The first positional argument is either a bare column name (Symbol) for index-typed f(col, Comp) random effects, or a tuple of column names (s_col, t_col) (length 2 for spatial SPDE) or (s_col, t_col, time_col) (length 3, reserved for KroneckerComponent in PR-7c).
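The two keyword styles differ only in AST shape. Semicolon keywords sit in a `:parameters` block, while trailing keywords are `:kw` arguments. This hypothetical `parse_f_term` normalises both into the documented NamedTuple fields, turns a tuple first argument into a `Tuple` of column symbols, and enforces that replicate and group are mutually exclusive.

```julia
# Hypothetical re-implementation of the documented f(...) parsing rules.
function parse_f_term(s::Expr)
    (s.head === :call && s.args[1] === :f) ||
        throw(ArgumentError("expected an f(...) term, got `$s`"))
    positional = Any[]
    kws = Dict{Symbol,Any}()
    for a in s.args[2:end]
        # f(col, comp; k = v) -> :parameters block; f(col, comp, k = v) -> :kw arg.
        if a isa Expr && a.head === :parameters
            kwlist = a.args
        elseif a isa Expr && a.head === :kw
            kwlist = Any[a]
        else
            push!(positional, a)
            continue
        end
        for p in kwlist
            (p isa Expr && p.head === :kw) ||
                throw(ArgumentError("unsupported keyword syntax `$p` in f(...)"))
            p.args[1] in (:replicate, :group) ||
                throw(ArgumentError("unsupported f(...) keyword `$(p.args[1])`; use replicate or group"))
            kws[p.args[1]] = p.args[2]
        end
    end
    length(positional) == 2 || throw(ArgumentError("f(...) needs exactly (col, component)"))
    length(kws) <= 1 || throw(ArgumentError("replicate and group are mutually exclusive"))
    col = positional[1]
    col = col isa Expr && col.head === :tuple ? Tuple(col.args) : col   # (s_col, t_col) form
    return (col = col, comp_expr = positional[2],
            replicate = get(kws, :replicate, nothing), group = get(kws, :group, nothing))
end
```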
LGMFormula._parse_formula — Method
_parse_formula(expr) -> (lhs::Vector{Symbol}, rhs)
Split lhs ~ rhs into LHS column names and the RHS expression. The LHS is always returned as a Vector{Symbol}: length 1 for single-likelihood, length k > 1 for the multi-likelihood tuple-LHS (y1, y2, ...) ~ rhs (ADR-033).
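Because `~` is an ordinary binary operator in Julia, `y ~ rhs` parses as a call to `~`, and a tuple LHS arrives as a `:tuple` expression. The hypothetical `parse_formula` below returns the LHS as a `Vector{Symbol}` in both cases.

```julia
# Hypothetical sketch of LHS/RHS splitting for `lhs ~ rhs`.
function parse_formula(ex)
    (ex isa Expr && ex.head === :call && ex.args[1] === :~ && length(ex.args) == 3) ||
        throw(ArgumentError("expected a formula of the form `lhs ~ rhs`"))
    lhs, rhs = ex.args[2], ex.args[3]
    if lhs isa Symbol
        return Symbol[lhs], rhs                          # single likelihood
    elseif lhs isa Expr && lhs.head === :tuple && all(a -> a isa Symbol, lhs.args)
        return Symbol[a for a in lhs.args], rhs          # tuple-LHS, multi-likelihood
    else
        throw(ArgumentError("LHS must be a column name or a tuple of column names, got `$lhs`"))
    end
end
```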
LGMFormula._split_rhs — Method
_split_rhs(rhs) -> (has_intercept::Bool,
                    covariates::Vector{Symbol},
                    randoms::Vector{<:NamedTuple})
Walk the RHS, splitting at +. Each summand is one of:
- 1: explicit intercept marker (the default if no marker is present).
- 0 or -1: explicit "no intercept".
- bare Symbol: fixed-effects covariate column.
- f(col, Component(...)): random-effect term; col is a column symbol, and the second argument is the (unevaluated) component expression.
- f(col, Component(...); replicate = id_col): R-INLA-style replicated component (PR-5). The macro emits a runtime call that wraps the inner component as Replicate(comp, R), where R = maximum(id_col).
- f(col, Component; group = grp_col): R-INLA-style grouped component (PR-5, factory form). The second positional argument is the factory (a Symbol or callable, not an instance); the macro emits a runtime Group(factory, grp_col) wrap.
Each f(...) term lowers to a NamedTuple{(:col, :comp_expr, :replicate, :group)}, where replicate / group carry the keyword-argument column symbols (or nothing).
Other forms (transformations, interactions, etc.) raise an error referring to user concepts.
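The classification rules above can be sketched as a single pass over the flattened summands. This hypothetical `split_rhs` repeats a small `+`-flattening helper so that the block stands alone. For brevity it keeps each `f(...)` term as a raw Expr instead of parsing it into the NamedTuple form, and it rejects every other form with an error phrased in formula terms.

```julia
# Hypothetical sketch of the RHS summand classification (not the package source).
function terms_of(ex)
    if ex isa Expr && ex.head === :call && ex.args[1] === :+
        return reduce(vcat, (terms_of(a) for a in ex.args[2:end]); init = Any[])
    end
    return Any[ex]
end

function split_rhs(rhs)
    has_intercept = true          # intercept is the default
    covariates = Symbol[]
    randoms = Any[]               # raw f(...) Exprs in this sketch
    for t in terms_of(rhs)
        if t == 1
            has_intercept = true
        elseif t == 0 || t == -1
            has_intercept = false
        elseif t isa Symbol
            push!(covariates, t)
        elseif t isa Expr && t.head === :call && t.args[1] === :f
            push!(randoms, t)
        else
            throw(ArgumentError("unsupported term `$t`: use 1, 0/-1, a column name, or f(...)"))
        end
    end
    return has_intercept, covariates, randoms
end
```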
LGMFormula._wrap_term — Method
_wrap_term(comp_or_factory, data, replicate_col, group_col) ->
    AbstractLatentComponent
PR-5 runtime helper. Wraps an f(col, Comp; replicate = id) or f(col, Factory; group = grp) term against the actual data table:
- replicate_col::Symbol: returns Replicate(comp, R), where R = maximum(data.$replicate_col). comp_or_factory must be an AbstractLatentComponent instance.
- group_col::Symbol: returns Group(factory, data.$group_col). comp_or_factory is the factory (e.g. IID, AR1); the per-group inner components are constructed by the LGM core Group(factory, group_id) constructor (see packages/LatentGaussianModels.jl/src/components/group.jl).
- both nothing: returns comp_or_factory unchanged (it must be a component instance; this is the caller's responsibility).
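The three branches are easy to mirror with stand-in types. In the sketch below, `ToyLatent`, `ToyAR1`, `ToyReplicate` and `ToyGroup` are invented and are not LatentGaussianModels types. The toy `ToyGroup` factory constructor sizes each inner component by its group's row count, matching the s_g convention described for the design matrix.

```julia
# Invented stand-ins for latent components; NOT LatentGaussianModels types.
abstract type ToyLatent end
struct ToyAR1 <: ToyLatent
    n::Int
end
struct ToyReplicate{C<:ToyLatent} <: ToyLatent
    inner::C
    R::Int
end
struct ToyGroup{C<:ToyLatent} <: ToyLatent
    inners::Vector{C}
end
# Factory form: one inner component per (sorted) group label, sized by row count.
ToyGroup(factory, grp::AbstractVector) =
    ToyGroup([factory(count(==(g), grp)) for g in sort(unique(grp))])

# Hypothetical sketch of the runtime wrapping branches.
function wrap_term(comp_or_factory, data, replicate_col, group_col)
    if replicate_col !== nothing && group_col !== nothing
        throw(ArgumentError("replicate and group are mutually exclusive"))
    elseif replicate_col !== nothing
        comp_or_factory isa ToyLatent ||
            throw(ArgumentError("replicate needs a component instance, got $(comp_or_factory)"))
        return ToyReplicate(comp_or_factory, maximum(getproperty(data, replicate_col)))
    elseif group_col !== nothing
        return ToyGroup(comp_or_factory, getproperty(data, group_col))
    else
        return comp_or_factory   # caller guarantees a component instance
    end
end
```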
LGMFormula.lgmformula — Method
lgmformula(data; lhs, intercept = true, covariates = Symbol[],
           randoms = [], family) -> LatentGaussianModel
Function form of @lgm. Accepts a structured description of the formula and returns the same LatentGaussianModel the macro would produce.
Arguments
- data: a Tables.jl-compatible source.
- lhs::Union{Symbol, AbstractVector{Symbol}}: outcome column name(s). A vector triggers tuple-LHS multi-likelihood; family must then be a tuple of likelihoods of matching length.
- intercept::Bool = true: whether to include Intercept().
- covariates::Vector{Symbol} = Symbol[]: scalar fixed-effect column names. Becomes FixedEffects(length(covariates)) if non-empty.
- randoms::AbstractVector = []: list of f-term specifications. Each entry may be:
  - (col::Symbol, comp::AbstractLatentComponent): plain f-term.
  - (col, comp, replicate::Symbol, group::Nothing): replicated component; the runtime wraps it as Replicate(comp, R).
  - (col, factory, replicate::Nothing, group::Symbol): grouped component; the runtime wraps it as Group(factory, grp_col_values).
  - A NamedTuple{(:col, :comp_expr, :replicate, :group)}: the internal form emitted by the macro.
- family: observation likelihood (single-LHS) or tuple of likelihoods (multi-LHS).
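The accepted `randoms` entry shapes can all be normalised to the macro's NamedTuple form. The sketch below is hypothetical: `FTerm`, `validated` and `normalize_random` are invented names, and strings and Exprs stand in for real components. The commented-out line shows the Quick example in function form, following the signature documented above. It is not executed here because it needs the real packages.

```julia
# With the packages loaded, the Quick example would read (per the documented signature):
# lgmformula(df; lhs = :y, covariates = [:x], randoms = [(:region, IID(n))], family = PoissonLikelihood())

# Hypothetical normaliser for the `randoms` entry shapes.
const FTerm = NamedTuple{(:col, :comp_expr, :replicate, :group)}

function validated(t::FTerm)
    t.replicate !== nothing && t.group !== nothing &&
        throw(ArgumentError("f($(t.col), ...): replicate and group are mutually exclusive"))
    return t
end

normalize_random(t::Tuple{Any,Any}) = validated(FTerm((t[1], t[2], nothing, nothing)))
normalize_random(t::Tuple{Any,Any,Any,Any}) = validated(FTerm(t))
normalize_random(t::NamedTuple) =
    validated(FTerm((t.col, t.comp_expr, t.replicate, t.group)))
```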
LGMFormula.@lgm — Macro
@lgm formula data=df family=Likelihood()
Build a LatentGaussianModel from a formula expression bound to a Tables.jl-compatible source.
Supported
- @lgm y ~ 1 data=df family=GaussianLikelihood(): intercept only.
- @lgm y ~ 1 + x1 + x2 data=df family=GaussianLikelihood(): intercept + scalar covariates.
- @lgm y ~ 0 + x data=df family=GaussianLikelihood(): no intercept (-1 is also accepted).
- @lgm y ~ 1 + f(idx, IID(n)) + f(t, RW1(T)) data=df family=PoissonLikelihood(): intercept + multiple random effects.
- @lgm (y1, y2) ~ 1 + f(idx, IID(n)) data=df family=(GaussianLikelihood(), PoissonLikelihood()): multi-likelihood tuple-LHS with a shared RHS (wide format).
- @lgm y ~ 1 + f(t, AR1(n); replicate = id) data=df family=GaussianLikelihood(): replicated component (R-INLA's replicate=id); the runtime wraps it as Replicate(comp, R) with R = maximum(data.id).
- @lgm y ~ 1 + f(t, AR1; group = grp) data=df family=GaussianLikelihood(): grouped component (R-INLA's group=grp, factory form); the runtime wraps it as Group(factory, data.grp) with one inner component per group label.
Restrictions
- Fixed-effects terms must be bare column symbols. Transformations (log(x), x1*x2, factor expansions) are not yet supported.
- The col of an f(col, Component) term must be a column of integers in 1:length(Component).
- Tuple-LHS columns must all have the same length (wide format only; long format with a type column is left for a follow-up).
- replicate and group are mutually exclusive within a single f(...) term.
- Copy(...) augmentation (f(...; copy = :name)) ships in PR-4b.
Expansion
The macro expands to an explicit LatentGaussianModel(...) call with an lgmformula-built design matrix; run @macroexpand to inspect it. The components tuple and likelihood appear literally in the expansion; only the design-matrix construction is deferred to runtime, because it depends on the data.
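The split between expansion time and runtime can be seen with a toy formula macro. In the sketch below, `ToyFormulaMacro`, `design` and `@toymodel` are invented for illustration and are far simpler than @lgm (for example, it always adds an intercept column). The macro body sees only symbols and literals. The data is read by the spliced `design` call when the expansion runs, so `@macroexpand` output contains no data values.

```julia
module ToyFormulaMacro

export @toymodel

# Runtime half: the only place the data is read (always adds an intercept column).
function design(data, y::Symbol, covs::AbstractVector)
    n = length(getproperty(data, y))
    return hcat(ones(n), (getproperty(data, c) for c in covs)...)
end

# Expansion-time half: turns `y ~ 1 + x1 + ...` into a call on symbols only.
macro toymodel(formula, data)
    (formula isa Expr && formula.head === :call && formula.args[1] === :~) ||
        error("expected `y ~ rhs`")
    y, rhs = formula.args[2], formula.args[3]
    terms = rhs isa Expr && rhs.head === :call && rhs.args[1] === :+ ? rhs.args[2:end] : Any[rhs]
    covs = Expr(:vect, (QuoteNode(t) for t in terms if t isa Symbol)...)
    d = esc(data)
    # `$design` splices the function object, so callers need no imports.
    return :((getproperty($d, $(QuoteNode(y))), $design($d, $(QuoteNode(y)), $covs)))
end

end # module

using .ToyFormulaMacro
```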