LGMFormula.jl

Tier-2 formula sugar for LatentGaussianModels.jl. It exposes a single macro, @lgm, and its function form, lgmformula, which lower a formula expression bound to a Tables.jl-compatible source into an explicit LatentGaussianModel(...) constructor call.

The macro is strictly source-to-source per ADR-008 and plans/macro-policy.md. Every expansion produces a constructor call that the user could have written by hand. Run @macroexpand @lgm(...) to inspect.

When to use this

  • Migrating models written against R-INLA's inla(formula, …) API.
  • Wanting concise notation for standard R-INLA-style models with named index columns.

For a guided introduction, see the migration guide.

Quick example

using GMRFs, LatentGaussianModels, LGMFormula

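# Placeholder data for illustration only: n regions, one covariate, count outcome.
n = 50
x = randn(n)
y = rand(0:10, n)
df = (y = y, x = x, region = collect(1:n))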
model = @lgm y ~ 1 + x + f(region, IID(n)) data=df family=PoissonLikelihood()
res = inla(model, df.y)

Status

v0.2.0. Phase N PRs 1–6 closed:

  • PR-1: core parser + single-likelihood single-f expansion.
  • PR-2: component coverage roundtrips.
  • PR-3: multi-f roundtrip coverage.
  • PR-4: tuple-LHS multi-likelihood (wide-format).
  • PR-5: replicate / group term routing.
  • PR-6: migration guide + vignette parity (this page).

PR-4b (Copy augmentation) and PR-7 (SPDE-friendly coordinate forms) ship in a follow-up.

API

LGMFormula.LGMFormula - Module
LGMFormula

Tier-2 formula sugar (@lgm) for LatentGaussianModels.jl.

The macro is a source-to-source transform: every expansion of @lgm(...) produces a LatentGaussianModel(...) constructor call that the user could have written by hand. Run @macroexpand @lgm(...) to inspect.

Public API

  • @lgm — formula macro.
  • lgmformula — function form; the macro lowers to this.

See plans/macro-policy.md and ADR-008.

LGMFormula._build_design_matrix - Method
_build_design_matrix(data, lhs, has_intercept, covariates, randoms)
    -> SparseMatrixCSC{Float64, Int}

Assemble the linear projector matrix A for the formula. Columns are ordered as [intercept | covariates | random-effect indicators...].

Each random-effect term contributes one block:

  • Plain f(col, Comp): length(comp) columns; row i has a 1 in column col[i] of the block. The input column must contain integers in 1:length(comp).
  • f(col, Comp; replicate = id_col): R · length(comp) columns with R = maximum(id), laid out [x⁽¹⁾; x⁽²⁾; …; x⁽ᴿ⁾]. Row i has a 1 in column (id[i] - 1) · length(comp) + col[i].
  • f(col, Factory; group = grp_col): Σ_g s_g columns, where s_g is the per-group size (the number of rows with grp == g). Row i has a 1 in column offset[grp[i]] + col[i], with offset[g] the starting offset of group g's columns within the block. col[i] must lie in 1:s_{grp[i]}.
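
The three layouts above can be sketched with the SparseArrays standard library alone. This is an illustrative sketch, not the package's implementation: plain_block, replicate_block, and group_block are hypothetical names, and the grouped case assumes that offset[g] is the cumulative size of the groups before g.

```julia
using SparseArrays

# Indicator block for a plain f(col, Comp) term: row i gets a 1 in column col[i].
function plain_block(col::AbstractVector{<:Integer}, ncomp::Integer)
    all(c -> 1 <= c <= ncomp, col) ||
        throw(ArgumentError("index column must contain integers in 1:$ncomp"))
    n = length(col)
    return sparse(collect(1:n), Int.(col), ones(n), n, ncomp)
end

# Replicated block: R = maximum(id) copies of the component, laid out side by side.
function replicate_block(col, id, ncomp::Integer)
    n = length(col)
    cols = [(id[i] - 1) * ncomp + col[i] for i in 1:n]
    return sparse(collect(1:n), cols, ones(n), n, maximum(id) * ncomp)
end

# Grouped block. Assumption: offset[g] is the cumulative size of groups 1:(g-1).
function group_block(col, grp)
    sizes = [count(g -> g == k, grp) for k in 1:maximum(grp)]
    offset = cumsum(sizes) .- sizes
    n = length(col)
    cols = [offset[grp[i]] + col[i] for i in 1:n]
    return sparse(collect(1:n), cols, ones(n), n, sum(sizes))
end

A_plain = plain_block([1, 3, 2, 3], 3)                    # 4 rows, 3 columns
A_rep   = replicate_block([1, 2, 1, 2], [1, 1, 2, 2], 2)  # 4 rows, 2 * 2 columns
A_grp   = group_block([1, 2, 1, 2, 3], [1, 1, 2, 2, 2])   # 5 rows, 2 + 3 columns
```

In each block every row holds exactly one nonzero, so the block selects a single latent value per observation.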

LGMFormula._build_expansion - Method
_build_expansion(lhs::Vector{Symbol}, has_intercept, covariates,
                 randoms, data_expr, family_expr) -> Expr

Return the Expr that @lgm expands to. Module references use absolute interpolation ($LatentGaussianModels.Intercept(), $LGMFormula._build_design_matrix(...)) so the expansion resolves regardless of the caller's using imports.

Single-LHS expands to a LinearProjector-equivalent SparseMatrixCSC (LGM auto-wraps). Multi-LHS expands to a StackedMapping with one LinearProjector(A) block per likelihood, sharing the RHS-built A.

PR-5: f(col, Comp; replicate = id_col) and f(col, Factory; group = grp_col) lower to _wrap_term(...) runtime calls in the components tuple — the wrapper resolves id_col/grp_col against data to construct Replicate(comp, R) / Group(factory, grp). The macro itself does no I/O; the AST is still data-free.

LGMFormula._build_multi_likelihood_mapping - Method
_build_multi_likelihood_mapping(data, lhs, has_intercept, covariates, randoms)
    -> StackedMapping

Build a row-partitioned StackedMapping for tuple-LHS multi-likelihood models. The shared RHS produces a single sparse A; each likelihood block wraps the same LinearProjector(A). Observation rows are partitioned contiguously: block k owns rows ((k-1)·n + 1):(k·n).

All columns in lhs must have equal length n (wide-format only — long-format with a type column is left for a follow-up).
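
The contiguous row partition can be written down directly. The helper name block_rows is hypothetical and only restates the row formula from the description above; it does not construct a StackedMapping.

```julia
# Rows owned by likelihood block k when every outcome column has length n.
block_rows(k::Integer, n::Integer) = ((k - 1) * n + 1):(k * n)

n = 4
ranges = [block_rows(k, n) for k in 1:2]   # two outcome columns, e.g. (y1, y2)
```

With n = 4 and two outcome columns, the first block owns rows 1:4 and the second owns rows 5:8.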

LGMFormula._build_spatial_block - Method
_build_spatial_block(component, data_cols, coord_cols::Tuple, n_obs::Int)
    -> SparseMatrixCSC{Float64, Int}

Build the design-matrix block for a tuple-coordinate f((cols...), component) term. The default method throws — concrete implementations live in package extensions. LGMFormulaINLASPDEExt overloads this for SPDE2 to build a barycentric MeshProjector.

LGMFormula._check_columns - Method
_check_columns(data, lhs, covariates, randoms)

Validate that data is a Tables.jl source and that every referenced column is present. Errors refer to user-visible names, not table internals.

LGMFormula._flatten_plus - Method
_flatten_plus(expr) -> Vector{Any}

Flatten an a + b + c chain into [a, b, c]. Any other expression is returned as a single-element vector.
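
A minimal sketch of this kind of flattening, assuming only Julia's standard parsing of a + b + c into a single :call expression whose first argument is :+. The name flatten_plus (without the leading underscore) is hypothetical, and the internal helper may differ.

```julia
# Recursively flatten nested `+` calls; anything else becomes a one-element vector.
function flatten_plus(ex)
    if ex isa Expr && ex.head === :call && ex.args[1] === :+
        out = Any[]
        for a in ex.args[2:end]
            append!(out, flatten_plus(a))
        end
        return out
    end
    return Any[ex]
end

terms  = flatten_plus(:(1 + x + f(region, IID(n))))
single = flatten_plus(:x)
```

The recursion also handles explicitly parenthesised sums such as (a + b) + c, which Julia parses as nested calls.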

LGMFormula._parse_args - Method
_parse_args(args) -> (formula_expr, opts::Dict{Symbol,Any})

Split macro args into the formula expression and key = value options. Accepts both bare-form (@lgm y ~ 1 data=df) and parenthesised-form (@lgm(y ~ 1, data=df)).
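
The splitting step can be sketched as follows. This is a hypothetical parse_args, not the package's code; it accepts option expressions with either a :(=) or a :kw head so that it does not depend on how a particular call form is parsed.

```julia
# Separate the formula expression from `key = value` options.
function parse_args(args)
    formula = nothing
    opts = Dict{Symbol,Any}()
    for a in args
        if a isa Expr && a.head in (:(=), :kw) && a.args[1] isa Symbol
            opts[a.args[1]] = a.args[2]          # option: keep the value unevaluated
        elseif formula === nothing
            formula = a                          # first non-option argument
        else
            throw(ArgumentError("expected a single formula, got extra argument: $a"))
        end
    end
    formula === nothing && throw(ArgumentError("no formula given"))
    return formula, opts
end

formula, opts = parse_args((:(y ~ 1 + x), :(data = df), :(family = PoissonLikelihood())))
```

The option values stay as unevaluated expressions here, in keeping with the source-to-source policy described above.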

LGMFormula._parse_f_term - Method
_parse_f_term(s::Expr) -> NamedTuple

Pull the column symbol, component expression, and replicate/group keyword arguments out of an f(...) call. Accepts both f(col, comp; replicate=id) (semicolon-style) and f(col, comp, replicate=id) (trailing-kw style); rejects unsupported keywords with a user-visible error.

The first positional argument is either a bare column name (a Symbol) for index-typed f(col, Comp) random effects, or a tuple of column names: (s_col, t_col) (length 2, for spatial SPDE terms) or (s_col, t_col, time_col) (length 3, reserved for KroneckerComponent in PR-7c).
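
The two keyword styles differ only in where Julia places the keyword in the call expression: semicolon-style keywords sit inside a :parameters node, while trailing keywords appear as :kw nodes among the arguments. A hedged sketch of handling both, with the hypothetical name parse_f_term and illustrative error messages:

```julia
# Extract column, component expression, and replicate/group keywords from f(...).
function parse_f_term(ex::Expr)
    (ex.head === :call && ex.args[1] === :f) ||
        throw(ArgumentError("not an f(...) term: $ex"))
    positional = Any[]
    kws = Dict{Symbol,Any}()
    for a in ex.args[2:end]
        if a isa Expr && a.head === :parameters      # f(col, comp; key = value)
            for kw in a.args
                (kw isa Expr && kw.head === :kw) ||
                    throw(ArgumentError("unsupported keyword form in f(...): $kw"))
                kws[kw.args[1]] = kw.args[2]
            end
        elseif a isa Expr && a.head === :kw          # f(col, comp, key = value)
            kws[a.args[1]] = a.args[2]
        else
            push!(positional, a)
        end
    end
    length(positional) == 2 ||
        throw(ArgumentError("f(...) expects a column and a component"))
    for k in keys(kws)
        k in (:replicate, :group) ||
            throw(ArgumentError("unsupported keyword in f(...): $k"))
    end
    (haskey(kws, :replicate) && haskey(kws, :group)) &&
        throw(ArgumentError("replicate and group are mutually exclusive"))
    return (col = positional[1], comp_expr = positional[2],
            replicate = get(kws, :replicate, nothing),
            group = get(kws, :group, nothing))
end

t1 = parse_f_term(:(f(t, AR1(n); replicate = id)))
t2 = parse_f_term(:(f(t, AR1(n), replicate = id)))
```

Both spellings yield the same named tuple, so later stages do not need to know which one the user wrote.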

LGMFormula._parse_formula - Method
_parse_formula(expr) -> (lhs::Vector{Symbol}, rhs)

Split lhs ~ rhs into LHS column names and RHS expression. The LHS is always returned as a Vector{Symbol} (length 1 for single-likelihood, length k > 1 for multi-likelihood tuple-LHS (y1, y2, ...) ~ rhs, ADR-033).
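
As an illustration, the split can be expressed in a few lines, relying on the fact that Julia parses lhs ~ rhs as a call to ~ with two arguments. The name parse_formula is hypothetical and the error messages are placeholders.

```julia
# Split `lhs ~ rhs`; the LHS is always returned as a Vector{Symbol}.
function parse_formula(ex)
    (ex isa Expr && ex.head === :call && length(ex.args) == 3 && ex.args[1] === :~) ||
        throw(ArgumentError("expected `lhs ~ rhs`, got: $ex"))
    lhs, rhs = ex.args[2], ex.args[3]
    if lhs isa Symbol
        return Symbol[lhs], rhs                      # single likelihood
    elseif lhs isa Expr && lhs.head === :tuple && all(a -> a isa Symbol, lhs.args)
        return Symbol[lhs.args...], rhs              # tuple-LHS, multi-likelihood
    end
    throw(ArgumentError("left-hand side must be a column name or a tuple of column names"))
end

lhs1, rhs1 = parse_formula(:(y ~ 1 + x))
lhs2, rhs2 = parse_formula(:((y1, y2) ~ 1 + f(idx, IID(n))))
```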

LGMFormula._split_rhs - Method
_split_rhs(rhs) -> (has_intercept::Bool,
                   covariates::Vector{Symbol},
                   randoms::Vector{<:NamedTuple})

Walk the RHS, splitting at +. Each summand is one of:

  • 1 — explicit intercept marker (default if no marker present).
  • 0 or -1 — explicit "no intercept".
  • bare Symbol — fixed-effects covariate column.
  • f(col, Component(...)) — random-effect term; col is a column symbol, the second arg is the (un-evaluated) component expression.
  • f(col, Component(...); replicate = id_col) — R-INLA-style replicated component (PR-5). The macro emits a runtime call that wraps the inner component as Replicate(comp, R) where R = maximum(id_col).
  • f(col, Component; group = grp_col) — R-INLA-style grouped component (PR-5, factory form). The second positional argument is the factory (a Symbol or callable, not an instance); the macro emits a runtime Group(factory, grp_col) wrap.

Each f(...) term lowers to a NamedTuple{(:col, :comp_expr, :replicate, :group)} where replicate / group carry the keyword-argument column symbols (or nothing).

Other forms (transformations, interactions, etc.) raise an error phrased in terms of the formula the user wrote.
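
The classification rules above might look roughly like this once the RHS has been split into summands. The function name classify_terms is hypothetical, and the sketch keeps each f(...) call as a raw Expr instead of lowering it to the NamedTuple form.

```julia
# Classify already-split summands into intercept flag, covariates, and f(...) terms.
function classify_terms(terms)
    has_intercept = true                 # default when no marker is present
    covariates = Symbol[]
    randoms = Expr[]
    for t in terms
        if t isa Integer && t == 1
            has_intercept = true
        elseif t isa Integer && (t == 0 || t == -1)
            has_intercept = false
        elseif t isa Symbol
            push!(covariates, t)
        elseif t isa Expr && t.head === :call && t.args[1] === :f
            push!(randoms, t)
        else
            throw(ArgumentError("unsupported formula term: $t"))
        end
    end
    return has_intercept, covariates, randoms
end

rhs = :(0 + x1 + x2 + f(idx, IID(n)))
has_intercept, covariates, randoms = classify_terms(rhs.args[2:end])
```

A transformation such as log(x) falls through to the final branch and is rejected, matching the restriction that fixed effects must be bare column symbols.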

LGMFormula._wrap_term - Method
_wrap_term(comp_or_factory, data, replicate_col, group_col) ->
    AbstractLatentComponent

PR-5 runtime helper. Wraps an f(col, Comp; replicate = id) or f(col, Factory; group = grp) term against the actual data table:

  • replicate_col::Symbol: returns Replicate(comp, R) where R = maximum(data.$replicate_col). comp_or_factory must be an AbstractLatentComponent instance.
  • group_col::Symbol: returns Group(factory, data.$group_col). comp_or_factory is the factory (e.g. IID, AR1); the per-group inner components are constructed by the LGM core Group(factory, group_id) constructor — see packages/LatentGaussianModels.jl/src/components/group.jl.
  • both nothing: returns comp_or_factory unchanged (must be a component instance — caller's responsibility).

LGMFormula.lgmformula - Method
lgmformula(data; lhs, intercept = true, covariates = Symbol[],
           randoms = [], family) -> LatentGaussianModel

Function form of @lgm. Accepts a structured description of the formula and returns the same LatentGaussianModel the macro would produce.

Arguments

  • data: a Tables.jl-compatible source.
  • lhs::Union{Symbol, AbstractVector{Symbol}}: outcome column name(s). A vector triggers tuple-LHS multi-likelihood; family must then be a tuple of likelihoods of matching length.
  • intercept::Bool = true: whether to include Intercept().
  • covariates::Vector{Symbol} = Symbol[]: scalar fixed-effect column names. Becomes FixedEffects(length(covariates)) if non-empty.
  • randoms::AbstractVector = []: list of f-term specifications. Each entry may be:
    • (col::Symbol, comp::AbstractLatentComponent) — plain f-term.
    • (col, comp, replicate::Symbol, group::Nothing) — replicated component; runtime wraps as Replicate(comp, R).
    • (col, factory, replicate::Nothing, group::Symbol) — grouped component; runtime wraps as Group(factory, grp_col_values).
    • A NamedTuple{(:col, :comp_expr, :replicate, :group)} — internal form emitted by the macro.
  • family: observation likelihood (single-LHS) or tuple of likelihoods (multi-LHS).

LGMFormula.@lgm - Macro
@lgm formula data=df family=Likelihood()

Build a LatentGaussianModel from a formula expression bound to a Tables.jl-compatible source.

Supported

  • @lgm y ~ 1 data=df family=GaussianLikelihood() — intercept only.
  • @lgm y ~ 1 + x1 + x2 data=df family=GaussianLikelihood() — intercept + scalar covariates.
  • @lgm y ~ 0 + x data=df family=GaussianLikelihood() — no intercept (-1 also accepted).
  • @lgm y ~ 1 + f(idx, IID(n)) + f(t, RW1(T)) data=df family=PoissonLikelihood() — intercept + multiple random effects.
  • @lgm (y1, y2) ~ 1 + f(idx, IID(n)) data=df family=(GaussianLikelihood(), PoissonLikelihood()) — multi-likelihood tuple-LHS with shared RHS (wide-format).
  • @lgm y ~ 1 + f(t, AR1(n); replicate = id) data=df family=GaussianLikelihood() — replicated component (R-INLA's replicate=id); runtime wraps as Replicate(comp, R) with R = maximum(data.id).
  • @lgm y ~ 1 + f(t, AR1; group = grp) data=df family=GaussianLikelihood() — grouped component (R-INLA's group=grp + factory form); runtime wraps as Group(factory, data.grp) with one inner component per group label.

Restrictions

  • Fixed-effects terms must be bare column symbols. Transformations (log(x), x1*x2, factor expansions) are not yet supported.
  • The col of an f(col, Component) term must be a column of integers in 1:length(Component).
  • Tuple-LHS columns must all have the same length (wide-format only; long-format with a type column is left for a follow-up).
  • replicate and group are mutually exclusive within a single f(...) term.
  • Copy(...) augmentation (f(...; copy = :name)) is not yet supported; it ships in PR-4b.

Expansion

The macro expands to an explicit LatentGaussianModel(...) call with an lgmformula-built design matrix; run @macroexpand to inspect. The components tuple and likelihood appear literally in the expansion; only the design matrix construction is deferred to runtime (it depends on the data).
