Inverse Retrieval

The inverse module retrieves ice physical properties from observed albedo using emulator-powered optimisation. Given spectral or satellite-band observations, it estimates specific surface area (SSA), impurity concentrations, and optionally solar zenith angle, with uncertainty.

Quick start

from biosnicar.emulator import Emulator
from biosnicar.inverse import retrieve
 
emu = Emulator.load("data/emulators/glacier_ice_8_param_default.npz")
 
# Spectral retrieval (480-band observed albedo)
result = retrieve(
    observed=measured_albedo,
    parameters=["ssa", "glacier_algae"],
    emulator=emu,
    fixed_params={"direct": 1, "solzen": 50, "rho": 700},
)
print(result.summary())
print(f"SSA = {result.best_fit['ssa']:.2f} ± {result.uncertainty['ssa']:.2f} m²/kg")

True vs retrieved spectrum

Why SSA?

The spectral shape of ice albedo in the NIR is controlled by specific surface area (SSA), not by bubble radius (rds) or density (rho) individually. Many (rds, rho) combinations produce the same SSA and therefore nearly identical spectra. Retrieving them separately gives large errors:

Parameter	Individual error	SSA error
rds	~73%	—
rho	~33%	—
SSA	—	~5.5%

Optimising directly in SSA space eliminates this degeneracy. The optimiser sees one well-constrained parameter instead of two degenerate ones.

SSA degeneracy: rds/rho valley vs SSA well

Observation modes

Spectral mode (default)

Pass a 480-element array of spectral albedo (e.g. from a field spectrometer). This provides maximum information content and can constrain 4–5 parameters simultaneously.

result = retrieve(
    observed=measured_albedo,
    parameters=["ssa", "black_carbon", "glacier_algae", "dust"],
    emulator=emu,
    fixed_params={"direct": 1, "solzen": 50, "snow_algae": 0},
)

Use wavelength_mask to exclude noisy bands (e.g. water vapour absorption):

import numpy as np
mask = np.ones(480, dtype=bool)
mask[117:122] = False  # exclude 1.38 µm water vapour
mask[149:154] = False  # exclude 1.87 µm water vapour
 
result = retrieve(observed=measured_albedo, ..., wavelength_mask=mask)

Satellite band mode

Set platform and observed_band_names to work in band space. The emulator predicts the full spectrum internally, convolves to the platform’s bands, and compares:

import numpy as np
 
result = retrieve(
    observed=np.array([0.82, 0.78, 0.75, 0.45, 0.03]),
    parameters=["ssa", "black_carbon", "glacier_algae"],
    emulator=emu,
    platform="sentinel2",
    observed_band_names=["B2", "B3", "B4", "B8", "B11"],
    obs_uncertainty=np.array([0.02, 0.02, 0.02, 0.03, 0.05]),
    fixed_params={"direct": 1, "solzen": 50, "dust": 1000, "snow_algae": 0},
)

Supported platforms: sentinel2, sentinel3, landsat8, modis.

Important: Band mode provides less spectral information (4–5 values vs 480). Limit free parameters to 2–3 and fix well-constrained parameters via fixed_params.

Optimisation methods

Method	Use case	Typical time
`L-BFGS-B` (default)	Fast hybrid: DE pre-search + gradient polishing	~0.5 s
`Nelder-Mead`	Derivative-free, robust to noisy cost surfaces	~1 s
`differential_evolution`	Global search for strongly multimodal problems	~3 s
`mcmc`	Full Bayesian posterior distributions	~30 s

The default L-BFGS-B method uses a two-phase strategy: a quick differential evolution pre-search finds the right basin of attraction, then gradient-based polishing converges to machine precision.

# MCMC for publication-quality uncertainty
result = retrieve(
    observed=measured_albedo,
    parameters=["ssa", "black_carbon", "glacier_algae"],
    emulator=emu,
    method="mcmc",
    mcmc_walkers=32,
    mcmc_steps=2000,
    mcmc_burn=500,
    fixed_params={"direct": 1, "solzen": 50, "snow_algae": 0},
)

Uncertainty estimation

Both methods below report 1-sigma uncertainties in result.uncertainty, accessible as:

print(f"SSA = {result.best_fit['ssa']:.2f} ± {result.uncertainty['ssa']:.2f} m²/kg")

Hessian-based (default)

Applied automatically after L-BFGS-B, Nelder-Mead, and differential evolution. This is the Laplace approximation: the cost surface is assumed locally quadratic near the optimum, and the curvature of that quadratic (the Hessian matrix) encodes how tightly each parameter is constrained.

Concretely, BioSNICAR:

Evaluates the cost function at small perturbations around the best fit using a 4-point central-difference stencil
Assembles the Hessian matrix H (second derivatives of the cost with respect to each parameter pair)
Inverts to get the approximate posterior covariance: C = H⁻¹
Reports 1-sigma uncertainties as the square roots of the diagonal: σₖ = √|Cₖₖ|

This is the same mathematical framework as the optimal estimation approach widely used in atmospheric and cryospheric remote sensing (Rodgers, 2000; Bohn et al., 2021). It is computationally cheap — ~4n² additional emulator evaluations, negligible at microsecond speeds — and accurate when the cost surface is unimodal and approximately Gaussian near the minimum.

A singular Hessian (flat cost surface in some direction) produces infinite uncertainty, correctly flagging parameters that the observations cannot constrain.

When to trust it: For well-posed retrievals (2–4 parameters, good spectral coverage, moderate noise). The Hessian approximation tends to slightly underestimate uncertainty when the posterior is non-Gaussian (e.g., skewed or multimodal), but for most practical BioSNICAR retrievals the approximation is reliable.

MCMC posteriors

With method="mcmc", the full posterior distribution is explored using the affine-invariant ensemble sampler from emcee (Foreman-Mackey et al., 2013). Rather than assuming a shape for the posterior, MCMC draws samples from it directly.

BioSNICAR:

Initialises an ensemble of walkers (default 32) in a tight Gaussian ball around the best-fit estimate
Each walker proposes moves accepted or rejected based on the log-probability (uniform prior within bounds, chi-squared likelihood)
After a burn-in period (default 500 steps), the remaining chain (default 2000 steps) samples the posterior
Reports the posterior median as the point estimate and the posterior standard deviation as the 1-sigma uncertainty

result = retrieve(
    observed=measured_albedo,
    parameters=["ssa", "black_carbon", "glacier_algae"],
    emulator=emu,
    method="mcmc",
    mcmc_walkers=32,
    mcmc_steps=2000,
    mcmc_burn=500,
    fixed_params={"direct": 1, "solzen": 50, "snow_algae": 0},
)
 
# Full chains are available for downstream analysis (corner plots, convergence checks)
chains = result.chains  # shape: (n_steps, n_walkers, n_params)
print(f"Acceptance fraction: {result.acceptance_fraction:.2f}")  # target ~0.2-0.5

MCMC is slower (~30–60 s for a typical retrieval) but captures non-Gaussian posteriors, parameter correlations, and multimodality that the Hessian approximation misses. This matters for:

Weakly constrained parameters (e.g. dust at low concentrations)
Band-mode retrievals with few observations
Publication-quality uncertainty reporting

MCMC-based inversion of radiative transfer models for snow and ice has been validated in the literature (Robledano et al., 2023; Dillon et al., 2025).

Requires emcee>=3 (pip install emcee).

Which should I use?

	Hessian (default)	MCMC
Speed	< 1 s	30–60 s
Assumptions	Gaussian posterior (quadratic cost surface)	None — samples the true posterior
Best for	Routine retrievals, well-constrained problems	Publication uncertainty, weakly constrained problems
Output	Point estimate + symmetric error bars	Full posterior distribution + chains

For most users, the default Hessian uncertainty is sufficient. Switch to MCMC when you need to report uncertainties in a publication, when you suspect the posterior is non-Gaussian, or when you want to examine parameter correlations.

Regularization

Add prior information when parameters are weakly constrained by the observations. This adds a Gaussian penalty term to the cost function, pulling the retrieval towards the prior:

result = retrieve(
    ...,
    regularization={"dust": (500, 200)},  # prior: 500 ± 200 ppb
)

References

Rodgers, C. D. (2000). Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific Publishing.
Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.
Bohn, N. et al. (2021). Optimal estimation of snow and ice surface parameters from imaging spectroscopy measurements. Remote Sensing of Environment, 264, 112613.
Foreman-Mackey, D. et al. (2013). emcee: The MCMC Hammer. Publications of the Astronomical Society of the Pacific, 125(925), 306–312.
Robledano, A. et al. (2023). Unraveling the optical shape of snow. Nature Communications, 14, 3955.
Dillon, K. M. et al. (2025). Evaluating sensitivity of optical snow grain size retrievals to radiative transfer models, shape parameters, and inversion techniques. The Cryosphere, 19, 2913–2935.

Fixed parameters

Use fixed_params to constrain known parameters and reduce the retrieval dimensionality:

result = retrieve(
    ...,
    parameters=["ssa", "black_carbon", "glacier_algae"],
    fixed_params={"direct": 1, "solzen": 53.2, "dust": 500, "snow_algae": 0},
)

The direct parameter must be fixed — it’s a binary flag that cannot be continuously optimised.

Retrievable parameters

Parameter	Physical meaning	Spectral sensitivity
`ssa`	Specific surface area	Very high (NIR)
`rds`	Bubble effective radius	High (NIR)
`rho`	Ice density	Moderate
`solzen`	Solar zenith angle	Moderate
`black_carbon`	Black carbon	High (VIS)
`snow_algae`	Snow algae	High (VIS)
`glacier_algae`	Glacier algae	High (VIS)
`dust`	Mineral dust	Low (see note)

Dust caveat: Mineral dust has a very flat spectral signature at typical environmental concentrations. Dust retrievals are unreliable unless concentrations are very high (> ~10,000 ppb). Consider fixing dust via fixed_params.

Known limitations

Only surface-layer impurity concentrations are retrieved
The default emulator assumes layer_type=1 (solid ice) — build a custom emulator for snow
Satellite reflectances are assumed to be surface reflectance (no atmospheric correction)
ssa cannot appear alongside rds or rho in the same retrieval

Band Convolution Subsurface Light Field