Frequently Asked Questions

Installation

I get “file not found” errors for refractive index data

This usually means you cloned the repository before March 2026, when the git history was rewritten to reduce the download size. The internal data file paths changed during that migration.

Fix: Delete your local clone and re-clone:

rm -rf biosnicar-py
git clone https://github.com/jmcook1186/biosnicar-py.git
cd biosnicar-py
pip install -e .

Which Python versions are supported?

Python 3.8 and above. A fresh virtual environment is recommended to avoid dependency conflicts.

Parameters and inputs

What does `direct` mean — should I use 0 or 1?

direct=1 means clear sky with a direct solar beam (sunny). direct=0 means fully diffuse illumination (overcast). The spectral distribution of incident light changes between these two cases.

Use direct=1 for clear-sky satellite scenes and direct=0 for cloudy conditions. When uncertain, direct=1 is a reasonable default.

When I pass a scalar impurity concentration, where does it go?

Scalar impurity values are applied to the first (surface) layer only and zero-padded to deeper layers:

run_model(black_carbon=5000)
# equivalent to black_carbon=[5000, 0] for a 2-layer model

To distribute impurities across layers, pass a list:

run_model(glacier_algae=[40000, 10000, 0, 0, 0])

Scalar ice property parameters (like rds, rho) broadcast to all layers.

What happens when I change the number of layers?

If your dz list has more elements than the default, all per-layer attributes are automatically resized:

Ice properties (rds, rho, grain_ar, etc.) are extended by repeating their last value
Impurity concentrations are zero-padded (only the surface layer gets the concentration)

To control exactly what each layer gets, pass lists from the start:

outputs = run_model(
    dz=[0.02, 0.05, 0.05, 0.05, 0.83],
    rds=[800, 900, 1000, 1100, 1200],
    glacier_algae=[40000, 10000, 0, 0, 0],
)

Solvers

Should I use adding-doubling or Toon?

Use adding-doubling (the default). It is numerically exact for the two-stream equations and stable across the full parameter space.

The Toon solver is faster but can produce unphysical values (albedo outside 0–1) at extreme parameter combinations — very large grain radii with high impurity loading at strong absorption bands.

The pre-built emulator is trained on adding-doubling output, so use adding-doubling if you plan to compare against it.

Inversions and retrieval

Why retrieve SSA instead of rds and rho separately?

Spectral albedo in the near-infrared is controlled by specific surface area (SSA), not by bubble radius or density individually. Many different (rds, rho) combinations produce the same SSA and therefore nearly identical spectra.

When you try to retrieve rds and rho separately, the optimiser finds a continuous valley of equally good solutions — it can trade rds against rho without changing the cost. Typical retrieval errors:

rds alone: ~73%
rho alone: ~33%
SSA: ~5.5%

Retrieve rds directly only when you have an independent density measurement that you can fix via fixed_params.

Why can’t I retrieve the `direct` parameter?

direct is binary (0 or 1) — the forward model treats it as a discrete switch. Continuous optimisation over a binary variable is meaningless. Pass it via fixed_params instead:

result = retrieve(..., parameters=["ssa", "black_carbon"], fixed_params={"direct": 1})

Why are my dust retrievals unreliable?

The default mineral dust has a very flat, spectrally weak absorption signature at typical environmental concentrations. The cost surface is essentially insensitive to dust below ~5,000 ppb — the spectral difference between 100 and 5,000 ppb is smaller than typical measurement noise.

Best practice: Fix dust at a reasonable background value (e.g., 1,000 ppb) via fixed_params when retrieving other parameters. Only retrieve dust if you expect concentrations above ~10,000 ppb. OR if you are using biosnicar in a geogrphic region where dust is more strongly absorbing (the default represents quartz-dominated dusts from Greenland Ice Sheet mid-ablation zone) then swap the dust optical property files for a more appropriate one.

My satellite band retrieval is much more uncertain than spectral retrieval — why?

Satellite band mode (e.g., 5 Sentinel-2 bands) provides far less spectral information than the full 480-band spectrum. The cost surface is less constrained.

Best practices for band-mode retrieval:

Limit free parameters to 2–3 maximum
Fix everything else via fixed_params (especially direct, solzen, dust)
Provide obs_uncertainty so the optimiser weights bands appropriately

How does the Hessian uncertainty work? Can I trust it?

The default uncertainty method uses the Laplace approximation: it assumes the cost surface is locally quadratic (bowl-shaped) around the best fit, computes the curvature (Hessian matrix) via finite differences, and inverts it to get an approximate posterior covariance matrix. The 1-sigma uncertainty for each parameter is the square root of the corresponding diagonal element.

This is the same optimal estimation framework used throughout atmospheric and cryospheric remote sensing (Rodgers, 2000; Bohn et al., 2021). It is fast (< 1 second) and reliable when:

The retrieval is well-posed (2–4 free parameters, good spectral coverage)
The cost surface is unimodal near the minimum
The posterior is approximately Gaussian (symmetric error bars make sense)

It can underestimate uncertainty when the posterior is skewed, multimodal, or has strong parameter correlations. If the Hessian is singular (some direction is completely unconstrained), the uncertainty is reported as infinity — this is correct behaviour, not a bug.

For routine work, the Hessian uncertainty is sufficient. For publication-quality reporting or weakly constrained problems, use MCMC.

When should I use MCMC instead of the default uncertainty?

MCMC (method="mcmc") explores the full posterior distribution by drawing samples, making no assumptions about its shape. Use it when:

You need publication-quality uncertainty — reviewers may expect posterior distributions rather than Gaussian error bars
The posterior might be non-Gaussian — e.g., skewed impurity distributions near zero, or multimodal solutions
You want to examine parameter correlations — the full chains let you build corner plots showing how parameters trade off
You’re working with band-mode retrievals where few observations weakly constrain multiple parameters

MCMC takes ~30–60 seconds (vs < 1 second for Hessian) because it requires ~100,000+ emulator evaluations. This is only practical because the emulator runs at ~1 µs per call. The result includes full chains, acceptance fraction, and autocorrelation time for convergence diagnostics.

BioSNICAR uses the emcee affine-invariant ensemble sampler (Foreman-Mackey et al., 2013), which handles moderate correlations well without manual tuning.

What wavelength regions should I mask in spectral retrievals?

If your observations are affected by atmospheric absorption (e.g., space-borne or tower-based measurements), exclude the water vapour bands at ~1.38 µm and ~1.87 µm:

mask = np.ones(480, dtype=bool)
mask[117:122] = False  # 1.38 µm water vapour band
mask[149:154] = False  # 1.87 µm water vapour band
 
result = retrieve(observed=measured_albedo, ..., wavelength_mask=mask)

Emulator

How accurate is the emulator?

The default pre-built emulator achieves R² > 0.999 and mean absolute error < 0.005 in broadband albedo on held-out test data.

Use the emulator for inversions, MCMC, parameter sweeps, and interactive exploration. Use the forward model directly when validating results or working outside the emulator’s training range.

What happens if my parameters are outside the emulator’s training range?

The emulator will extrapolate and produce unreliable (potentially nonsensical) results. Always check the training parameter ranges in the emulator metadata.

The default emulator covers:

Parameter	Range
`rds`	500–10,000 µm
`rho`	300–900 kg/m³
`black_carbon`	0–5,000 ppb
`glacier_algae`	0–500,000 cells/mL
`dust`	0–50,000 ppb
`solzen`	25–80°

If you need different ranges, build a custom emulator.

Subsurface light

My PAR calculation shows values above 1.0 just below the surface — is this a bug?

No. In clean ice (weakly absorbing, highly scattering), subsurface PAR can exceed the incident value in the uppermost few centimetres. Light penetrates, scatters back upward, and augments the downwelling stream — a radiation-trapping effect.

This enhancement disappears when surface impurities are present, because they absorb photons before they can scatter back. The effect matters for modelling algal habitat: the light environment in clean ice is not simply “darker with depth”.

Miscellaneous

Which refractive index dataset does BioSNICAR use?

The default is Warren (1984). BioSNICAR also supports Warren (2008) and Picard (2016). All agree to within ~1% at NIR wavelengths; Picard (2016) is more accurate in the visible. Change the dataset via the OP_DIR_STUBS setting in inputs.yaml.

How does `parameter_sweep()` combine multiple parameters?

It computes the Cartesian product of all input ranges:

df = parameter_sweep(
    params={"solzen": [30, 40, 50], "rds": [500, 1000]}
)
# Returns 3 x 2 = 6 rows

This grows fast — 5 parameters with 5 values each = 3,125 runs. For large sweeps, consider using the emulator.

End-to-End Workflow