Forcing Data

Forcing data provides the meteorological inputs required to drive MARRMOT models. This guide covers how to work with different types of forcing data in MarrmotFlow.

Required Variables

Precipitation

Precipitation is a mandatory input for all MARRMOT models.

Supported units:

- mm/day (default)
- mm/hour
- m/day
- kg m-2 s-1 (CMIP6 standard)
- inches/day

forcing_vars = {
    "precip": "precipitation"  # Variable name in your dataset
}

forcing_units = {
    "precip": "mm/day"  # Units of precipitation data
}
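
For orientation, the supported precipitation units relate to mm/day through fixed multipliers. The sketch below is illustrative only: `TO_MM_PER_DAY` and `precip_to_mm_per_day` are hypothetical names, not part of MarrmotFlow, and the factors assume liquid water with a density of 1000 kg m-3.

```python
# Multipliers that convert each supported unit to mm/day.
# 1 kg m-2 of liquid water forms a 1 mm layer, so kg m-2 s-1 equals mm/s.
TO_MM_PER_DAY = {
    "mm/day": 1.0,
    "mm/hour": 24.0,
    "m/day": 1000.0,
    "kg m-2 s-1": 86400.0,
    "inches/day": 25.4,
}


def precip_to_mm_per_day(value, units):
    """Convert a precipitation rate to mm/day (hypothetical helper)."""
    if units not in TO_MM_PER_DAY:
        raise ValueError(f"Unsupported precipitation units: {units!r}")
    return value * TO_MM_PER_DAY[units]


# A CMIP6-style flux of 1e-5 kg m-2 s-1 is a little under 1 mm/day.
print(precip_to_mm_per_day(1e-5, "kg m-2 s-1"))
```

A quick conversion like this is a useful sanity check that the units declared in `forcing_units` match the magnitudes actually stored in the file.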

Temperature

Temperature is required for evapotranspiration calculations and snow processes.

Supported units:

- celsius (default)
- kelvin
- fahrenheit

forcing_vars = {
    "temp": "temperature"  # Variable name in your dataset
}

forcing_units = {
    "temp": "celsius"  # Units of temperature data
}
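
Mislabelled temperature units are a common source of silent errors. The following sketch shows the standard conversions to Celsius and a simple plausibility heuristic. Both functions are hypothetical helpers written for this guide, and the 150 threshold is an assumption chosen for near-surface air temperature.

```python
def temp_to_celsius(value, units):
    """Convert a temperature to degrees Celsius (hypothetical helper)."""
    if units == "celsius":
        return value
    if units == "kelvin":
        return value - 273.15
    if units == "fahrenheit":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"Unsupported temperature units: {units!r}")


def looks_like_kelvin(values):
    """Heuristic: if every value exceeds 150, the data is very likely in Kelvin."""
    return min(values) > 150.0


sample = [271.3, 280.0, 285.6]
if looks_like_kelvin(sample):
    print([round(temp_to_celsius(v, "kelvin"), 2) for v in sample])
```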

Data Sources

Climate Reanalysis

Popular reanalysis datasets supported:

ERA5 (ECMWF):

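# ERA5 configuration
# These are the CDS request names; downloaded NetCDF files often use the
# short names "tp" and "t2m" instead, so check the variables in your file.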
forcing_vars = {
    "precip": "total_precipitation",
    "temp": "2m_temperature"
}

forcing_units = {
    "precip": "m/day",  # ERA5 uses meters
    "temp": "kelvin"    # ERA5 uses Kelvin
}

NCEP/NCAR Reanalysis:

# NCEP configuration
forcing_vars = {
    "precip": "prate",
    "temp": "air"
}

forcing_units = {
    "precip": "kg m-2 s-1",
    "temp": "kelvin"
}

Climate Models

CMIP6 Data:

# CMIP6 standard names
forcing_vars = {
    "precip": "pr",   # precipitation_flux
    "temp": "tas"     # air_temperature
}

forcing_units = {
    "precip": "kg m-2 s-1",
    "temp": "kelvin"  # CMIP6 files label this unit "K"
}

Observational Data

Station Data:

# Station observations
forcing_vars = {
    "precip": "daily_precip",
    "temp": "mean_temp"
}

forcing_units = {
    "precip": "mm/day",
    "temp": "celsius"
}

Gridded Products (e.g., Daymet, PRISM):

# Daymet configuration
forcing_vars = {
    "precip": "prcp",
    "temp": "tmax"  # Daymet provides "tmax" and "tmin"; a daily mean must be derived
}

forcing_units = {
    "precip": "mm/day",
    "temp": "celsius"
}

File Handling

Single File

Load data from a single NetCDF file:

workflow = MARRMOTWorkflow(
    forcing_files="climate_data.nc",
    forcing_vars={"precip": "precipitation", "temp": "temperature"},
    # ... other parameters
)

Multiple Files

Handle multiple forcing files:

# Multiple files with same structure
forcing_files = [
    "climate_2020.nc",
    "climate_2021.nc",
    "climate_2022.nc"
]

workflow = MARRMOTWorkflow(
    forcing_files=forcing_files,
    forcing_vars={"precip": "precipitation", "temp": "temperature"},
    # ... other parameters
)
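
When a directory holds many yearly files, the list can be built programmatically rather than typed by hand. The sketch below uses only the standard library. `list_forcing_files` and the `climate_*.nc` pattern are assumptions made for illustration. Lexicographic sorting matches chronological order only if the file names use zero-padded dates.

```python
from pathlib import Path


def list_forcing_files(directory, pattern="climate_*.nc"):
    """Return matching file paths as a sorted list of strings (hypothetical helper)."""
    files = sorted(str(path) for path in Path(directory).glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} in {directory}")
    return files


# Example (assumes a local "data" directory exists):
# forcing_files = list_forcing_files("data")
```

Failing early on an empty match avoids passing an empty list into the workflow.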

Different Variables in Different Files

When variables are in separate files:

# Separate files for different variables
forcing_files = [
    "precipitation_data.nc",
    "temperature_data.nc"
]

# Each name must match the variable exactly as it appears in its file
forcing_vars = {
    "precip": "precipitation",
    "temp": "temperature"
}

Time Handling

Time Zones

Specify time zones for proper temporal alignment:

workflow = MARRMOTWorkflow(
    forcing_time_zone="UTC",        # Time zone of forcing data
    model_time_zone="America/Vancouver",  # Local time zone for analysis
    # ... other parameters
)
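
To see why the time zone matters for daily models, consider how a UTC timestamp maps to a local calendar day. The sketch below uses a fixed UTC-8 offset as a stand-in for Pacific Standard Time. That is a simplification that ignores daylight saving, and `local_date` is a hypothetical helper, not a description of how MarrmotFlow performs the alignment.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset standing in for Pacific Standard Time (assumption: no DST).
PST = timezone(timedelta(hours=-8), name="PST")


def local_date(utc_timestamp, tz=PST):
    """Return the calendar date of a timezone-aware timestamp in `tz`."""
    return utc_timestamp.astimezone(tz).date()


stamp = datetime(2021, 1, 2, 3, 0, tzinfo=timezone.utc)
print(stamp.date(), local_date(stamp))  # prints "2021-01-02 2021-01-01"
```

An observation stamped 03:00 UTC belongs to the previous local day in Vancouver, so daily totals computed in UTC and in local time can differ.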

Time Resolution

MarrmotFlow expects daily time resolution. Higher-frequency data, such as hourly data, is aggregated to daily values internally, so no manual resampling is required.
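
For readers who want to inspect or reproduce daily values themselves, the sketch below shows one common convention: sum precipitation over each day and average temperature. This is an assumption about a reasonable aggregation, not a description of MarrmotFlow's internal method, and `aggregate_to_daily` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_to_daily(records):
    """Aggregate (timestamp, precip_mm, temp_celsius) records to daily values.

    Precipitation is summed (mm per step becomes mm/day); temperature is averaged.
    """
    buckets = defaultdict(lambda: {"precip": 0.0, "temps": []})
    for stamp, precip, temp in records:
        day = buckets[stamp.date()]
        day["precip"] += precip
        day["temps"].append(temp)
    return {
        date: {
            "precip": day["precip"],
            "temp": sum(day["temps"]) / len(day["temps"]),
        }
        for date, day in sorted(buckets.items())
    }


# Two days of synthetic hourly data: 0.5 mm per hour at a constant 10 °C.
start = datetime(2021, 6, 1)
hourly = [(start + timedelta(hours=h), 0.5, 10.0) for h in range(48)]
daily = aggregate_to_daily(hourly)
print(len(daily))  # prints 2
```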

Temporal Coverage

Ensure your forcing data covers the analysis period:

import xarray as xr

# Check temporal coverage
forcing = xr.open_dataset("climate_data.nc")
print(f"Data period: {forcing.time.min().item()} to {forcing.time.max().item()}")
print(f"Time steps: {len(forcing.time)}")
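
Matching start and end dates do not guarantee a complete record, since days can be missing in between. The sketch below lists the dates absent from a daily series. `missing_days` is a hypothetical helper and assumes the time stamps have already been reduced to calendar dates.

```python
from datetime import date, timedelta


def missing_days(available, start, end):
    """Return the dates in [start, end] that are absent from `available`."""
    have = set(available)
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in expected if day not in have]


# Synthetic record with 3 January missing.
record = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 4)]
print(missing_days(record, date(2021, 1, 1), date(2021, 1, 4)))
```

An empty result means the forcing record covers every day of the analysis period.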

Spatial Considerations

Spatial Resolution

Consider the resolution of your forcing data relative to catchment size:

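# Check spatial resolution
import numpy as np
import xarray as xr

forcing = xr.open_dataset("climate_data.nc")
# Use absolute differences so descending coordinates do not give negative values
lon_res = np.abs(np.diff(forcing.lon.values)).mean()
lat_res = np.abs(np.diff(forcing.lat.values)).mean()

# One degree of latitude is about 111 km; a degree of longitude
# shrinks with the cosine of latitude
mid_lat = np.deg2rad(float(forcing.lat.mean()))
lat_km = lat_res * 111
lon_km = lon_res * 111 * np.cos(mid_lat)

print(f"Spatial resolution: {lon_res:.3f}° × {lat_res:.3f}°")
print(f"Approximate resolution: {lon_km:.1f} km × {lat_km:.1f} km")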

Spatial Coverage

Verify that forcing data covers all catchments:

import geopandas as gpd
import xarray as xr

# Load data
catchments = gpd.read_file("catchments.shp")
forcing = xr.open_dataset("climate_data.nc")

# Check coverage
cat_bounds = catchments.total_bounds  # [minx, miny, maxx, maxy]
forcing_bounds = [
    forcing.lon.min().item(), forcing.lat.min().item(),
    forcing.lon.max().item(), forcing.lat.max().item()
]

print(f"Catchment bounds: {cat_bounds}")
print(f"Forcing bounds: {forcing_bounds}")
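
Printing the two bounding boxes leaves the comparison to the reader. The sketch below makes the check explicit and handles forcing grids that store longitude as 0 to 360. `to_180` and `bounds_cover` are hypothetical helpers. The conversion assumes the grid does not cross the antimeridian, and `tolerance` can be set to half a grid cell because coordinates usually mark cell centres.

```python
def to_180(lon):
    """Map a longitude from the 0..360 convention to -180..180."""
    return ((lon + 180.0) % 360.0) - 180.0


def bounds_cover(outer, inner, tolerance=0.0):
    """Return True if `outer` contains `inner`; both are [minx, miny, maxx, maxy]."""
    return (
        outer[0] - tolerance <= inner[0]
        and outer[1] - tolerance <= inner[1]
        and outer[2] + tolerance >= inner[2]
        and outer[3] + tolerance >= inner[3]
    )


catchment = [-123.5, 49.0, -122.0, 50.0]
forcing_360 = [235.0, 48.0, 240.0, 51.0]  # longitudes stored as 0..360
forcing_180 = [
    to_180(forcing_360[0]), forcing_360[1],
    to_180(forcing_360[2]), forcing_360[3],
]
print(bounds_cover(forcing_180, catchment))  # prints True
```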

Data Quality Assessment

Missing Values

Check for and handle missing data:

import xarray as xr

forcing = xr.open_dataset("climate_data.nc")

# Check for missing values
precip_missing = forcing.precipitation.isnull().sum()
temp_missing = forcing.temperature.isnull().sum()

print(f"Missing precipitation: {precip_missing.item()} values")
print(f"Missing temperature: {temp_missing.item()} values")

# Handle missing data
# Option 1: Drop time steps with missing data
forcing_clean = forcing.dropna(dim='time')

# Option 2: Interpolate missing values
forcing_interp = forcing.interpolate_na(dim='time')
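
Linear interpolation is reasonable across short gaps but can erase real events across long ones, so it helps to know how long the gaps are before choosing an option. The sketch below finds the longest run of missing values in a plain Python list. `longest_gap` is a hypothetical helper and represents missing values as `None`. In xarray, `interpolate_na` also accepts `limit` and `max_gap` arguments to restrict how far values are filled.

```python
def longest_gap(values):
    """Return the length of the longest run of consecutive None entries."""
    longest = current = 0
    for value in values:
        current = current + 1 if value is None else 0
        longest = max(longest, current)
    return longest


series = [1.2, None, None, 0.0, None, 3.4]
print(longest_gap(series))  # prints 2
```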

Outlier Detection

Identify potential data quality issues:

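# Check for unrealistic values
# (assumes `forcing` is loaded as in the previous example,
# with precipitation in mm/day and temperature in °C)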

# Precipitation outliers
negative_precip = (forcing.precipitation < 0).sum()
extreme_precip = (forcing.precipitation > 500).sum()  # > 500 mm/day

print(f"Negative precipitation: {negative_precip.item()}")
print(f"Extreme precipitation (>500mm/day): {extreme_precip.item()}")

# Temperature outliers
extreme_cold = (forcing.temperature < -60).sum()  # < -60°C
extreme_hot = (forcing.temperature > 60).sum()    # > 60°C

print(f"Extreme cold (<-60°C): {extreme_cold.item()}")
print(f"Extreme hot (>60°C): {extreme_hot.item()}")

Advanced Forcing Data Configuration

Custom Variable Mapping

Handle non-standard variable names:

# Custom mapping for specific datasets
dataset_configs = {
    "era5": {
        "vars": {"precip": "total_precipitation", "temp": "2m_temperature"},
        "units": {"precip": "m/day", "temp": "kelvin"}
    },
    "cmip6": {
        "vars": {"precip": "pr", "temp": "tas"},
        "units": {"precip": "kg m-2 s-1", "temp": "kelvin"}
    },
    "station": {
        "vars": {"precip": "daily_precip", "temp": "mean_temp"},
        "units": {"precip": "mm/day", "temp": "celsius"}
    }
}

# Use configuration
config = dataset_configs["era5"]
workflow = MARRMOTWorkflow(
    forcing_vars=config["vars"],
    forcing_units=config["units"],
    # ... other parameters
)
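
A configuration dictionary like this can be checked before it reaches the workflow. The sketch below validates one entry against the units listed in this guide. `SUPPORTED_UNITS` and `validate_config` are hypothetical, and the library itself may accept other spellings that this check would reject.

```python
# Units as listed in the "Required Variables" section of this guide.
SUPPORTED_UNITS = {
    "precip": {"mm/day", "mm/hour", "m/day", "kg m-2 s-1", "inches/day"},
    "temp": {"celsius", "kelvin", "fahrenheit"},
}


def validate_config(config):
    """Return a list of problems found in one dataset configuration."""
    problems = []
    for key, allowed in SUPPORTED_UNITS.items():
        if key not in config.get("vars", {}):
            problems.append(f"missing variable mapping for {key!r}")
        unit = config.get("units", {}).get(key)
        if unit not in allowed:
            problems.append(f"unsupported units for {key!r}: {unit!r}")
    return problems


station = {
    "vars": {"precip": "daily_precip", "temp": "mean_temp"},
    "units": {"precip": "mm/day", "temp": "celsius"},
}
print(validate_config(station))  # prints []
```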

Multiple Data Sources

Combine different data sources:

# Example: Use high-quality station data where available,
# fill gaps with reanalysis data
forcing_files = [
    "station_data.nc",      # Higher priority
    "reanalysis_data.nc"    # Gap-filling
]

# MarrmotFlow will handle data merging internally
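
The idea behind priority-based gap filling can be illustrated on two aligned series. This sketch is not MarrmotFlow's internal merging logic. `fill_gaps` is a hypothetical helper that assumes both series share the same time steps and mark missing values with `None`. For xarray datasets on a common grid, `Dataset.combine_first` provides comparable behaviour.

```python
def fill_gaps(primary, fallback):
    """Prefer values from `primary`; use `fallback` where primary is None."""
    if len(primary) != len(fallback):
        raise ValueError("Series must have the same length")
    return [p if p is not None else f for p, f in zip(primary, fallback)]


station = [2.1, None, 0.0, None]
reanalysis = [1.8, 4.0, 0.3, 0.9]
print(fill_gaps(station, reanalysis))  # prints [2.1, 4.0, 0.0, 0.9]
```

Note that a station value of 0.0 is kept, since only `None` counts as missing.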

Preprocessing Workflow

Complete preprocessing example:

import xarray as xr
import pandas as pd

def preprocess_forcing_data(input_file, output_file):
    """Comprehensive forcing data preprocessing."""

    # Load data
    ds = xr.open_dataset(input_file)

    # Standardize time
    ds['time'] = pd.to_datetime(ds.time.values)

    # Quality control (run before gap filling so masked values are filled too)
    # Set negative precipitation to zero; clip() leaves missing values as NaN
    ds['precipitation'] = ds.precipitation.clip(min=0)

    # Mask temperatures outside a plausible range (they become NaN)
    temp_range = (-50, 50)  # Reasonable temperature range in Celsius
    ds['temperature'] = ds.temperature.where(
        (ds.temperature >= temp_range[0]) & (ds.temperature <= temp_range[1])
    )

    # Fill missing and masked values by linear interpolation in time
    ds = ds.interpolate_na(dim='time', method='linear')

    # Save processed data
    ds.to_netcdf(output_file)

    print(f"Processed data saved to {output_file}")
    return ds

# Use preprocessing
processed_data = preprocess_forcing_data("raw_data.nc", "processed_data.nc")

Best Practices

- Validate data quality before using it in workflows.
- Use consistent time zones across all datasets.
- Check that spatial and temporal coverage matches your study domain.
- Document data sources and preprocessing steps.
- Handle missing data appropriately for your analysis.
- Consider multiple data sources for robustness.
- Test with small subsets before processing large datasets.
- Keep the original data and track all preprocessing steps.