Forcing Data
Forcing data provides the meteorological inputs required to drive MARRMoT models. This guide covers how to configure, check, and preprocess different types of forcing data in MarrmotFlow.
Required Variables
Precipitation
Precipitation is a mandatory input for all MARRMoT models.
Supported units:
- mm/day (default)
- mm/hour
- m/day
- kg m-2 s-1 (CMIP6 standard)
- inches/day
forcing_vars = {
"precip": "precipitation" # Variable name in your dataset
}
forcing_units = {
"precip": "mm/day" # Units of precipitation data
}
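To make the relationship between the supported precipitation units concrete, the sketch below converts each of them to mm/day. The helper name `to_mm_per_day` and the factor table are illustrative assumptions, not part of the MarrmotFlow API. The flux conversion assumes a water density of 1000 kg m-3, so that 1 kg m-2 corresponds to 1 mm of depth.

```python
# Illustrative only: `to_mm_per_day` is a hypothetical helper, not a MarrmotFlow function.
# Multiplicative factors that convert each supported unit to mm/day.
MM_PER_DAY_FACTORS = {
    "mm/day": 1.0,
    "mm/hour": 24.0,        # 24 hours per day
    "m/day": 1000.0,        # 1000 mm per metre
    "kg m-2 s-1": 86400.0,  # 1 kg m-2 = 1 mm of water; 86400 seconds per day
    "inches/day": 25.4,     # 25.4 mm per inch
}


def to_mm_per_day(value, unit):
    """Convert a precipitation rate expressed in `unit` to mm/day."""
    try:
        return value * MM_PER_DAY_FACTORS[unit]
    except KeyError:
        raise ValueError(f"Unsupported precipitation unit: {unit!r}") from None
```

For example, a flux of 1e-5 kg m-2 s-1 corresponds to roughly 0.864 mm/day, which is a quick sanity check when the declared units and the data values seem to disagree.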
Temperature
Temperature is required for evapotranspiration calculations and snow processes.
Supported units:
- celsius (default)
- kelvin
- fahrenheit
forcing_vars = {
"temp": "temperature" # Variable name in your dataset
}
forcing_units = {
"temp": "celsius" # Units of temperature data
}
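The three temperature units are related by fixed formulas. The sketch below shows them in one place. The function `to_celsius` is a hypothetical helper written for illustration and is not a MarrmotFlow function.

```python
# Illustrative only: `to_celsius` is a hypothetical helper, not a MarrmotFlow function.
def to_celsius(value, unit):
    """Convert a temperature expressed in `unit` to degrees Celsius."""
    if unit == "celsius":
        return value
    if unit == "kelvin":
        return value - 273.15
    if unit == "fahrenheit":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"Unsupported temperature unit: {unit!r}")
```

A mean value near 280 in a dataset declared as celsius is a strong hint that the data is actually in kelvin.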
Data Sources
Climate Reanalysis
Popular reanalysis datasets supported:
ERA5 (ECMWF):
# ERA5 configuration
forcing_vars = {
"precip": "total_precipitation",
"temp": "2m_temperature"
}
forcing_units = {
"precip": "m/day", # ERA5 uses meters
"temp": "kelvin" # ERA5 uses Kelvin
}
NCEP/NCAR Reanalysis:
# NCEP configuration
forcing_vars = {
"precip": "prate",
"temp": "air"
}
forcing_units = {
"precip": "kg m-2 s-1",
"temp": "kelvin"
}
Climate Models
CMIP6 Data:
# CMIP6 standard names
forcing_vars = {
"precip": "pr", # precipitation_flux
"temp": "tas" # air_temperature
}
forcing_units = {
"precip": "kg m-2 s-1",
"temp": "K"
}
Observational Data
Station Data:
# Station observations
forcing_vars = {
"precip": "daily_precip",
"temp": "mean_temp"
}
forcing_units = {
"precip": "mm/day",
"temp": "celsius"
}
Gridded Products (e.g., Daymet, PRISM):
# Daymet configuration
forcing_vars = {
"precip": "prcp",
"temp": "tmax" # or "tmin"; "tmean" is available in PRISM but not in Daymet
}
forcing_units = {
"precip": "mm/day",
"temp": "celsius"
}
File Handling
Single File
Load data from a single NetCDF file:
workflow = MARRMOTWorkflow(
forcing_files="climate_data.nc",
forcing_vars={"precip": "precipitation", "temp": "temperature"},
# ... other parameters
)
Multiple Files
Handle multiple forcing files:
# Multiple files with same structure
forcing_files = [
"climate_2020.nc",
"climate_2021.nc",
"climate_2022.nc"
]
workflow = MARRMOTWorkflow(
forcing_files=forcing_files,
forcing_vars={"precip": "precipitation", "temp": "temperature"},
# ... other parameters
)
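When yearly files follow a naming pattern, the list can be built programmatically instead of typed by hand. The sketch below is one way to do that with the standard library. The function name `collect_forcing_files` and the default pattern are assumptions for illustration. It relies on file names sorting chronologically, which holds for four-digit years.

```python
from pathlib import Path


# Illustrative only: `collect_forcing_files` is a hypothetical helper.
def collect_forcing_files(directory, pattern="climate_*.nc"):
    """Return paths matching `pattern` in `directory`, sorted by name."""
    return sorted(str(path) for path in Path(directory).glob(pattern))
```

The returned list can then be passed as `forcing_files`. Sorting matters because glob results come back in arbitrary order.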
Different Variables in Different Files
When variables are in separate files:
# Separate files for different variables
forcing_files = [
"precipitation_data.nc",
"temperature_data.nc"
]
# Each name below must match the variable exactly as it is stored in its file
forcing_vars = {
"precip": "precipitation",
"temp": "temperature"
}
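With variables spread over several files, a typo in a variable name is easy to miss. The sketch below checks that every mapped name appears in at least one file. It works on plain lists of names so that it stays self-contained. In practice those lists could come from something like `list(xr.open_dataset(path).data_vars)`. The helper `find_unmapped_variables` is hypothetical.

```python
# Illustrative only: `find_unmapped_variables` is a hypothetical helper.
def find_unmapped_variables(forcing_vars, variables_by_file):
    """Return dataset variable names from `forcing_vars` found in none of the files."""
    available = set()
    for names in variables_by_file.values():
        available.update(names)
    return sorted(name for name in forcing_vars.values() if name not in available)


# Variable names per file, written out by hand for this example
variables_by_file = {
    "precipitation_data.nc": ["precipitation"],
    "temperature_data.nc": ["temperature"],
}
forcing_vars = {"precip": "precipitation", "temp": "temperature"}
print(find_unmapped_variables(forcing_vars, variables_by_file))  # prints []
```

An empty list means every mapped variable was found somewhere. Any name that is returned needs to be corrected in `forcing_vars` or located in another file.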
Time Handling
Time Zones
Specify time zones for proper temporal alignment:
workflow = MARRMOTWorkflow(
forcing_time_zone="UTC", # Time zone of forcing data
model_time_zone="America/Vancouver", # Local time zone for analysis
# ... other parameters
)
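The reason time zones matter for daily models is that the same instant can fall on different calendar days in UTC and in local time. The sketch below illustrates this with the standard library. As a simplifying assumption, a fixed UTC-8 offset stands in for America/Vancouver during winter, so daylight saving time is ignored. The helper `local_date` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset approximates America/Vancouver in winter (PST).
PST = timezone(timedelta(hours=-8), name="PST")


def local_date(utc_timestamp, tz=PST):
    """Return the local calendar date of a timezone-aware timestamp."""
    return utc_timestamp.astimezone(tz).date()


stamp = datetime(2021, 1, 2, 3, 0, tzinfo=timezone.utc)
print(stamp.date())       # 2021-01-02 (UTC calendar day)
print(local_date(stamp))  # 2021-01-01 (local calendar day)
```

Rain recorded at 03:00 UTC on 2 January therefore belongs to 1 January in local time, which changes the daily totals that the model sees.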
Time Resolution
MarrmotFlow expects daily time resolution. Higher-frequency data, such as hourly data, is aggregated to daily values internally, so no extra configuration is needed.
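The exact aggregation rules MarrmotFlow applies are not spelled out in this guide. As an illustration of what sub-daily aggregation typically involves, the sketch below sums hourly precipitation and averages hourly temperature for each calendar day. Both rules, and the function name `aggregate_to_daily`, are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


# Illustrative only: not MarrmotFlow's internal implementation.
def aggregate_to_daily(timestamps, precip_mm_per_hour, temp_c):
    """Sum hourly precipitation and average hourly temperature per calendar day."""
    precip_totals = defaultdict(float)
    temp_values = defaultdict(list)
    for stamp, precip, temp in zip(timestamps, precip_mm_per_hour, temp_c):
        day = stamp.date()
        precip_totals[day] += precip  # each hourly value is the depth for that hour
        temp_values[day].append(temp)
    return {
        day: {
            "precip_mm_per_day": precip_totals[day],
            "temp_c": sum(temp_values[day]) / len(temp_values[day]),
        }
        for day in sorted(precip_totals)
    }
```

Precipitation is summed because it is a depth accumulated over time, whereas temperature is a state variable for which a mean is the natural daily value.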
Temporal Coverage
Ensure your forcing data covers the analysis period:
import xarray as xr
# Check temporal coverage
forcing = xr.open_dataset("climate_data.nc")
print(f"Data period: {forcing.time.min().values} to {forcing.time.max().values}")  # .values keeps datetime formatting; .item() yields integer nanoseconds
print(f"Time steps: {len(forcing.time)}")
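Printing the first and last time stamps shows the extent of the data but not whether it contains the simulation period or has gaps. The sketch below adds both checks using plain `datetime.date` objects. The function names are hypothetical, and the gap check assumes a daily series.

```python
from datetime import date, timedelta


# Illustrative only: both helpers are hypothetical.
def covers_period(data_start, data_end, sim_start, sim_end):
    """Return True if [sim_start, sim_end] lies within [data_start, data_end]."""
    return data_start <= sim_start and sim_end <= data_end


def missing_days(dates):
    """Return the calendar days absent between the first and last date."""
    present = set(dates)
    first, last = min(present), max(present)
    n_days = (last - first).days + 1
    expected = (first + timedelta(days=i) for i in range(n_days))
    return [day for day in expected if day not in present]
```

A non-empty result from `missing_days` points to skipped time steps, which a simple comparison of the start and end dates would not reveal.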
Spatial Considerations
Spatial Resolution
Consider the resolution of your forcing data relative to catchment size:
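# Check spatial resolution
import numpy as np
import xarray as xr
forcing = xr.open_dataset("climate_data.nc")
# Use absolute differences: latitude is often stored in descending order
lon_res = np.abs(np.diff(forcing.lon)).mean()
lat_res = np.abs(np.diff(forcing.lat)).mean()
print(f"Spatial resolution: {lon_res:.3f}° × {lat_res:.3f}°")
# 1° ≈ 111 km holds for latitude; east-west distances are shorter away from the equator
print(f"Approximate resolution: {lon_res*111:.1f} km × {lat_res*111:.1f} km")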
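The factor of 111 km per degree used above is a good approximation for latitude, but the east-west length of a degree of longitude shrinks with the cosine of the latitude. The sketch below applies that correction. The function `cell_size_km` is a hypothetical helper and treats the Earth as a sphere.

```python
import math

KM_PER_DEGREE = 111.0  # approximate length of one degree of latitude


# Illustrative only: `cell_size_km` is a hypothetical helper.
def cell_size_km(lon_res_deg, lat_res_deg, latitude_deg):
    """Return approximate (east-west, north-south) grid cell size in km."""
    width = lon_res_deg * KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
    height = lat_res_deg * KM_PER_DEGREE
    return width, height
```

At 60° latitude a 0.25° cell is only about half as wide as at the equator, which matters when comparing grid cells with catchment sizes in high-latitude basins.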
Spatial Coverage
Verify that forcing data covers all catchments:
import geopandas as gpd
import xarray as xr
# Load data
catchments = gpd.read_file("catchments.shp")
forcing = xr.open_dataset("climate_data.nc")
# Check coverage
cat_bounds = catchments.total_bounds # [minx, miny, maxx, maxy]
forcing_bounds = [
forcing.lon.min().item(), forcing.lat.min().item(),
forcing.lon.max().item(), forcing.lat.max().item()
]
print(f"Catchment bounds: {cat_bounds}")
print(f"Forcing bounds: {forcing_bounds}")
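Comparing the two printed bounding boxes by eye is error-prone, so the check can be made explicit. The sketch below tests whether one box contains the other and includes a helper for the common case where the forcing grid uses 0 to 360 longitudes while the catchments use -180 to 180. Both function names are hypothetical, and the comparison assumes both boxes are in the same geographic coordinate system.

```python
# Illustrative only: both helpers are hypothetical.
def normalize_lon(lon):
    """Map a longitude in degrees to the range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def bounds_cover(outer, inner):
    """Return True if `outer` [minx, miny, maxx, maxy] fully contains `inner`."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )
```

With the arrays from the snippet above, the check would read `bounds_cover(forcing_bounds, cat_bounds)`. Grid coordinates usually mark cell centres, so a catchment that reaches the edge of the grid deserves a closer look even when the check passes.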
Data Quality Assessment
Missing Values
Check for and handle missing data:
import xarray as xr
forcing = xr.open_dataset("climate_data.nc")
# Check for missing values
precip_missing = forcing.precipitation.isnull().sum()
temp_missing = forcing.temperature.isnull().sum()
print(f"Missing precipitation: {precip_missing.item()} values")
print(f"Missing temperature: {temp_missing.item()} values")
# Handle missing data
# Option 1: Drop time steps with missing data
forcing_clean = forcing.dropna(dim='time')
# Option 2: Interpolate missing values
forcing_interp = forcing.interpolate_na(dim='time')
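Whether interpolation is reasonable depends largely on how long the gaps are: filling a single missing day is usually harmless, while interpolating across several weeks invents data. The sketch below finds the longest run of consecutive missing values in a plain Python sequence. The function `longest_gap` is a hypothetical helper.

```python
import math


# Illustrative only: `longest_gap` is a hypothetical helper.
def longest_gap(values):
    """Return the length of the longest run of consecutive missing values."""
    longest = current = 0
    for value in values:
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if is_missing:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

If the longest gap is short, interpolation along time is a defensible choice. xarray's `interpolate_na` also accepts a `limit` argument that caps how many consecutive missing values are filled.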
Outlier Detection
Identify potential data quality issues:
# Check for unrealistic values (assumes `forcing` is loaded as above, in mm/day and °C)
# Precipitation outliers
negative_precip = (forcing.precipitation < 0).sum()
extreme_precip = (forcing.precipitation > 500).sum() # > 500 mm/day
print(f"Negative precipitation: {negative_precip.item()}")
print(f"Extreme precipitation (>500mm/day): {extreme_precip.item()}")
# Temperature outliers
extreme_cold = (forcing.temperature < -60).sum() # < -60°C
extreme_hot = (forcing.temperature > 60).sum() # > 60°C
print(f"Extreme cold (<-60°C): {extreme_cold.item()}")
print(f"Extreme hot (>60°C): {extreme_hot.item()}")
Advanced Forcing Data Configuration
Custom Variable Mapping
Handle non-standard variable names:
# Custom mapping for specific datasets
dataset_configs = {
"era5": {
"vars": {"precip": "total_precipitation", "temp": "2m_temperature"},
"units": {"precip": "m/day", "temp": "kelvin"}
},
"cmip6": {
"vars": {"precip": "pr", "temp": "tas"},
"units": {"precip": "kg m-2 s-1", "temp": "K"}
},
"station": {
"vars": {"precip": "daily_precip", "temp": "mean_temp"},
"units": {"precip": "mm/day", "temp": "celsius"}
}
}
# Use configuration
config = dataset_configs["era5"]
workflow = MARRMOTWorkflow(
forcing_vars=config["vars"],
forcing_units=config["units"],
# ... other parameters
)
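A plain dictionary lookup fails with a bare `KeyError` when the dataset name is misspelled, and it does not catch an incomplete entry. The sketch below wraps the lookup in a small validating function. The name `get_dataset_config` and the choice of required keys are assumptions for illustration, based on the two mandatory variables described in this guide.

```python
REQUIRED_KEYS = {"precip", "temp"}  # the two mandatory forcing variables

dataset_configs = {
    "era5": {
        "vars": {"precip": "total_precipitation", "temp": "2m_temperature"},
        "units": {"precip": "m/day", "temp": "kelvin"},
    },
    "station": {
        "vars": {"precip": "daily_precip", "temp": "mean_temp"},
        "units": {"precip": "mm/day", "temp": "celsius"},
    },
}


# Illustrative only: `get_dataset_config` is a hypothetical helper.
def get_dataset_config(configs, name):
    """Look up a dataset configuration and verify that it is complete."""
    if name not in configs:
        raise ValueError(f"Unknown dataset {name!r}; expected one of {sorted(configs)}")
    config = configs[name]
    for section in ("vars", "units"):
        missing = REQUIRED_KEYS - set(config.get(section, {}))
        if missing:
            raise ValueError(f"{name!r} is missing {sorted(missing)} in {section!r}")
    return config
```

Failing early with a descriptive message is cheaper than discovering a missing unit entry partway through a workflow run.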
Multiple Data Sources
Combine different data sources:
# Example: Use high-quality station data where available,
# fill gaps with reanalysis data
forcing_files = [
"station_data.nc", # Higher priority
"reanalysis_data.nc" # Gap-filling
]
# MarrmotFlow will handle data merging internally
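The gap-filling idea itself is simple: keep the higher-priority value wherever it exists and fall back to the second source only where it is missing. The sketch below shows that rule on plain Python lists that are already aligned to the same time steps. It is an illustration of the concept, not MarrmotFlow's merging code, and `fill_gaps` is a hypothetical name.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Illustrative only: not MarrmotFlow's internal merging logic.
def fill_gaps(primary, secondary):
    """Return `primary` with missing entries replaced by values from `secondary`."""
    if len(primary) != len(secondary):
        raise ValueError("Series must be aligned to the same time steps")
    return [s if _is_missing(p) else p for p, s in zip(primary, secondary)]


station = [1.2, float("nan"), 0.0, None]
reanalysis = [1.0, 3.4, 0.1, 2.2]
print(fill_gaps(station, reanalysis))  # [1.2, 3.4, 0.0, 2.2]
```

Note that a valid zero in the primary series is kept rather than treated as missing. For xarray objects, `combine_first` expresses the same priority rule.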
Preprocessing Workflow
Complete preprocessing example:
import xarray as xr
import pandas as pd
def preprocess_forcing_data(input_file, output_file):
    """Comprehensive forcing data preprocessing."""
    # Load data
    ds = xr.open_dataset(input_file)
    # Standardize time
    ds['time'] = pd.to_datetime(ds.time.values)
    # Handle missing values
    ds = ds.interpolate_na(dim='time', method='linear')
    # Quality control
    # Set negative precipitation to zero (clip leaves any remaining NaN untouched)
    ds['precipitation'] = ds.precipitation.clip(min=0)
    # Mask extreme values as missing (NaN); they stay NaN because interpolation ran earlier
    temp_range = (-50, 50)  # Reasonable temperature range in Celsius
    ds['temperature'] = ds.temperature.where(
        (ds.temperature >= temp_range[0]) & (ds.temperature <= temp_range[1])
    )
    # Save processed data
    ds.to_netcdf(output_file)
    print(f"Processed data saved to {output_file}")
    return ds
# Use preprocessing
processed_data = preprocess_forcing_data("raw_data.nc", "processed_data.nc")
Best Practices
Validate data quality before using in workflows
Use consistent time zones across all datasets
Check that spatial and temporal coverage match your study domain
Document data sources and preprocessing steps
Handle missing data appropriately for your analysis
Consider multiple data sources for robustness
Test with small subsets before processing large datasets
Keep original data and track all preprocessing steps