Laxman Bokati

Continental-scale soil organic carbon

Legacy-data harmonization, dynamic baseline, and attainable SOC for CONUS

Soil organic carbon (SOC) is a cornerstone of soil health and one of the most discussed levers for climate mitigation. Open legacy SOC data are uneven in space and time, and most large-scale data-driven SOC modeling overlooks temporal drift — yet drift directly biases the baselines fed into carbon-incentive programs. Beyond a static map, what we ultimately need is a dynamic, decision-relevant benchmark: how much SOC a location currently stores, how much it could attainably store, and the deficit between the two.

Approach

1. Temporal adjustment of legacy SOC observations

Legacy soil profiles in CONUS span several decades of sampling; SOC at a given location drifts with climate and management, and ignoring that drift biases anything trained on the raw data. A two-stage method addresses this: proximity-based distance matrices group legacy observations into local neighborhoods, spatially resolved temporal slopes of SOC change are estimated from those neighborhoods (combining statistical and ML techniques), and every observation is then projected to a common reference year using its local slope. The resulting slopes are heterogeneous — positive in some areas, negative in others — reflecting divergent SOC trajectories across land-use types and climates.
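
The projection step can be illustrated with a minimal sketch. It assumes a fixed-radius neighborhood and a plain least-squares slope of SOC against sampling year, which stands in for the combined statistical and ML estimators described above and is not the published implementation. The field names (`x`, `y`, `year`, `soc`), the radius, and the toy values are hypothetical.

```python
from math import hypot

def local_slope(obs, target, radius):
    """Least-squares slope of SOC vs. sampling year among observations
    within `radius` of `target` (the target itself is included)."""
    nbrs = [o for o in obs
            if hypot(o["x"] - target["x"], o["y"] - target["y"]) <= radius]
    n = len(nbrs)
    if n < 2:
        return 0.0  # assumption: no adjustment without enough neighbors
    mean_year = sum(o["year"] for o in nbrs) / n
    mean_soc = sum(o["soc"] for o in nbrs) / n
    sxx = sum((o["year"] - mean_year) ** 2 for o in nbrs)
    if sxx == 0:
        return 0.0  # all neighbors sampled in the same year
    sxy = sum((o["year"] - mean_year) * (o["soc"] - mean_soc) for o in nbrs)
    return sxy / sxx

def adjust_to_reference(obs, radius, ref_year):
    """Project each observation to `ref_year` along its local slope."""
    return [o["soc"] + local_slope(obs, o, radius) * (ref_year - o["year"])
            for o in obs]

# Toy data (hypothetical): one neighborhood gaining SOC, one losing it.
obs = [
    {"x": 0.0, "y": 0.0, "year": 1990, "soc": 4.0},
    {"x": 1.0, "y": 0.0, "year": 2000, "soc": 4.5},
    {"x": 0.0, "y": 1.0, "year": 2010, "soc": 5.0},
    {"x": 100.0, "y": 100.0, "year": 1990, "soc": 6.0},
    {"x": 101.0, "y": 100.0, "year": 2010, "soc": 5.0},
]
adjusted = adjust_to_reference(obs, radius=5.0, ref_year=2020)
```

In this toy case the first neighborhood has a positive slope and the second a negative one, so the two groups are shifted in opposite directions when projected to the reference year.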

Bokati et al., Scientific Reports 2025

2. Dynamic baseline and attainable SOC

Three quantities form the core of the framework:

  • SOCcs: current stocks, projected for 2024 with ensemble machine learning trained on the time-adjusted legacy observations.
  • SOCat: the attainable steady-state stock, taken as the maximum SOC value within a spatially constrained similarity matrix of projected SOCcs (matched by climate, soil order, land use, and management context). SOCat is intentionally dynamic: soil-loss processes continually shift the biophysical ceiling, and treating it as fixed misses where management actually has headroom.
  • SOCdef: the location-specific deficit (SOCat − SOCcs), i.e., the unrealized sequestration capacity.
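
As a rough illustration of how the three quantities relate, the sketch below takes SOCat as the maximum SOCcs within a group of similar pixels and SOCdef as the difference. It is a simplification under stated assumptions: similarity is reduced to exact matches on three hypothetical class labels (`climate`, `soil_order`, `land_use`), and the spatial constraint and management context used in the actual framework are omitted.

```python
def attainable_and_deficit(pixels):
    """Return SOCat and SOCdef per pixel, with SOCat taken as the maximum
    SOCcs among pixels sharing the same similarity key."""
    def key(p):
        # Hypothetical similarity key; the real matching is richer.
        return (p["climate"], p["soil_order"], p["land_use"])

    ceiling = {}
    for p in pixels:
        k = key(p)
        ceiling[k] = max(ceiling.get(k, float("-inf")), p["soc_cs"])

    results = []
    for p in pixels:
        soc_at = ceiling[key(p)]
        results.append({"soc_at": soc_at, "soc_def": soc_at - p["soc_cs"]})
    return results

# Toy pixels (hypothetical values, kg C/m²).
pixels = [
    {"climate": "humid", "soil_order": "Mollisol", "land_use": "cropland", "soc_cs": 8.0},
    {"climate": "humid", "soil_order": "Mollisol", "land_use": "cropland", "soc_cs": 5.0},
    {"climate": "arid", "soil_order": "Aridisol", "land_use": "cropland", "soc_cs": 3.0},
]
results = attainable_and_deficit(pixels)
```

Because SOCat is derived from the projected SOCcs surface itself, re-running the calculation on an updated SOCcs projection moves the ceiling as well, which is the sense in which the benchmark is dynamic.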

CONUS-wide mean cropland SOCdef is 3.46 kg C/m². The largest deficits cluster on Mollisols and Alfisols across the Midwest Corn Belt, though they are smaller than has long been assumed, which suggests that decades of cropping have depleted not just SOCcs but the attainable ceiling itself. Degraded grasslands show a similar pattern, arguing for recalibrating sequestration targets to today's projected steady state.

Somenahally, Bokati, Kumar, Geoderma 2025

3. Reliability diagnostics for soil-carbon mapping

Standard soil-carbon mapping (SCM) practice tends to inflate reported accuracy through three systematic mechanisms:

  • Profile-depth leakage: depth increments of the same profile are split between the train and test sets.
  • Circular bulk-density logic: bulk density is used to compute SOC stocks, then reintroduced as a predictor in the model fit.
  • Spatial autocorrelation under random splits: conventional random-split cross-validation yields optimistic skill estimates, whereas spatial blocking reveals much lower, more realistic skill.
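
The first mechanism is the easiest to guard against in code. The sketch below is one possible way to split at the profile level, so that every depth increment of a profile lands on the same side of the train/test boundary. The `profile_id` and `depth_cm` fields, the test fraction, and the seed are illustrative assumptions.

```python
import random

def profile_level_split(samples, test_fraction=0.25, seed=0):
    """Split samples so that no profile contributes to both train and test."""
    profile_ids = sorted({s["profile_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(profile_ids)
    n_test = max(1, round(len(profile_ids) * test_fraction))
    test_ids = set(profile_ids[:n_test])
    train = [s for s in samples if s["profile_id"] not in test_ids]
    test = [s for s in samples if s["profile_id"] in test_ids]
    return train, test

# Toy data (hypothetical): four profiles, three depth increments each.
samples = [{"profile_id": p, "depth_cm": d}
           for p in ("P1", "P2", "P3", "P4")
           for d in (15, 30, 60)]
train, test = profile_level_split(samples)

# No profile appears on both sides of the split.
shared = {s["profile_id"] for s in train} & {s["profile_id"] for s in test}
```

A row-level random split of the same table would almost always place some increments of a profile in train and others in test, which is the leakage described above.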

The remedies — profile-level validation, spatially aware blocking, standardized BD and depth reporting, and alignment with policy-relevant depth intervals — are the prerequisites for SOC maps that hold up in policy and practice.
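
Spatially aware blocking can be sketched in a similar way. The example assumes square grid blocks in projected coordinates and leave-one-block-out folds. The block size and coordinates are hypothetical, and in practice the block size would be chosen relative to the autocorrelation range of the data.

```python
from math import floor

def spatial_block_folds(points, block_size):
    """Group point indices into square grid blocks and return
    leave-one-block-out (train_idx, test_idx) folds."""
    blocks = {}
    for i, (x, y) in enumerate(points):
        key = (floor(x / block_size), floor(y / block_size))
        blocks.setdefault(key, []).append(i)

    folds = []
    for held_out in sorted(blocks):
        test_idx = blocks[held_out]
        train_idx = [i for k in sorted(blocks) if k != held_out
                     for i in blocks[k]]
        folds.append((train_idx, test_idx))
    return folds

# Toy coordinates (hypothetical, in km) with 10 km blocks.
points = [(0, 0), (1, 1), (12, 3), (14, 2), (3, 15)]
folds = spatial_block_folds(points, block_size=10)
```

Each fold holds out every point in one block, so nearby observations are never split between training and evaluation within a fold.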

Bokati, Kumar, Somenahally, Eur. J. Soil Sci. 2026

Generated outputs

  • Time-adjusted legacy SOC training dataset for CONUS, with spatially resolved temporal slopes capturing the divergent SOC trajectories across regions and land-use types.
  • 250 m CONUS rasters of SOCcs, SOCat, and SOCdef spanning croplands, pasture, and forest.
  • Methodological synthesis and empirical sensitivity analyses showing how profile-depth leakage, bulk-density circularity, and spatial autocorrelation under random splits inflate reported SCM accuracy (e.g., R² ~0.8–0.9 dropping to ~0.4–0.5 once profile-level validation and spatial blocking are enforced).
  • Reliability guidelines for SCM: profile-level validation, spatially aware blocking, standardized BD/depth reporting, and alignment with GlobalSoilMap-style depth intervals.

Interactive Atlas

Click any county to see its SOCcs, SOCat, and SOCdef values; use the tabs to switch which layer is colored. Values are for the top 1 m, in kg C/m².
Aggregation: cropland only. Each county value is the zonal mean over CDL-masked cropland pixels in the 250 m raster — counties with little or no cropland render gray. Aggregating over forest or grassland pixels would produce different patterns.
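
The county aggregation can be sketched as a masked zonal mean. The example works on flattened pixel lists rather than real rasters, and the county codes, mask, and values are hypothetical. Counties with no cropland pixels return `None`, which corresponds to the gray rendering.

```python
def cropland_zonal_mean(values, county_ids, cropland_mask):
    """Mean of pixel values per county, using cropland pixels only.
    Counties without any cropland pixel map to None."""
    sums, counts = {}, {}
    for value, county, is_cropland in zip(values, county_ids, cropland_mask):
        if not is_cropland or value is None:
            continue  # skip non-cropland and no-data pixels
        sums[county] = sums.get(county, 0.0) + value
        counts[county] = counts.get(county, 0) + 1
    return {c: (sums[c] / counts[c] if c in counts else None)
            for c in sorted(set(county_ids))}

# Toy pixels (hypothetical): county "A" has two cropland pixels, "B" has none.
values = [2.0, 4.0, 6.0, 1.0]
county_ids = ["A", "A", "A", "B"]
cropland_mask = [True, True, False, False]
county_means = cropland_zonal_mean(values, county_ids, cropland_mask)
```

Swapping the mask for a forest or grassland mask changes which pixels contribute, which is why those aggregations would show different patterns.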

© 2026 Laxman Bokati · School of Sustainable Engineering and the Built Environment, ASU

 
