🏁 Baselines & Benchmarking

GeoPlant provides a variety of strong and weak baselines for plant species distribution modeling, using both conventional ML and deep learning approaches. All code is available on GitHub and as reproducible Kaggle notebooks.

💡 Benchmark Tasks

  • Presence-Absence (PA) prediction: Predict the set of species present in a plot (multi-label).
  • Presence-Only (PO) experiments (see Appendix): more challenging, because PO data contain no recorded absences.

Evaluation Metrics:
- AUC, Recall, Sample-Averaged F1 ($F_1^s$), and Precision
- Most results are reported for the top-25 predicted species per plot.
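
As an illustration of how a top-25 evaluation could be computed, the sketch below derives a sample-averaged F1 from per-plot species scores. It is a minimal sketch, not the benchmark's official scoring code: the function names `top_k_species` and `sample_averaged_f1`, and the input format (one dict of species ID to score per plot), are assumptions made for this example.

```python
def top_k_species(scores, k=25):
    """Return the set of the k highest-scoring species IDs for one plot.

    `scores` maps species ID -> predicted score (assumed input format).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def sample_averaged_f1(predicted_sets, true_sets):
    """Compute F1 per plot from set overlap, then average over plots."""
    f1_values = []
    for predicted, truth in zip(predicted_sets, true_sets):
        true_positives = len(predicted & truth)
        if true_positives == 0:
            f1_values.append(0.0)  # no overlap: F1 is zero for this plot
            continue
        precision = true_positives / len(predicted)
        recall = true_positives / len(truth)
        f1_values.append(2 * precision * recall / (precision + recall))
    return sum(f1_values) / len(f1_values)


# Toy example with k=2 instead of 25 to keep it readable.
plot_scores = [
    {"sp1": 0.9, "sp2": 0.7, "sp3": 0.1},
    {"sp1": 0.2, "sp2": 0.3, "sp3": 0.8},
]
plot_truth = [{"sp2", "sp3"}, {"sp3", "sp2"}]
predictions = [top_k_species(scores, k=2) for scores in plot_scores]
score = sample_averaged_f1(predictions, plot_truth)
```

Because the F1 is computed per plot before averaging, plots with few species weigh as much as species-rich plots.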


🪄 Naive Baselines

Naive predictors select the most common species for a given region or administrative district.
- Top-25 most common species per district: $F_1^s$ = 20.6% (PA), <9% (PO)
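
A frequency-based predictor of this kind could be sketched as follows. This is a hypothetical illustration rather than the published baseline code: the record format (pairs of district and species ID) and the function names `fit_top_species_per_district` and `predict_for_district` are assumptions.

```python
from collections import Counter, defaultdict


def fit_top_species_per_district(train_records, k=25):
    """Count species occurrences per district and keep the k most frequent.

    `train_records` is an iterable of (district, species_id) pairs
    (assumed format for this sketch).
    """
    counts = defaultdict(Counter)
    for district, species_id in train_records:
        counts[district][species_id] += 1
    return {
        district: [species for species, _ in counter.most_common(k)]
        for district, counter in counts.items()
    }


def predict_for_district(top_by_district, district, fallback=()):
    """Predict the stored top species; use `fallback` for unseen districts."""
    return list(top_by_district.get(district, fallback))


# Toy example with k=1 instead of 25.
records = [("A", 101), ("A", 101), ("A", 202), ("B", 303)]
model = fit_top_species_per_district(records, k=1)
```

Every plot in a district receives the same prediction, which is why such a predictor serves as a lower bound for models that use environmental inputs.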


🔥 Single-Modality Deep Learning Baselines

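| Model | Modality | AUC | Recall | F1s |
|---|---|---|---|---|
| MLP | Climatic Cubes | 82.8 | 32.1 | 22.2 |
| MLP | Landsat Cubes | 82.6 | 42.0 | 28.4 |
| MLP | Sentinel-2 Img | 71.8 | 23.2 | 15.8 |
| ResNet-6 | Climatic Cubes | 91.8 | 37.5 | 26.2 |
| ResNet-6 | Landsat Cubes | 92.1 | 44.8 | 30.3 |
| ResNet-6 | Sentinel-2 Img | 87.3 | 32.1 | 22.0 |
| ResNet-18 | Climatic Cubes | 90.5 | 37.8 | 26.2 |
| ResNet-18 | Landsat Cubes | 91.8 | 44.2 | 29.9 |
| ResNet-18 | Sentinel-2 Img | 88.6 | 33.2 | 22.7 |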

All scores are averaged over 5 random seeds and were obtained on PA data only.


🤖 Conventional ML Baselines

XGBoost and MaxEnt were tested with up to four predictor groups (location, climate, SoilGrids, land cover).

| Model | Predictors Used | AUC | Recall | F1s |
|---|---|---|---|---|
| XGBoost | Location only | 89.8 | 47.6 | 28.2 |
| XGBoost | Climatic only | 88.9 | 46.1 | 26.7 |
| XGBoost | Location + Clim + Soil + LC | 90.4 | 48.8 | 28.7 |
| MaxEnt | All (492 species only) | - | - | ~0.17-0.18 |

XGBoost outperforms MaxEnt, but both lag behind deep multimodal models.


🏆 Multimodal Ensemble Baselines

Combining modalities (climatic cubes, Landsat cubes, and Sentinel-2 images) via an ensemble of ResNet-6 encoders further improves results.

| Model | Modalities | AUC | Recall | F1s |
|---|---|---|---|---|
| MME | Clim + Landsat | 93.6 | 49.3 | 33.8 |
| MME | Clim + Landsat + Sentinel-2 | 94.0 | 49.7 | 34.1 |
| MME + Top-K Estimation | Clim + Landsat | 93.6 | 45.0 | 35.9 |
| MME + Top-K Estimation | Clim + Landsat + Sentinel-2 | 94.0 | 45.3 | 36.2 |

Diagram: Multimodal Ensemble Model. Each modality (cube/image) is encoded, concatenated, and classified jointly.
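
The encode, concatenate, classify flow can be illustrated with a small NumPy forward pass. This is only a shape-level sketch under stated assumptions: the ResNet-6 encoders are replaced by a hypothetical flatten-plus-linear stand-in, the weights are random and untrained, and the input shapes, embedding size, and species count are placeholders rather than the benchmark's actual dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_encoder(input_shape, embedding_dim):
    """Hypothetical stand-in for a ResNet-6 encoder: flatten, linear, ReLU."""
    in_dim = int(np.prod(input_shape))
    weights = rng.normal(scale=0.01, size=(in_dim, embedding_dim))

    def encode(batch):
        flat = batch.reshape(batch.shape[0], -1)
        return np.maximum(flat @ weights, 0.0)

    return encode


def predict_proba(inputs, encoders, head_weights, head_bias):
    """Encode each modality, concatenate embeddings, apply one joint head."""
    embeddings = [encoders[name](inputs[name]) for name in encoders]
    fused = np.concatenate(embeddings, axis=1)
    logits = fused @ head_weights + head_bias
    return 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per species


# Placeholder shapes (assumptions, not the real cube/image dimensions).
shapes = {"climatic": (4, 19, 12), "landsat": (6, 4, 21), "sentinel2": (4, 32, 32)}
embedding_dim, n_species, batch_size = 16, 10, 4

encoders = {name: make_encoder(shape, embedding_dim) for name, shape in shapes.items()}
head_weights = rng.normal(scale=0.01, size=(embedding_dim * len(shapes), n_species))
head_bias = np.zeros(n_species)

inputs = {name: rng.normal(size=(batch_size, *shape)) for name, shape in shapes.items()}
probabilities = predict_proba(inputs, encoders, head_weights, head_bias)
```

The sigmoid output gives one independent probability per species, which matches the multi-label framing of the PA task. A trained model would learn the encoder and head weights jointly.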


💡 Additional Insights

  • Top-K selection: Optimal number of species per survey is $k=20$–$30$ for cubes, $k=25$ for climate data.
  • Presence-Only (PO) Data: Training on PO and testing on PA gives much lower $F_1^s$ (as low as 8–15% for deep models).
  • Best Single Modality: Landsat cubes outperform Sentinel-2 images and climate alone.
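
One simple way to look for a good fixed $k$ is to sweep candidate values on held-out plots and keep the one with the highest sample-averaged F1. The sketch below shows that idea only; it is not necessarily the Top-K estimation procedure used for the reported results, and the names `f1_at_k` and `best_k` as well as the list-of-lists score format are assumptions.

```python
def f1_at_k(score_rows, true_sets, k):
    """Sample-averaged F1 when the k top-scoring species are predicted per plot.

    `score_rows[i][j]` is the score of species index j on plot i
    (assumed format); `true_sets[i]` is the set of present species indices.
    """
    total = 0.0
    for scores, truth in zip(score_rows, true_sets):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        predicted = set(ranked[:k])
        denominator = len(predicted) + len(truth)
        if denominator:
            # F1 = 2*TP / (|predicted| + |truth|), equivalent to the usual form
            total += 2 * len(predicted & truth) / denominator
    return total / len(score_rows)


def best_k(score_rows, true_sets, candidates):
    """Return the candidate k with the highest sample-averaged F1."""
    return max(candidates, key=lambda k: f1_at_k(score_rows, true_sets, k))


# Toy validation set: 2 plots, 4 species, each plot truly has species 0 and 1.
validation_scores = [[0.9, 0.8, 0.1, 0.05], [0.7, 0.6, 0.5, 0.1]]
validation_truth = [{0, 1}, {0, 1}]
chosen_k = best_k(validation_scores, validation_truth, candidates=[1, 2, 3])
```

A small $k$ sacrifices recall and a large $k$ sacrifices precision, so the F1 curve over $k$ typically peaks near the average number of species per plot.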

📈 How to Reproduce?


📋 Table Legend

  • AUC: Area Under ROC Curve (binary per species, averaged).
  • Recall: Fraction of true species found in top-25 predictions.
  • F1s: Sample-averaged F1-score (across test plots, top-25 predictions).
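
To make the AUC definition concrete, the following sketch computes a binary AUC per species from pairwise score comparisons and averages over species. It is a minimal pure-Python illustration, not the benchmark's evaluation code; skipping species that have only presences or only absences in the test plots is an assumption of this example, as are the function names.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a presence outranks an absence (ties = 0.5).

    Returns None when only one class is present, since AUC is undefined then.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def mean_species_auc(score_matrix, label_matrix):
    """Average the per-species AUC over all species where it is defined.

    Rows are plots, columns are species (assumed layout).
    """
    aucs = []
    for j in range(len(score_matrix[0])):
        auc = binary_auc(
            [row[j] for row in score_matrix],
            [row[j] for row in label_matrix],
        )
        if auc is not None:  # assumption: skip single-class species
            aucs.append(auc)
    return sum(aucs) / len(aucs)


# Toy example: 3 plots, 2 species.
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3]]
toy_labels = [[1, 1], [0, 1], [0, 0]]
toy_auc = mean_species_auc(toy_scores, toy_labels)
```

The pairwise formulation is quadratic in the number of plots and is meant for clarity; a rank-based implementation would be preferable on the full test set.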