Milestone 2: Prototype Development (PoC)

Period: March – August 2026 | Budget: R 500,000 | Status: 🔵 Active


Objective

Transform the M1 design specification into a live, end-to-end running system. Every entity defined in the ERDs, every use case described in the design document, and every pipeline diagrammed in the data architecture section must be implemented and demonstrated against real data.


Deliverables

1. Automated GEE ingest pipeline

| Item | Detail |
|---|---|
| Trigger | Python Cloud Function on an 8-day schedule (each run spans at least one Sentinel-2 pass; the two-satellite constellation revisits roughly every 5 days) |
| Source | Google Earth Engine: Sentinel-2 SR, MODIS MOD11A1, ERA5 |
| Transform | Cloud masking with the QA60 band (scenes filtered to < 20% cloud cover), zonal statistics via reduceRegions() |
| Output | Malaria_Observation rows: lst_surface_temp_c, soil_moisture_pct, ndwi_water_index, ndvi_vegetation_index |
| Load | Writes to the Ward Risk store via the Data Ingestion service |
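The Transform step can be illustrated client-side. This is a minimal sketch, assuming per-pixel Sentinel-2 SR values have already been sampled into Python objects (the `Pixel` type and its field names are hypothetical). It drops pixels whose QA60 opaque-cloud (bit 10) or cirrus (bit 11) flag is set and averages NDVI and NDWI per zone, which is what `reduceRegions()` with a mean reducer does server-side in Earth Engine. NDWI here uses the McFeeters green/NIR form (B3, B8); that variant is an assumption, since the document does not name one. The scene-level 20% cloud filter is not shown.

```python
from dataclasses import dataclass
from statistics import mean

# Sentinel-2 QA60 bitmask: bit 10 = opaque clouds, bit 11 = cirrus.
OPAQUE_CLOUD_BIT = 1 << 10
CIRRUS_BIT = 1 << 11


@dataclass
class Pixel:
    """One Sentinel-2 SR pixel tagged with its zone (hypothetical shape)."""
    zone_id: str
    b3_green: float
    b4_red: float
    b8_nir: float
    qa60: int


def is_clear(p: Pixel) -> bool:
    """True when neither the opaque-cloud nor the cirrus bit is set."""
    return (p.qa60 & (OPAQUE_CLOUD_BIT | CIRRUS_BIT)) == 0


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b); returns 0.0 if both inputs are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def zonal_indices(pixels: list[Pixel]) -> dict[str, dict[str, float]]:
    """Mean NDVI and NDWI per zone, using cloud-free pixels only."""
    clear_by_zone: dict[str, list[Pixel]] = {}
    for p in pixels:
        if is_clear(p):
            clear_by_zone.setdefault(p.zone_id, []).append(p)
    return {
        zone_id: {
            "ndvi_vegetation_index": mean(normalized_difference(p.b8_nir, p.b4_red) for p in ps),
            "ndwi_water_index": mean(normalized_difference(p.b3_green, p.b8_nir) for p in ps),
        }
        for zone_id, ps in clear_by_zone.items()
    }
```

Zones with no cloud-free pixels are absent from the result, so downstream code should treat a missing zone as "no observation this cycle" rather than as a zero index.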

2. XGBoost malaria risk forecasting

The ML pipeline runs weekly on all zones with sufficient observation history:

ML Pipeline triggers
→ READ Malaria_Observation (lst, soil_moisture)
→ READ Ground_Observation (aggregated_case_count)
→ Apply environmental_lag_days offset
→ Run XGBoost algorithm
→ Output outbreak_probability_score per zone_id
→ IF score > 0.80 → CREATE User_Risk_Alert

Output stored in Malaria_Risk_Forecast: forecast_id, zone_id, target_prediction_date, outbreak_probability_score, model_version_used.
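The lag offset and alert rule can be sketched as follows, assuming observations and case counts are keyed by (zone_id, date) and environmental_lag_days is a whole number of days. The trained XGBoost model is stood in for by a generic `predict` callable (hypothetical); in practice it could wrap the positive-class column of `XGBClassifier.predict_proba`. Turning case counts into outbreak/no-outbreak labels is left out because the document does not define the outbreak threshold, and forecast_id is left to the database.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable

# Forecasts scoring strictly above this trigger CREATE User_Risk_Alert.
ALERT_THRESHOLD = 0.80

# Environmental features for one zone and date, e.g. {"lst": 31.2, "soil_moisture": 0.34}.
Features = dict[str, float]


@dataclass
class MalariaRiskForecast:
    """Malaria_Risk_Forecast columns; forecast_id is left to the database."""
    zone_id: str
    target_prediction_date: date
    outbreak_probability_score: float
    model_version_used: str


def lagged_training_pairs(
    observations: dict[tuple[str, date], Features],
    case_counts: dict[tuple[str, date], int],
    environmental_lag_days: int,
) -> list[tuple[Features, int]]:
    """Pair conditions observed on day d with aggregated cases on day d + lag."""
    pairs = []
    for (zone_id, obs_date), features in observations.items():
        target = obs_date + timedelta(days=environmental_lag_days)
        count = case_counts.get((zone_id, target))
        if count is not None:  # no ground truth yet for this target date
            pairs.append((features, count))
    return pairs


def forecast_zone(
    zone_id: str,
    obs_date: date,
    features: Features,
    environmental_lag_days: int,
    predict: Callable[[Features], float],
    model_version: str,
) -> tuple[MalariaRiskForecast, bool]:
    """Score one zone; the flag says whether a User_Risk_Alert should be created."""
    score = predict(features)
    forecast = MalariaRiskForecast(
        zone_id=zone_id,
        target_prediction_date=obs_date + timedelta(days=environmental_lag_days),
        outbreak_probability_score=score,
        model_version_used=model_version,
    )
    return forecast, score > ALERT_THRESHOLD
```

The same lag is used in both directions: training pairs today's environment with cases lag days later, and forecasting stamps the score with target_prediction_date = observation date + lag.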

3. NCD atmospheric pipeline

A Copernicus Sentinel-5P pipeline runs in parallel, supplemented by MODIS land surface temperature and ESA WorldCover land cover:

| Satellite / source | Indicator | Field |
|---|---|---|
| Copernicus S5P/OFFL/L3_NO2 | Nitrogen dioxide column | no2_density |
| Copernicus S5P/OFFL/L3_SO2 | Sulphur dioxide | so2_density |
| Copernicus S5P/OFFL/L3_AER_AI | Aerosol index (PM proxy) | aerosol_index |
| Copernicus S5P/OFFL/L3_UVI | UV radiation | uv_radiation_index |
| MODIS/061/MOD11A2 | Land surface temperature | lst_mean_c |
| ESA WorldCover/v100 | Urban growth boundary | urban_cover_percentage |
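Two of these fields need unit handling before they are stored. This sketch assumes pixel values have already been extracted for a zone (the function names are hypothetical). MOD11A2 stores land surface temperature in kelvin scaled by 0.02, with 0 as the fill value, and ESA WorldCover v100 labels built-up land as class 50, so urban_cover_percentage can be read as the built-up share of a zone's pixels.

```python
from typing import Optional

# MOD11A2 LST bands: kelvin scaled by 0.02; a raw value of 0 is the fill (no data).
MODIS_LST_SCALE = 0.02
KELVIN_TO_CELSIUS = 273.15
# ESA WorldCover v100 class code for "Built-up".
WORLDCOVER_BUILT_UP = 50


def lst_mean_c(raw_values: list[int]) -> Optional[float]:
    """Mean LST in °C over valid MOD11A2 pixels; None if every pixel is fill."""
    temps_c = [raw * MODIS_LST_SCALE - KELVIN_TO_CELSIUS for raw in raw_values if raw != 0]
    return sum(temps_c) / len(temps_c) if temps_c else None


def urban_cover_percentage(class_codes: list[int]) -> Optional[float]:
    """Percentage of a zone's WorldCover pixels classed as built-up."""
    if not class_codes:
        return None
    built_up = sum(1 for code in class_codes if code == WORLDCOVER_BUILT_UP)
    return 100.0 * built_up / len(class_codes)
```

Excluding fill pixels before averaging matters: treating a raw 0 as a temperature would pull the zone mean towards −273 °C.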

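4. Server-side risk scoring

calculateDynamicRisk() moves from the browser to a Cloud Run microservice:

$$
R_{\text{ward}} = w_1 \cdot \text{NDWI} + w_2 \cdot \text{LST}_{\text{norm}} + w_3 \cdot \text{Precip}_{\text{anom}} + w_4 \cdot (1 - \text{NDVI})
$$

where w_1 to w_4 are the scoring weights, LST_norm is normalized land surface temperature, and Precip_anom is the precipitation anomaly.

Computing the score in the service allows real-time scoring at request time instead of relying on pre-computed CSV values.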
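The ward score can be written as a pure function that the service calls per request. This is a minimal Python sketch of the formula under stated assumptions: the weights are placeholders (the document gives no values), LST_norm is taken as min–max scaling over an assumed 15–40 °C range, and Precip_anom is assumed to arrive pre-computed. `calculate_dynamic_risk` and `RiskWeights` are hypothetical names, not the existing browser implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskWeights:
    """Placeholder weights w1..w4 (hypothetical; calibrate before use)."""
    w1_ndwi: float = 0.3
    w2_lst: float = 0.3
    w3_precip: float = 0.2
    w4_vegetation: float = 0.2


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale value to [0, 1] over lo..hi, clamping values outside the range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def calculate_dynamic_risk(
    ndwi: float,
    lst_c: float,
    precip_anom: float,
    ndvi: float,
    weights: RiskWeights = RiskWeights(),
    lst_range_c: tuple[float, float] = (15.0, 40.0),
) -> float:
    """R_ward = w1*NDWI + w2*LST_norm + w3*Precip_anom + w4*(1 - NDVI)."""
    lst_norm = min_max(lst_c, *lst_range_c)
    return (
        weights.w1_ndwi * ndwi
        + weights.w2_lst * lst_norm
        + weights.w3_precip * precip_anom
        + weights.w4_vegetation * (1.0 - ndvi)
    )
```

Keeping the calculation free of I/O means it can be unit-tested locally and wrapped by whatever HTTP handler the Cloud Run service uses.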

5. Alert engine integration

The User_Risk_Alert entity goes live:

  • Forecasts with outbreak_probability_score > 0.80 trigger CREATE User_Risk_Alert
  • Email dispatched to all users assigned to the affected zone_id
  • Alert status transitions: Active → Acknowledged → Resolved
  • Full audit trail written to Audit Log
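The alert lifecycle behaves like a small state machine. This sketch assumes transitions are forward-only (no skipping from Active straight to Resolved) and that user-to-zone assignments are available as a mapping; `UserRiskAlert.transition`, `alert_recipients` and the audit-entry fields are hypothetical. The audit log is a plain list here; the real Audit Log store is outside the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class AlertStatus(Enum):
    ACTIVE = "Active"
    ACKNOWLEDGED = "Acknowledged"
    RESOLVED = "Resolved"


# Forward-only lifecycle: Active -> Acknowledged -> Resolved.
ALLOWED_TRANSITIONS = {
    AlertStatus.ACTIVE: {AlertStatus.ACKNOWLEDGED},
    AlertStatus.ACKNOWLEDGED: {AlertStatus.RESOLVED},
    AlertStatus.RESOLVED: set(),
}


@dataclass
class UserRiskAlert:
    alert_id: str
    zone_id: str
    status: AlertStatus = AlertStatus.ACTIVE

    def transition(self, new_status: AlertStatus, actor: str, audit_log: list[dict]) -> None:
        """Apply a status change and record it; reject anything off the lifecycle."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        audit_log.append({
            "alert_id": self.alert_id,
            "from": self.status.value,
            "to": new_status.value,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.status = new_status


def alert_recipients(zone_id: str, assignments: dict[str, set[str]]) -> list[str]:
    """Users whose assigned zones include zone_id (user -> zones mapping assumed)."""
    return sorted(user for user, zones in assignments.items() if zone_id in zones)
```

Writing the audit entry before changing `status` means a rejected transition leaves both the alert and the log untouched.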

6. Multi-country expansion

Ward boundary GeoJSON for three additional countries is ingested into the Zone table:

| Country | Admin level | Source |
|---|---|---|
| Zimbabwe | District ward | SADHS 2019 |
| Mozambique | Distrito | SADHS 2011 |
| Zambia | District | SADHS 2018 |
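Because the three sources use different admin levels and property names, the loader needs to be told which properties hold the boundary ID and name. This is a minimal sketch under that assumption; `ZoneRow`, `zones_from_geojson` and the country-code prefix scheme are hypothetical, not the existing Zone schema.

```python
import json
from dataclasses import dataclass


@dataclass
class ZoneRow:
    """Hypothetical Zone table row."""
    zone_id: str
    country_code: str
    admin_level: str
    name: str
    geometry: dict


def zones_from_geojson(text: str, country_code: str, admin_level: str,
                       id_property: str, name_property: str) -> list[ZoneRow]:
    """Convert a boundary FeatureCollection into Zone rows, skipping non-polygons."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    rows = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") not in ("Polygon", "MultiPolygon"):
            continue  # boundaries must be areal; skip points, lines and null geometry
        props = feature.get("properties") or {}
        rows.append(ZoneRow(
            # Prefix with the country code so source IDs cannot collide across countries.
            zone_id=f"{country_code}-{props[id_property]}",
            country_code=country_code,
            admin_level=admin_level,
            name=str(props[name_property]),
            geometry=geometry,
        ))
    return rows
```

A missing ID or name property raises `KeyError` rather than silently producing an unnamed zone, which surfaces mismatched source files early.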

Business rules in effect

| Rule | Requirement |
|---|---|
| BR-01 | Risk predictions are refreshed at least every 24 hours using the latest SANSA/GEE pulls |
| BR-02 | Persistent disclaimer: outputs are predictive intelligence, not clinical directives |
| BR-03 | All ingest, scoring, and alert events are immutably logged (POPIA compliance) |
| BR-04 | No external dataset is ingested without passing the automated Quality Test protocol |
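For BR-03, one common way to make an application-level log tamper-evident is a hash chain: each entry stores the SHA-256 of the previous entry's hash plus its own canonical JSON, so editing an earlier entry breaks verification. This is a sketch of that technique, not the project's logging design; `AuditLog` and `entry_hash` are hypothetical, storage is in-memory, and truncating the newest entries is only detectable if the latest hash is also recorded somewhere else.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus the event as canonical JSON."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained event log (in-memory sketch)."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, event: dict) -> str:
        """Store a copy of the event chained to the previous entry; return its hash."""
        prev = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        stored = copy.deepcopy(event)  # later edits to the caller's dict can't leak in
        digest = entry_hash(prev, stored)
        self._entries.append({"event": stored, "prev_hash": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """False if an entry was altered, or entries were reordered or deleted mid-chain."""
        prev = GENESIS_HASH
        for entry in self._entries:
            if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True
```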

Domain rules enforced

| Rule | Constraint |
|---|---|
| DR-01 | Malaria risk is constrained by environmental thresholds (combinations of standing-water proximity and LST) |
| DR-02 | NCD forecasting weights long-term exposure metrics more heavily than short-term anomalies |
| DR-03 | All ML predictions are displayed with a confidence interval for outbreak_probability_score |
| DR-04 | Raster and vector data must be spatially and temporally aligned before entering the PCA pipeline |
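DR-01 can be read as a hard gate applied after the model scores a zone; capping or down-weighting the score would be alternative readings. The sketch below takes the gate interpretation. The `MalariaEnvelope` thresholds are placeholder values, not figures from this document, and `distance_to_water_m` is an assumed input (for example derived from an NDWI water mask).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MalariaEnvelope:
    """Hypothetical DR-01 thresholds; the document does not specify values."""
    max_distance_to_water_m: float = 2000.0
    min_lst_c: float = 18.0
    max_lst_c: float = 34.0


def environmentally_permissible(distance_to_water_m: float, lst_c: float,
                                env: MalariaEnvelope = MalariaEnvelope()) -> bool:
    """True when standing water is close enough and LST is within the allowed band."""
    return (distance_to_water_m <= env.max_distance_to_water_m
            and env.min_lst_c <= lst_c <= env.max_lst_c)


def constrained_score(raw_score: float, distance_to_water_m: float, lst_c: float,
                      env: MalariaEnvelope = MalariaEnvelope()) -> float:
    """Return the model score inside the envelope and 0.0 outside it."""
    return raw_score if environmentally_permissible(distance_to_water_m, lst_c, env) else 0.0
```

Applying the gate before the 0.80 alert check means an alert cannot fire for a zone whose environment falls outside the envelope, whatever the model output.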

Sprint plan

| Sprint | Focus | Target |
|---|---|---|
| S1 (Mar–Apr) | GEE ingest pipeline end-to-end | Malaria_Observation rows flowing |
| S2 (Apr–May) | XGBoost model training and forecast pipeline | Malaria_Risk_Forecast rows + alerts |
| S3 (May–Jun) | NCD atmospheric pipeline + NCD_Observation | NCD scores live |
| S4 (Jun–Jul) | Server-side scoring + Cloud Run deploy | Real-time risk API |
| S5 (Jul–Aug) | Multi-country boundaries + dashboard integration | Full prototype demo |