Pipeline documentation
Thirteen automated stages take raw satellite imagery through to a plain-language acquisition brief. Every stage runs without manual intervention, and a complete run finishes in minutes.
Monthly cloud-free composites are downloaded from Google Earth Engine for the full analysis period, covering the area of interest (AOI) at native satellite resolution. Four spectral bands are built per month: a vegetation index, a built-surface index, surface albedo, and land surface temperature. Cloud masking uses per-pixel quality flags, and the temporal window is widened adaptively where cloud cover persists; any remaining gaps are filled by forward/backward temporal inpainting. Every filled observation is flagged so that downstream stages can weight it by confidence.
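A minimal NumPy sketch of the inpainting and flagging step follows. It assumes each band is held as a (time, rows, cols) array with NaN wherever the quality flags masked a pixel. The function name temporal_inpaint is hypothetical, and reading "forward/backward" as a forward fill followed by a backward fill for leading gaps is an interpretation, not a confirmed detail of the pipeline.

```python
import numpy as np


def temporal_inpaint(cube: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fill NaN gaps in a (time, rows, cols) stack along the time axis.

    A forward pass carries the last valid observation into later gaps; a
    backward pass then fills gaps at the start of the series from the next
    valid value. Returns the filled stack and a boolean flag per observation.
    """
    filled = cube.astype(float)          # astype returns a copy by default
    was_missing = np.isnan(filled)

    for t in range(1, filled.shape[0]):              # forward fill
        gap = np.isnan(filled[t])
        filled[t][gap] = filled[t - 1][gap]

    for t in range(filled.shape[0] - 2, -1, -1):     # backward fill
        gap = np.isnan(filled[t])
        filled[t][gap] = filled[t + 1][gap]

    # Flag values that were missing and actually received a fill; pixels that
    # are NaN in every month stay NaN and unflagged.
    filled_flag = was_missing & ~np.isnan(filled)
    return filled, filled_flag
```

For a single pixel with the monthly series [NaN, 1, NaN, NaN, 4], this returns [1, 1, 1, 1, 4], with the first, third, and fourth months flagged as filled.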
A land/water boundary is derived automatically from the satellite signal itself rather than imported from a static dataset. Persistent water is identified by combining each pixel's NDVI values with the fraction of months in which that pixel has no valid (NaN) observation. Morphological operations then remove noise artifacts along the coast. The resulting mask is applied to every downstream layer: sea pixels are excluded from all statistics, scoring, and outputs.
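The water rule can be sketched as below, assuming an NDVI stack of shape (time, rows, cols) with NaN for missing observations. The decision rule (median NDVI below a threshold, or missing in most months), both threshold values, and the single 3x3 opening are illustrative assumptions; the pipeline's actual thresholds and morphological sequence are not documented here. The opening is written in plain NumPy so the example needs no image-processing library.

```python
import warnings
import numpy as np


def _neighborhood(mask: np.ndarray) -> np.ndarray:
    """Stack the nine 3x3-shifted copies of a mask (edges replicated)."""
    padded = np.pad(mask, 1, mode="edge")
    rows, cols = mask.shape
    return np.stack([padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)])


def binary_open(mask: np.ndarray) -> np.ndarray:
    """Morphological opening (erosion, then dilation) with a 3x3 square."""
    eroded = _neighborhood(mask).all(axis=0)
    return _neighborhood(eroded).any(axis=0)


def water_mask(ndvi: np.ndarray, ndvi_thresh: float = 0.0,
               nan_frac_thresh: float = 0.5) -> np.ndarray:
    """Persistent-water mask from a (time, rows, cols) NDVI stack.

    Illustrative rule: water if the median valid NDVI is below ndvi_thresh
    or the pixel is missing in more than nan_frac_thresh of the months.
    """
    nan_frac = np.isnan(ndvi).mean(axis=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)   # all-NaN pixels
        median_ndvi = np.nanmedian(ndvi, axis=0)
        low_ndvi = np.where(np.isnan(median_ndvi), False, median_ndvi < ndvi_thresh)
    raw = low_ndvi | (nan_frac > nan_frac_thresh)
    return binary_open(raw)   # drop isolated specks along the coast
```

An opening removes water specks smaller than the 3x3 structuring element while keeping the outline of larger water bodies intact.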
The full time series is divided into a baseline (reference) period, typically the first 36 months, and a monitoring period covering the remainder. Change is not measured as a simple before/after difference: each monitoring observation is compared with the same calendar month across all baseline years, which removes the seasonal cycle from the comparison. Two z-score streams run in parallel: a conservative stream anchored to the original land state, and an adaptive stream that tracks the monitoring period's own recent mean to catch accelerating local bursts. A Mann-Kendall trend test is then run per pixel to classify change as stable, transient, or directionally trending.
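The conservative stream and the trend test can be sketched per pixel as follows. This assumes a contiguous monthly 1-D series and a 36-month baseline. The function names are hypothetical, the adaptive stream and the stable/transient/trending classification rules are not shown, and the Mann-Kendall variance omits the tie correction for brevity.

```python
import math
import numpy as np


def seasonal_zscores(series: np.ndarray, baseline_months: int = 36) -> np.ndarray:
    """Z-score each monitoring month against the same calendar month in the baseline.

    series is a contiguous monthly 1-D array; only the monitoring part is returned.
    """
    baseline, monitoring = series[:baseline_months], series[baseline_months:]
    z = np.zeros(len(monitoring))
    for i, value in enumerate(monitoring):
        same_month = baseline[(baseline_months + i) % 12::12]   # e.g. every baseline March
        mu, sd = same_month.mean(), same_month.std(ddof=1)
        if sd > 0:
            z[i] = (value - mu) / sd
    return z


def mann_kendall(x) -> tuple[int, float]:
    """Mann-Kendall S statistic and two-sided p-value (no tie correction)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = int(sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1)))
    if s == 0:
        return 0, 1.0
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - math.copysign(1, s)) / math.sqrt(var_s)   # continuity correction
    return s, math.erfc(abs(z) / math.sqrt(2))        # equals 2 * (1 - Phi(|z|))
```

Because each month is judged only against the same calendar month, a summer heat peak is compared with previous summers rather than with the annual mean, so a recurring seasonal swing does not register as change.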
The anomaly and trend outputs are compressed into a feature matrix with 17 features per pixel. These features capture the magnitude of change in each of the four bands, the absolute drift (how far the monitoring period has moved from the baseline), the trend direction per band, and a one-hot encoding of the break context, that is, the temporal character of when and how the change began.
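One way to assemble such a matrix is sketched below. The exact composition of the 17 features is not spelled out above, so the split used here (4 magnitudes, 4 absolute drifts, 4 trend signs, and 5 break-context categories), the category names, and the function name are assumptions chosen only to reach 17 columns.

```python
import numpy as np

BANDS = ("vegetation", "built", "albedo", "lst")
# Hypothetical break-context categories, chosen so that 4 + 4 + 4 + 5 = 17 features.
BREAK_CONTEXTS = ("no_break", "abrupt_early", "abrupt_recent", "gradual", "recurrent")


def build_features(magnitude, drift, trend_sign, break_context) -> np.ndarray:
    """Return an (n_pixels, 17) matrix.

    magnitude, drift, trend_sign: (n_pixels, 4) arrays, one column per band.
    break_context: n_pixels category names taken from BREAK_CONTEXTS.
    """
    one_hot = np.zeros((len(break_context), len(BREAK_CONTEXTS)))
    for row, ctx in enumerate(break_context):
        one_hot[row, BREAK_CONTEXTS.index(ctx)] = 1.0
    return np.hstack([
        np.asarray(magnitude, dtype=float),
        np.abs(np.asarray(drift, dtype=float)),        # absolute drift from baseline
        np.sign(np.asarray(trend_sign, dtype=float)),  # -1, 0 or +1 per band
        one_hot,
    ])
```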
Global Moran's I is computed on the anomaly score surface before regime discovery begins. If spatial autocorrelation is too low, coherent zones cannot form and the pipeline warns rather than producing meaningless clusters.
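Global Moran's I on a raster can be computed directly with NumPy. This sketch assumes binary rook (4-neighbor) contiguity weights and NaN for masked sea pixels; the warning threshold MIN_MORANS_I = 0.1 is a hypothetical value, since the pipeline's cutoff is not stated.

```python
import warnings
import numpy as np

MIN_MORANS_I = 0.1   # hypothetical warning threshold


def morans_i(surface: np.ndarray) -> float:
    """Global Moran's I with binary rook (4-neighbor) weights on a raster.

    NaN cells (e.g. sea pixels) are excluded along with their neighbor links.
    """
    valid = ~np.isnan(surface)
    z = np.where(valid, surface - np.nanmean(surface), 0.0)
    h_ok = valid[:, :-1] & valid[:, 1:]          # horizontal pairs, both valid
    v_ok = valid[:-1, :] & valid[1:, :]          # vertical pairs, both valid
    cross = (z[:, :-1] * z[:, 1:])[h_ok].sum() + (z[:-1, :] * z[1:, :])[v_ok].sum()
    n_links, sum_sq = h_ok.sum() + v_ok.sum(), (z ** 2).sum()
    if n_links == 0 or sum_sq == 0:
        return float("nan")
    # I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2. With symmetric binary
    # weights, W = 2 * n_links and the double sum equals 2 * cross.
    return float(valid.sum() * cross / (n_links * sum_sq))


def check_autocorrelation(surface: np.ndarray) -> float:
    """Warn (rather than fail) when the surface is too incoherent for zoning."""
    i = morans_i(surface)
    if not i >= MIN_MORANS_I:        # also catches NaN
        warnings.warn(f"Moran's I = {i:.3f}: coherent zones are unlikely to form")
    return i
```

A checkerboard surface gives I = -1 (perfect dispersion), while a smooth left-to-right gradient on a 10x10 grid gives I = 8/9, the kind of spatially coherent signal that regime discovery needs.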
A quadtree subdivision places more grid cells in spatially heterogeneous areas and fewer where the signal is uniform. This ensures that local XGBoost models are concentrated where they can learn meaningful distinctions, rather than wasting capacity on homogeneous zones.
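A quadtree of this kind can be sketched as a short recursion. Here heterogeneity is measured as the variance of the anomaly surface inside each block; the variance threshold, the minimum cell size, and the function name are illustrative assumptions rather than the pipeline's settings.

```python
import numpy as np


def quadtree_cells(surface: np.ndarray, var_thresh: float = 0.01, min_size: int = 2) -> list:
    """Recursively split a raster into cells until each is homogeneous.

    Returns (row, col, height, width) tuples. A block stops splitting when its
    variance is at or below var_thresh or it reaches min_size pixels on a side.
    """
    cells = []

    def split(r0, c0, h, w):
        block = surface[r0:r0 + h, c0:c0 + w]
        if h <= min_size or w <= min_size or np.nanvar(block) <= var_thresh:
            cells.append((r0, c0, h, w))
            return
        h2, w2 = h // 2, w // 2
        split(r0, c0, h2, w2)                    # top-left
        split(r0, c0 + w2, h2, w - w2)           # top-right
        split(r0 + h2, c0, h - h2, w2)           # bottom-left
        split(r0 + h2, c0 + w2, h - h2, w - w2)  # bottom-right

    split(0, 0, *surface.shape)
    return cells
```

On a 16x16 raster that is flat except for a checkerboard in one 8x8 quadrant, the three flat quadrants stay as single 8x8 cells while the checkerboard quadrant is split into sixteen 2x2 cells, giving 19 cells in total.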
One local XGBoost model is trained per grid cell on a global pool of pixels, with each pixel weighted by its spatial proximity (Gaussian kernel) and by its inverse uncertainty (1/sigma). This weighting keeps distinct land-cover types from contaminating each other's attribution: a forest cell and an urban cell two kilometers apart have almost no influence on each other's model, even if they share a grid quadrant.
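The per-cell sample weights can be sketched as the product of a Gaussian distance kernel and each pixel's inverse uncertainty. The 500 m bandwidth, the use of projected metric coordinates, and the function name are assumptions. The resulting array would then be passed as sample_weight to XGBoost's scikit-learn interface, for example XGBRegressor().fit(X, y, sample_weight=w).

```python
import numpy as np


def local_sample_weights(pixel_xy, cell_center_xy, sigma, bandwidth_m: float = 500.0) -> np.ndarray:
    """Weight every pixel in the global pool for one grid cell's model.

    pixel_xy: (n, 2) projected coordinates in metres.
    sigma: (n,) per-pixel uncertainty; larger sigma means lower weight.
    bandwidth_m: Gaussian kernel bandwidth (illustrative value).
    """
    d2 = ((np.asarray(pixel_xy, dtype=float) - np.asarray(cell_center_xy, dtype=float)) ** 2).sum(axis=1)
    proximity = np.exp(-d2 / (2.0 * bandwidth_m ** 2))   # Gaussian distance decay
    return proximity / np.asarray(sigma, dtype=float)    # inverse-sigma weighting
```

With a 500 m bandwidth, a pixel 2 km from the cell center receives a proximity weight of exp(-8), roughly 0.0003, which is why distant land-cover types barely influence a cell's model.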
A SHAP TreeExplainer runs per grid cell to attribute each pixel's anomaly score to the four spectral bands and identify the dominant one. Attribution is local, not global: an anomaly of the same magnitude can be heat-driven in one zone and vegetation-driven in an adjacent one. A drift ratio is also computed per pixel, separating the anomaly into a gradual structural-drift component (months of slow accumulation) and an acute-event component (rapid onset).
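Given a matrix of SHAP values, a per-pixel dominant driver can be read off by summing each band's feature contributions. With the shap library, shap.TreeExplainer(model).shap_values(X) returns an (n_pixels, n_features) array for a single-output regressor, and the sketch starts from such an array. It assumes columns b, 4 + b and 8 + b hold band b's features and that the remaining columns are not band-specific; that layout and the function name are hypothetical, and the drift-ratio computation is not shown.

```python
import numpy as np

BANDS = ("vegetation", "built", "albedo", "lst")
# Assumed layout: columns b, 4 + b and 8 + b belong to band b; the rest are shared.
BAND_COLUMNS = [[b, 4 + b, 8 + b] for b in range(len(BANDS))]


def dominant_driver(shap_values: np.ndarray) -> list:
    """Band whose summed SHAP contribution has the largest magnitude, per pixel."""
    per_band = np.column_stack([shap_values[:, cols].sum(axis=1) for cols in BAND_COLUMNS])
    return [BANDS[i] for i in np.abs(per_band).argmax(axis=1)]
```

Summing within a band is valid because SHAP values are additive, so the sum is that band's combined contribution; taking the absolute value lets a strong negative contribution count as a driver too.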
Zones are discovered using SKATER (Spatial 'K'luster Analysis by Tree Edge Removal), a graph-based clustering algorithm. Unlike grid-based or k-means clustering, SKATER enforces spatial contiguity: every zone is a single connected region on the ground, so zone boundaries follow genuine land-use transitions rather than arbitrary statistical partitions. The number of zones is selected by silhouette score, which rewards genuine cluster separation over superficial splits.
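The core of SKATER can be illustrated in Python: build a contiguity graph, take its minimum spanning tree, and cut tree edges so that each remaining subtree is one zone. This is a deliberate simplification. Real SKATER chooses each cut to maximize the reduction in within-zone sum of squared deviations, whereas this sketch simply removes the k - 1 heaviest edges, and the silhouette-based choice of k is omitted. Rook contiguity, Euclidean feature distance, and the function names are assumptions.

```python
import numpy as np


def _root(parent: list, i: int) -> int:
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def contiguous_zones(features: np.ndarray, k: int) -> np.ndarray:
    """Simplified SKATER-style zoning of a (rows, cols, n_features) raster into k zones."""
    rows, cols, _ = features.shape
    flat = features.reshape(rows * cols, -1)

    # 1. Rook-contiguity graph weighted by Euclidean feature distance.
    edges = []
    for i in range(rows * cols):
        r, c = divmod(i, cols)
        for j in ([i + 1] if c + 1 < cols else []) + ([i + cols] if r + 1 < rows else []):
            edges.append((float(np.linalg.norm(flat[i] - flat[j])), i, j))
    edges.sort()

    # 2. Minimum spanning tree via Kruskal's algorithm (edges arrive in ascending order).
    parent = list(range(rows * cols))
    tree = []
    for w, i, j in edges:
        ri, rj = _root(parent, i), _root(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))

    # 3. Cut the k - 1 heaviest tree edges; each remaining subtree is one zone.
    parent = list(range(rows * cols))
    for _, i, j in tree[: len(tree) - (k - 1)]:
        parent[_root(parent, i)] = _root(parent, j)
    roots = [_root(parent, i) for i in range(rows * cols)]
    return np.unique(roots, return_inverse=True)[1].reshape(rows, cols)
```

Because every zone is a subtree of a spanning tree over adjacent pixels, each zone is connected by construction, which is the property k-means cannot guarantee.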
Each zone receives a complete characterization: mean anomaly score, dominant spectral driver, drift direction per band, development phase, onset timing, a plain-language narrative, and a decision signal. Onset timing is the commercially critical output: it tells the analyst not only that change has occurred but when it started, and therefore whether the entry window is still open or already closed.
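Onset timing could be derived from a zone's mean z-score series with a persistence rule, as sketched below. The rule itself (the first month that begins a run of at least three consecutive months with |z| >= 2), its parameters, and the function name are hypothetical; the pipeline's actual onset definition is not documented here.

```python
from typing import Optional

import numpy as np


def onset_month(zone_z, threshold: float = 2.0, persistence: int = 3) -> Optional[int]:
    """Index of the first month that starts a run of `persistence` consecutive
    months with zone-mean |z| >= threshold; None if no such run exists."""
    above = np.abs(np.asarray(zone_z, dtype=float)) >= threshold
    run = 0
    for t, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == persistence:
            return t - persistence + 1
    return None
```

For the series [0, 0, 0, 3, 0, 3, 3, 3], the isolated spike at index 3 is ignored and onset is reported at index 5; requiring persistence trades a few months of detection delay for robustness against one-off spikes.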
Isolated high-intensity events that are too small to form a coherent regime are detected separately. Connected blobs of five pixels or fewer whose scores are at or above the 97th percentile of land anomaly scores are flagged as point anomalies. These capture individual demolished blocks, small flood pockets, isolated construction events, and other localized activity that would otherwise dissolve into a surrounding stable zone.
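The point-anomaly rule can be sketched as a percentile threshold followed by connected-component sizing. The five-pixel limit and the 97th percentile come from the description above; 8-connectivity, NaN-coded sea pixels, the function name, and the breadth-first flood fill (used instead of an image-processing library) are assumptions of this sketch.

```python
from collections import deque

import numpy as np


def point_anomalies(scores: np.ndarray, pct: float = 97.0, max_size: int = 5) -> np.ndarray:
    """Flag 8-connected blobs of at most max_size pixels at or above the
    pct percentile of land scores (sea pixels are NaN and never flagged)."""
    land = ~np.isnan(scores)
    thresh = np.percentile(scores[land], pct)
    hot = np.greater_equal(scores, thresh, where=land, out=np.zeros(scores.shape, dtype=bool))
    seen = np.zeros_like(hot)
    flagged = np.zeros_like(hot)
    rows, cols = hot.shape
    for start in zip(*np.nonzero(hot)):
        if seen[start]:
            continue
        seen[start] = True
        blob, queue = [], deque([start])
        while queue:                                  # breadth-first flood fill
            r, c = queue.popleft()
            blob.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and hot[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        if len(blob) <= max_size:                     # small enough to be a point event
            for r, c in blob:
                flagged[r, c] = True
    return flagged
```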
Three context data sources are queried for each AOI. OSM Overpass returns point-of-interest (POI) counts by category within a radius that scales with AOI size. WorldPop provides gridded population estimates within the AOI and a 5 km buffer around it. Valhalla isochrones map walking and driving reachability from the AOI centroid. A coverage-confidence rating is then assigned based on POI density relative to expected urban coverage.
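The OSM part of this stage can be sketched as a radius rule plus an Overpass QL query. The scaling rule (1.5 times the radius of a circle with the AOI's area, clamped to 500 to 5,000 m) is a hypothetical stand-in for the pipeline's dynamic radius, and the function names are illustrative. The query uses standard Overpass QL (the around filter and out count), and the string would be sent to an Overpass API endpoint such as https://overpass-api.de/api/interpreter; no request is made here.

```python
import math


def poi_radius_m(aoi_area_km2: float, min_m: float = 500.0, max_m: float = 5000.0) -> float:
    """Hypothetical scaling: 1.5x the AOI's equivalent-circle radius, clamped."""
    equiv_radius_m = math.sqrt(aoi_area_km2 / math.pi) * 1000.0
    return max(min_m, min(max_m, 1.5 * equiv_radius_m))


def overpass_count_query(lat: float, lon: float, radius_m: float, amenity: str) -> str:
    """Overpass QL that counts OSM nodes, ways and relations tagged amenity=<value>."""
    return (
        "[out:json][timeout:25];"
        f'nwr["amenity"="{amenity}"](around:{radius_m:.0f},{lat},{lon});'
        "out count;"
    )
```

For a 4 km² AOI this gives a radius of about 1,693 m; one query per POI category then yields the category counts.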
A six-component livability composite is computed for every land pixel. Each component is normalized to [0, 1] over land pixels only, and the components are then blended using template-specific weights. The composite is not a standalone output; it is an input to the opportunity-ranking stage.
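The normalization and blending step can be sketched as follows. Min-max scaling is an assumption (the text above only says components are normalized to [0, 1]), and the component names and weights passed in are placeholders for the six real components and the template weights.

```python
import numpy as np


def normalize_on_land(layer: np.ndarray, land: np.ndarray) -> np.ndarray:
    """Min-max scale to [0, 1] using land pixels only; sea pixels become NaN."""
    vals = layer[land].astype(float)
    lo, hi = vals.min(), vals.max()
    out = np.full(layer.shape, np.nan)
    out[land] = (vals - lo) / (hi - lo) if hi > lo else 0.0
    return out


def livability(components: dict, weights: dict, land: np.ndarray) -> np.ndarray:
    """Weighted blend of normalized components; weights are rescaled to sum to 1."""
    total = sum(weights.values())
    return sum(weights[name] / total * normalize_on_land(layer, land)
               for name, layer in components.items())
```

Restricting the min and max to land pixels matters: a sea pixel with an extreme value would otherwise compress the range of every land pixel.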
Empty buildable land is detected using an ensemble of globally trained machine-learning products, with no per-region threshold tuning. The vote is conservative: if any source flags a pixel as built, the pixel is excluded, and at least one source must confirm empty land for a pixel to qualify. Watershed segmentation on the NDVI/albedo gradient then delineates individual plots. Each qualifying plot is ranked by a composite of livability, neighborhood momentum, frontier proximity, area, zone signal, and signal stability.
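The ensemble vote reduces to two boolean reductions. The sketch assumes each source has already been resampled to a common grid and encoded as built, empty, or no data; the integer codes and the function name are hypothetical, and the watershed delineation and plot ranking are not shown.

```python
import numpy as np

BUILT, EMPTY, NO_DATA = 1, 0, -1   # hypothetical per-source encoding


def buildable_empty(sources: list) -> np.ndarray:
    """Conservative ensemble vote over per-source land-cover rasters.

    A pixel qualifies only if no source calls it built and at least one
    source positively calls it empty (NO_DATA votes count as neither).
    """
    stack = np.stack(sources)
    any_built = (stack == BUILT).any(axis=0)
    any_empty = (stack == EMPTY).any(axis=0)
    return ~any_built & any_empty
```

A pixel that every source leaves as no data therefore stays ineligible, which is the intended bias: missing evidence is never treated as empty land.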
Legal protection status is resolved at pixel level before any opportunity is ranked. Multiple sources are OR-combined into a single exclusion mask. Any pixel covered by a legal exclusion is permanently ineligible for opportunity ranking, regardless of its livability or change signal. For Brazilian AOIs, three additional federal registries are queried automatically.
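Applied to a raster of opportunity scores, the OR-combination can be sketched as below. Encoding ineligibility as NaN and the function name are assumptions of this sketch; any representation works as long as ranking cannot reach excluded pixels.

```python
import numpy as np


def apply_legal_exclusions(opportunity_score: np.ndarray, exclusion_masks: list) -> np.ndarray:
    """OR-combine legal exclusion masks and make covered pixels unrankable.

    Excluded pixels are set to NaN, whatever their livability or change signal.
    """
    excluded = np.logical_or.reduce(exclusion_masks)
    return np.where(excluded, np.nan, opportunity_score)
```

np.argsort places NaN values last, so excluded pixels can never rise to the top of a ranking built on the returned array.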
Cadastral parcel boundaries are retrieved through a three-stage global strategy. For Brazilian AOIs, the pipeline first queries the public ONR ArcGIS catalog, then falls back to an authenticated web session with XHR interception to capture urban parcels and transaction layers that are served only to authenticated users. For non-Brazilian AOIs, ArcGIS Online is searched for county- and state-level assessor data.
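The fallback logic can be expressed as an ordered chain of fetchers that stops at the first non-empty result. The function name, the country key on the AOI record, and the error handling are hypothetical; real fetchers would wrap the ONR catalog query, the authenticated session, and the ArcGIS Online search described above.

```python
from typing import Callable, Optional

Fetcher = Callable[[dict], Optional[list]]


def fetch_parcels(aoi: dict, brazil_chain: list, global_chain: list) -> tuple:
    """Try each parcel source in order and return (source_name, parcels).

    aoi is a hypothetical dict with at least a "country" key; each fetcher
    returns a list of parcel features, or None / [] when it finds nothing.
    """
    chain = brazil_chain if aoi.get("country") == "BR" else global_chain
    for fetcher in chain:
        try:
            parcels = fetcher(aoi)
        except Exception:          # a failing source must not stop the chain
            parcels = None
        if parcels:
            return fetcher.__name__, parcels
    return "none", []
```

Returning the name of the source that succeeded lets later stages record where each parcel boundary came from.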
Price context is assembled from multiple tiered sources and blended into a spatially continuous price surface. Each ranked opportunity receives a price estimate from the nearest reliable source. Active market listings from classifieds platforms are overlaid separately, providing supply-side asking prices alongside the registry-based transacted values.
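Selecting the nearest reliable source can be sketched with a haversine distance and a tier filter. The source record fields, the integer reliability tiers, the min_tier cutoff, and the function names are assumptions; blending into a continuous surface and the listings overlay are not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_reliable_price(lat: float, lon: float, sources: list, min_tier: int = 2):
    """Price from the closest source whose reliability tier is >= min_tier.

    sources: dicts with hypothetical keys lat, lon, price_per_m2 and tier
    (a higher tier means a more reliable source). Returns None if none qualify.
    """
    eligible = [s for s in sources if s["tier"] >= min_tier]
    if not eligible:
        return None
    best = min(eligible, key=lambda s: haversine_m(lat, lon, s["lat"], s["lon"]))
    return best["price_per_m2"]
```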
A 0-100 composite score is computed from five weighted components. Component weights are template-specific: a land-bank analysis weights trajectory health and compliance risk more heavily than amenity access; a single-family residential analysis does the reverse. The score drives the letter grade (A through D) shown in the acquisition brief.
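The score and grade mapping can be sketched as a weighted average scaled to 0-100. Only trajectory health, compliance, and amenity access are named above; the other two component names, all weight values, and the grade cutoffs are hypothetical. Components are assumed to be pre-scaled to [0, 1] with higher meaning better, so compliance risk enters already inverted.

```python
TEMPLATE_WEIGHTS = {
    # Hypothetical weights; "livability" and "price_position" are placeholder names.
    "land_bank": {"trajectory_health": 0.35, "compliance": 0.30,
                  "amenity_access": 0.10, "livability": 0.15, "price_position": 0.10},
    "single_family": {"trajectory_health": 0.15, "compliance": 0.15,
                      "amenity_access": 0.35, "livability": 0.25, "price_position": 0.10},
}
GRADE_CUTOFFS = ((80, "A"), (65, "B"), (50, "C"))   # hypothetical; below 50 is D


def composite_score(components: dict, template: str) -> tuple:
    """Weighted 0-100 score from components in [0, 1], plus its letter grade."""
    weights = TEMPLATE_WEIGHTS[template]
    score = 100.0 * sum(w * components[name] for name, w in weights.items()) / sum(weights.values())
    grade = next((g for cut, g in GRADE_CUTOFFS if score >= cut), "D")
    return round(score, 1), grade
```

With trajectory health and compliance at 1.0, amenity access at 0, and the other two components at 0.5, the land-bank weights give 77.5 (grade B) while the single-family weights give 47.5 (grade D): the same parcel grades differently under different templates.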
Eight project-type templates reconfigure the entire pipeline from band weights through to LLM narrative emphasis. A single parcel analyzed as a single-family lot and as an industrial site will produce different anomaly weightings, different livability composites, different POI importance rankings, and different verdicts. Templates are selected at job submission time.
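A template can be represented as one immutable settings bundle that each stage reads from. The field names, the two example templates (out of the eight), and their values are hypothetical; the sketch only shows the pattern of resolving a template once at job submission and rejecting unknown names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProjectTemplate:
    """Hypothetical bundle of per-template settings read by every stage."""
    name: str
    band_weights: dict = field(default_factory=dict)        # anomaly weighting
    livability_weights: dict = field(default_factory=dict)  # livability composite
    poi_priority: tuple = ()                                 # context stage
    narrative_emphasis: str = ""                             # LLM brief


TEMPLATES = {
    t.name: t for t in (
        ProjectTemplate("single_family", {"vegetation": 0.4, "built": 0.2, "albedo": 0.1, "lst": 0.3},
                        {"green": 0.4, "access": 0.6}, ("school", "pharmacy"), "daily-life amenities"),
        ProjectTemplate("industrial", {"vegetation": 0.1, "built": 0.5, "albedo": 0.3, "lst": 0.1},
                        {"green": 0.1, "access": 0.9}, ("fuel",), "logistics access"),
    )
}


def select_template(name: str) -> ProjectTemplate:
    """Resolve the template chosen at job submission; fail fast on unknown names."""
    try:
        return TEMPLATES[name]
    except KeyError:
        raise ValueError(f"unknown template {name!r}; expected one of {sorted(TEMPLATES)}") from None
```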
A Claude Sonnet call synthesizes all scored pipeline outputs into a 400-word plain-language acquisition brief. The model receives pre-scored signals as structured input; it is not asked to interpret raw satellite data. Hard constraints are enforced: no fabricated permits or timelines, every adjective must be backed by a number from the pipeline output, and the verdict must align with the decision signal computed in Stage 04g. Two output modes are available: an acquisition brief for developers and investors, or a neighborhood livability assessment for residential buyers. Both English and Brazilian Portuguese are supported.
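Two of the hard constraints, the word limit and verdict alignment, can be checked mechanically after generation. This sketch assumes the brief contains a line of the form "Verdict: LABEL" drawn from a hypothetical three-label vocabulary, and treats 400 words as an upper bound; the function name is illustrative, the number-backing rule for adjectives is not checked, and the model call itself (for example through the Anthropic SDK's messages.create) is not shown.

```python
import re

VERDICTS = ("BUY", "WATCH", "AVOID")   # hypothetical verdict vocabulary
VERDICT_RE = re.compile(r"Verdict:\s*(" + "|".join(VERDICTS) + r")\b")


def validate_brief(text: str, decision_signal: str, max_words: int = 400) -> list:
    """List the constraint violations found in a generated brief (empty if none)."""
    problems = []
    n_words = len(text.split())
    if n_words > max_words:
        problems.append(f"too long: {n_words} words > {max_words}")
    match = VERDICT_RE.search(text)
    if match is None:
        problems.append("missing verdict line")
    elif match.group(1) != decision_signal:
        problems.append(f"verdict {match.group(1)} contradicts decision signal {decision_signal}")
    return problems
```

A non-empty result would be a reason to regenerate or reject the brief rather than publish it.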
All thirteen stages complete in minutes, including the legal layers, price context, and acquisition brief.