Case in point:
Overture Maps discovery

We were asked to run a data quality scan on the Overture Maps dataset, a large geospatial dataset backed by Meta and Microsoft. Here’s what Aspen uncovered.

Places Lost at Sea

Overture’s “Places” dataset had 167,021 records located in the ocean. That’s 0.25% of all entries. Planning tools, search APIs, and spatial analytics models all degrade when they consume data containing these points.

Duplicate Addresses Everywhere

Aspen ran a deduplication sweep across Overture’s “Addresses” dataset and flagged 7,121,175 duplicates, 1.66% of a 429M+ record dataset. Duplicate addresses skew analytics and location-based AI insights, and they increase processing and storage costs.

Even New York City, where data is about as clean as it gets, was filled with duplicates.
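
For readers who want a feel for what a deduplication sweep involves, here is a minimal sketch. It is an illustration only, not Aspen's actual method: the field names (number, street, postcode), the sample records, and the normalization rules are all assumptions. The idea is to reduce each address to a normalized key and flag any key shared by more than one record.

```python
from collections import defaultdict

def normalize(value):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(str(value or "").lower().split())

def find_duplicates(records, fields=("number", "street", "postcode")):
    """Group record ids by normalized address key.

    Returns only the groups that contain more than one record.
    The field names are hypothetical, not Overture's schema.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(normalize(rec.get(f)) for f in fields)
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Made-up sample records for illustration.
records = [
    {"id": "a1", "number": "350", "street": "5th Ave", "postcode": "10118"},
    {"id": "a2", "number": "350", "street": "5th  AVE", "postcode": "10118"},
    {"id": "a3", "number": "11", "street": "Wall St", "postcode": "10005"},
]
duplicate_groups = find_duplicates(records)
print(duplicate_groups)
```

A production sweep would likely also need to reconcile abbreviations ("Ave" vs "Avenue"), unit numbers, and near-identical coordinates, which exact key matching does not catch.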

Road Connectors That Connect to Nothing

We also found 439,522 “floating connectors” in Overture’s “Transportation” dataset, about 0.18% of total records. These are road nodes that don’t actually link to any road segment. Disconnected connectors confuse navigation engines and cause inaccurate location-based insights from AI models.

Aspen GIS flagged thousands of orphan connectors in NYC.
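
Conceptually, a floating-connector check is a set difference: every connector ID that no segment references is an orphan. The sketch below shows that idea under simplifying assumptions. The connector_ids field on each segment and the sample IDs are hypothetical stand-ins, and this is not a description of how Aspen implements the check.

```python
def find_floating_connectors(connector_ids, segments):
    """Return connector ids that no segment references.

    Each segment is assumed to carry a list of connector ids under the
    hypothetical key "connector_ids".
    """
    referenced = set()
    for seg in segments:
        referenced.update(seg.get("connector_ids", []))
    return sorted(set(connector_ids) - referenced)

# Made-up sample data: c4 is not attached to any segment.
connector_ids = ["c1", "c2", "c3", "c4"]
segments = [
    {"id": "s1", "connector_ids": ["c1", "c2"]},
    {"id": "s2", "connector_ids": ["c2", "c3"]},
]
floating = find_floating_connectors(connector_ids, segments)
print(floating)
```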

Technical Breakdown

  • Joined spatial data at scale using S2 indexing.

  • Filtered using true geospatial containment, not just bounding boxes.

  • Exported actionable reports in minutes.

  • All checks were run on Overture drops from March 19 and April 23, 2025.
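
To illustrate why true containment matters more than a bounding-box test, here is a small sketch. It assumes flat planar coordinates and a made-up L-shaped "land" polygon, and it uses a simple ray-casting test rather than S2 or Aspen's own pipeline. A point can fall inside a shape's bounding box while still sitting outside the shape itself, which is exactly the case a box-only filter misses.

```python
def bbox(polygon):
    """Axis-aligned bounding box of a polygon given as (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def in_bbox(point, box):
    """Cheap prefilter: is the point inside the bounding box?"""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y

def in_polygon(point, polygon):
    """Ray casting: count crossings of a ray cast in +x from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical L-shaped landmass; the notch at the upper right is "water".
land = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
box = bbox(land)

on_land = (0.5, 3.0)
at_sea = (3.0, 3.0)

for name, pt in (("on_land", on_land), ("at_sea", at_sea)):
    print(name, "bbox:", in_bbox(pt, box), "polygon:", in_polygon(pt, land))
```

In practice the bounding box is still useful as a fast first pass: points that fail it can be skipped, and only the survivors need the more expensive containment test. Real geographic data would also need spherical geometry rather than the planar math used here.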