Faster-than-lightning Addressing API.

  • Agile/Kanban
  • Team Leadership

The Commonwealth of Massachusetts receives addresses from many organizations, each of which enters them in its own way, so a single address may be recorded in dozens of different forms. The Massachusetts Bureau of Geographic Information (GIS) maintains a database that identifies every address in the state uniquely and in a consistent format. The bureau needed a way to reconcile the many possible variants of an address to the canonical entry in its internal data set, which was stored in a format that is difficult to query against (FGDB). Because of the volume of data the state handles, using an external service would have exceeded the allowed budget.

LCM worked with the Commonwealth to build an extract-transform-load (ETL) tool that runs whenever GIS provides an updated data set. The ETL takes millions of records from the FGDB data set, normalizes them, and imports them into an AWS RDS database. The state's proof-of-concept process for this took hours to run. By leveraging AWS Fargate and the open-source GDAL library, the required time was brought to under 10 minutes.
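The normalization step of such an ETL can be illustrated with a minimal Python sketch. It is not the project's actual code: the field names (number, street, city, zip), the suffix table, and the function name normalize_record are all assumptions made for the example, and the real pipeline reads its records from FGDB through GDAL rather than from plain dictionaries.

```python
import re

# Hypothetical suffix table for illustration; a real table would be far larger.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with every field in one canonical form."""
    out = {}
    for key, value in record.items():
        text = re.sub(r"[.,]", " ", str(value or ""))  # periods and commas become spaces
        text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
        out[key] = text.upper()

    # Abbreviate a trailing street suffix so "Main Street" and "Main St" agree.
    if "street" in out:
        words = out["street"].split()
        if words and words[-1] in SUFFIXES:
            words[-1] = SUFFIXES[words[-1]]
        out["street"] = " ".join(words)
    return out
```

Applying the same function to every incoming record means that spelling variants of one address collapse to a single row shape before they reach the database, which is what makes the later lookup a simple equality match.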

After the ETL, we built an API endpoint using Serverless.js (AWS) that accepts addresses in a typical mailing address format, which allows for many potential variations. We use libpostal to separate each address into distinct address components, and then query the RDS database to see whether any matches are returned.
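The lookup described above might look roughly like the following Python sketch. Several things here are assumptions rather than details from the project: the table and column names are hypothetical, an in-memory SQLite connection stands in for the RDS database, and the function takes components that have already been parsed. The (value, label) pair shape and the labels house_number, road, city, and postcode follow what libpostal's Python binding returns from parse_address.

```python
import sqlite3

# libpostal labels mapped to hypothetical column names in the address table.
LABEL_TO_COLUMN = {
    "house_number": "number",
    "road": "street",
    "city": "city",
    "postcode": "zip",
}


def find_matches(conn: sqlite3.Connection, components: list) -> list:
    """Look up canonical addresses that agree with every recognized component.

    `components` is a list of (value, label) pairs. Labels without a
    mapped column are ignored.
    """
    clauses, params = [], []
    for value, label in components:
        column = LABEL_TO_COLUMN.get(label)
        if column:
            # Column names come from the fixed mapping above; values are bound
            # as parameters, so user input never reaches the SQL text.
            clauses.append(f"{column} = ?")
            params.append(value.upper())
    if not clauses:
        return []
    sql = ("SELECT id, number, street, city, zip FROM addresses WHERE "
           + " AND ".join(clauses))
    return conn.execute(sql, params).fetchall()
```

Matching only on the components that were actually recognized lets a partial address, such as one without a postcode, still return candidates, at the price of possibly returning more than one row.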

The serverless architecture of both the ETL and the API endpoints is highly scalable and secure, and the state is charged only while they are actually in use. This allowed us to create a flexible address-matching system that takes advantage of the state's internal data set of unique addresses and accepts user input in a wide variety of formats, at a very small cost.

Outcomes: The API has gone through internal testing, is fully functional, and is being rolled out across the Commonwealth.

Measurements: Most API responses are delivered in under 500 ms.