A black and white photo of a rubix cube with the words figure it out inside the squares.

Untangling sources of truth with an event-driven system


A public sector client had a problem that will sound familiar to a lot of teams—multiple systems all claimed to be the source of truth for the same citizen data. An identity provider handled SSO, a CRM platform, and various internal APIs and verification providers also joined the mix.

But users don’t care if one backend has their email address and another has their PO box. They care about a seamless experience as they move across different applications, with consistent and accurate data at every step.

To solve this problem, we built an event‑driven architecture on cloud messaging and queue services. It helps maintain consistency and recovers gracefully when third-party APIs are degraded or unavailable.

This article recaps how the architecture grew from a simple “seed” to a full “forest,” based on a recent lightning talk I gave at the Code for America Summit in Chicago.

The seed: A simple event pipeline

We started with an intentionally simple pattern:

  • An event comes from a third party. 
  • The event bus routes it to a queue. 
  • A worker processes that queue.
  • The database is updated if needed.

This “seed” architecture already decouples the external identity provider from our application database, and gives us a place to absorb spikes in incoming events.

Diagram showing an identity provider sending events to an event bus, which routes to a queue, then a worker, then a database.

 

The sapling: The event bus as the pattern-matching trunk

As we added more providers, we needed to handle new event types—like identity verification webhooks and change data capture (CDC) events—but without constantly changing our application code.

The event bus became the “trunk” of our sapling. It doesn’t care whether something is a login, profile update, or a verification event. It simply matches patterns and routes events to the right queues. Dedicated workers consume from those queues, each focused on a specific concern like SSO sync or CRM sync.

Diagram showing multiple providers (identity provider, CRM platform, verification service) emitting events into a central event bus, which routes to several queues and workers.

 

The forest: A living ecosystem of events

Once we had the routing and the queue system in place, we were able to progress to a full, living and breathing event ecosystem—what I called the “forest.”

Multiple systems now emit events into the central bus, which fans them out into specialized queues—profile events, app logins, CRM account updates, identity verification events, resync queues, and more.

We have two kinds of consumers handling these queues:

  • Long-running workers (SSO sync, app login, CRM sync, identity verification , CRM account) that handle queue processing continuously.
  • On-demand jobs (SSO resync, CRM resync, content ETL, a resync enqueuer) that wake up to handle specific scenarios like resyncing from a third‑party API.

All of this flows into our operational database, while workers also emit internal events back into the event bus so other parts of the system can react without being tightly coupled to any single source. When a third‑party provider experiences disruption or a worker encounters errors, queues and the central bus give the system room to absorb the noise and recover gracefully.

Diagram showing an identity provider, CRM platform, verification service, and CMS feeding a central event bus, which routes events to multiple queues, workers, on-demand jobs, and an operational database.

 

What keeps our event-driven ecosystem alive

There are three principles that keep this kind of event-driven system healthy.

1. Repeat‑safe processing

In this architecture, retries and reprocessing are normal. If you run the same message or event multiple times, you want to end up in the same correct state every time.

Designing steps to be “repeat‑safe” means that any reprocessing—whether from automatic retries or manual intervention—does not corrupt data or create surprises for users.

2. Internal events carry IDs, not data

One key thing we learned is that internal events should usually carry minimal identifiers (with a few exceptions). Otherwise, the data can go stale by the time it’s handled, leaving you with bad race conditions that are hard to solve.

Instead, internal events carry IDs that point back to the authoritative source for a given domain—whether that’s our operational database, CRM platform, or another system—so that when a worker processes an event, it pulls fresh data at that moment. That way, re‑drives always pull the truth rather than replaying an old snapshot.

3. Automated recovery mechanisms with observability

Fault-tolerant event handling is a first‑class part of this ecosystem, not an afterthought. After a limited number of retries, failed messages are routed to isolated holding areas where they can be inspected, understood, and reprocessed.

Those holding areas are wired up with automated alerts so engineers are notified when something starts to fail, rather than discovering it weeks later. Failure is normal in a system like this, but silence is dangerous. If fault-tolerant queues stay empty for long periods, it’s important to verify your retry logic, alerting,  and failure instrumentation are actually working.

Final thoughts on our event‑driven architecture

Our event‑driven approach turned a tangle of competing sources of truth into a system where multiple applications can share consistent data without being tightly coupled to one another. The central event bus handles pattern matching and routing, queues smooth out spikes and isolate failures, and workers and jobs can evolve independently as the ecosystem grows.

By keeping internal events lightweight and ID‑based, and fault-tolerant event handling as core design principles, the system can keep breathing even under real‑world failure modes.

If you’re wrestling with competing sources of truth across your own systems, this kind of event‑driven “forest” might be a solution worth exploring. Get in touch with us to find out more, or learn about our expertise.


This article was created in partnership with our content writer, Peggy McGregor.