
Lessons Learned and Future Improvements for My Datahub Implementation

Reflections on building, evolving, and scaling a real Datahub

Senior-level Fullstack Web Developer with 10+ years experience, including 2 years of Team Lead position. Specializing in responsive design and full-stack web development across the Vue.js and .NET ecosystems. Skilled in Azure/AWS cloud infrastructure, focused on DevOps techniques such as CI/CD. Experienced in system design, especially with software architecture patterns such as microservices, BFF (backend-for-frontend). Hands-on with Agile practices in team leading, and AI-assisted coding.

Series: Designing a Microservice-Friendly Datahub
PART III — CASE STUDY: MY CSL DATAHUB IMPLEMENTATION
Previous: End-to-End Data Flow Scenarios in Datahub

Architectures don’t become “good” because they were well-designed on day one. They become good because they survive contact with reality, accumulate scar tissue, and adapt without collapsing.

This final article steps away from diagrams and patterns to reflect on what the CSL Datahub taught us in practice: what worked, what surprised us, where the cracks appeared first, and how the system could evolve if built again today.

This is where architecture grows up.

Disclaimer (Context & NDA)
The CSL Datahub implementation discussed throughout this case study was designed and built in 2021. While the architectural principles remain sound, some tooling choices could be updated today. To comply with NDA requirements, business-specific logic, schemas, and operational details are intentionally generalized.


What Worked Well (And Why It Worked)

Some design decisions paid off immediately—and kept paying dividends.

1. Clear Ownership of State

Making the CSL Web App and its MySQL database the single source of truth removed ambiguity early.

Benefits:

  • No split-brain state

  • No cross-service writes

  • No schema politics

Every downstream system knew its role: react, don’t mutate.

This clarity prevented entire classes of bugs.

2. Event Emission After Commit

Events were emitted after database transactions completed, never before.

// 1. Persist the state change inside the transaction.
$db->commit();

// 2. Only after a successful commit, publish the fact to the stream.
$redis->xAdd(
    'csl:events',
    '*',    // let Redis auto-generate the entry ID
    [
        'type' => 'user.updated',
        'user_id' => $user->id,
        'occurred_at' => time()
    ]
);

This simple discipline avoided phantom events, partial updates, and confusing rollback scenarios. It also made replay and reasoning straightforward.

3. Redis Streams as a Shock Absorber

Redis Streams quietly did exactly what they were meant to do:

  • Absorb bursts

  • Decouple time

  • Protect the legacy app

They were rarely discussed—and that’s the highest compliment.

When systems are calm under pressure, it’s usually because something unglamorous is doing its job well.

4. RabbitMQ for Decentralized Consumption

RabbitMQ scaled not just in traffic, but in organizational complexity.

Teams could:

  • Add consumers independently

  • Remove consumers without coordination

  • Evolve their logic safely

Publish/subscribe worked as intended—no central registry of dependencies, no choreography meetings.
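The decentralized model is easy to sketch. Assuming the php-amqplib client (>= 3.0) and a topic exchange named `csl.events` with `user.*` routing keys (all names illustrative, since the real ones are generalized here), a team's consumer is entirely self-service:

```php
<?php
// A team declares its own durable queue and binds it to the shared
// exchange; no other team needs to know this queue exists.
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

$channel->exchange_declare('csl.events', 'topic', false, true, false);
$channel->queue_declare('billing.user-events', false, true, false, false);
$channel->queue_bind('billing.user-events', 'csl.events', 'user.*');

$channel->basic_consume('billing.user-events', '', false, false, false, false,
    function ($msg) {
        // Team-local logic goes here; ack only after successful handling.
        $msg->ack();
    });

while ($channel->is_consuming()) {
    $channel->wait();
}
```

Removing the consumer is just as unilateral: delete the queue, and the exchange keeps fanning out to everyone else.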


What Was Harder Than Expected

Some challenges only appear once the system is alive.

1. Operational Visibility Took Real Effort

Event-driven systems fail between services, not inside them.

We learned quickly that:

  • CPU metrics weren’t enough

  • Service uptime lied

  • Queue lag told the real story

It took deliberate work to monitor:

  • Redis stream lag

  • RabbitMQ queue depth

  • Processor throughput

  • DLQ growth

Visibility wasn’t optional—it was a feature we had to build.
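Stream lag is directly measurable. A minimal probe, assuming phpredis, Redis >= 7.0 (which reports per-group `lag` in XINFO GROUPS), and an illustrative consumer group on `csl:events`:

```php
<?php
// Print how far each consumer group has fallen behind the stream:
// "lag" counts entries not yet delivered to the group,
// "pending" counts entries delivered but not yet acknowledged.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

foreach ($redis->xInfo('GROUPS', 'csl:events') as $group) {
    printf("group=%s lag=%d pending=%d\n",
        $group['name'], $group['lag'], $group['pending']);
}
```

Exported on a schedule, these two numbers tell the story that CPU and uptime metrics could not.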

2. Idempotency Was Non-Negotiable

At-least-once delivery meant duplication was inevitable.

The teams that embraced idempotency early slept better:

-- The first delivery inserts the event ID; any redelivery hits the
-- unique key and becomes a harmless no-op.
INSERT INTO processed_events (event_id)
VALUES (:event_id)
ON DUPLICATE KEY UPDATE event_id = event_id;

The teams that didn’t… learned the hard way.

Idempotency isn’t clever. It’s defensive driving.
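The shape of the guard is the same everywhere. In this illustrative in-process sketch a plain array stands in for the `processed_events` table:

```php
<?php
// Wrap any handler so that a given event_id takes effect exactly once,
// no matter how many times the broker redelivers it.
function makeIdempotentHandler(callable $handler): callable
{
    $seen = [];
    return function (array $event) use (&$seen, $handler): bool {
        $id = $event['event_id'];
        if (isset($seen[$id])) {
            return false;          // duplicate delivery: skip silently
        }
        $seen[$id] = true;
        $handler($event);          // first delivery: run the real logic
        return true;
    };
}

// At-least-once delivery means the same event can arrive twice.
$applied = 0;
$handle = makeIdempotentHandler(function (array $e) use (&$applied) {
    $applied++;                    // side effect runs once per event ID
});

$handle(['event_id' => 'evt-1']);
$handle(['event_id' => 'evt-1']); // duplicate, ignored
$handle(['event_id' => 'evt-2']);
// $applied is now 2
```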

3. The Processor Attracted Gravity

Despite best intentions, the Processor constantly tempted engineers with convenience.

“If we already have the event here, why not just…?”

That sentence is how God services are born.

It required discipline to:

  • Keep logic shallow

  • Push domain meaning to consumers

  • Split handlers early when responsibilities diverged

Architectural boundaries don’t enforce themselves.


Bottlenecks Discovered Over Time

No system escapes bottlenecks—only ignorance of them.

Processor Throughput

As event volume grew, Processor lag became the first scaling signal.

The fix wasn’t “more CPU.” It was:

  • Horizontal scaling

  • Splitting handlers by responsibility

  • Reducing synchronous API calls

The bottleneck revealed where coupling still existed.
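Horizontal scaling fell out of Redis consumer groups naturally. A sketch of the per-instance loop, assuming phpredis and an illustrative group named `processors` (`handleEvent` is a placeholder for the shallow routing logic):

```php
<?php
// Each Processor instance runs this loop under its own consumer name;
// Redis shards undelivered entries between the instances in the group.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$consumer = gethostname() . ':' . getmypid();   // unique per instance

while (true) {
    // '>' asks only for entries never delivered to this group before;
    // block up to 5 s waiting, take at most 10 entries per batch.
    $batch = $redis->xReadGroup('processors', $consumer,
        ['csl:events' => '>'], 10, 5000) ?: [];

    foreach ($batch['csl:events'] ?? [] as $id => $fields) {
        handleEvent($fields);   // placeholder: shallow routing only
        $redis->xAck('csl:events', 'processors', [$id]);
    }
}
```

Adding capacity then means starting another instance, not redesigning anything.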

Message Storms

Some early event designs were too chatty.

Emitting events on every micro-change caused:

  • Unnecessary fan-out

  • Consumer overload

  • Hard-to-reason flows

The solution was not throttling—it was semantic restraint:

  • Fewer events

  • More meaningful events

  • Better aggregation
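Aggregation can be as simple as coalescing a burst before publishing. An illustrative sketch (field names invented) that collapses field-level micro-changes into one meaningful `user.updated` event per user:

```php
<?php
// Collapse a burst of micro-change events into one aggregate event per
// user; later values win within the burst.
function aggregateEvents(array $microEvents): array
{
    $byUser = [];
    foreach ($microEvents as $e) {
        $uid = $e['user_id'];
        $byUser[$uid]['user_id'] = $uid;
        $byUser[$uid]['changes'] = array_merge(
            $byUser[$uid]['changes'] ?? [], $e['changes']);
    }
    return array_values(array_map(
        fn ($agg) => ['type' => 'user.updated'] + $agg, $byUser));
}

$burst = [
    ['user_id' => 7, 'changes' => ['email' => 'a@x.io']],
    ['user_id' => 7, 'changes' => ['name' => 'Ann']],
    ['user_id' => 9, 'changes' => ['name' => 'Bob']],
];
$out = aggregateEvents($burst);
// Three micro-changes become two meaningful events.
```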


Architecture Is a Living System

One of the most important lessons was psychological, not technical.

Architectures don’t “finish.” They age.

What worked at:

  • 3 modules

  • 2 teams

  • 10k events/day

…needed adjustment at:

  • 10 modules

  • 6 teams

  • 1M events/day

Treating architecture as frozen design would have killed the system. Treating it as a living system kept it adaptable.


Future Improvements (If We Were Building This Today)

With hindsight—and modern tooling—several evolutions would make sense.

1. Kafka for High-Volume Event History

If:

  • Event replay became core

  • Stream processing emerged

  • Retention requirements grew

Kafka would be a natural next step.

Not as a replacement for everything—but as a data backbone where history matters.

2. Splitting the Processor by Responsibility

Rather than one Processor service:

  • One for Redis → RabbitMQ

  • One for RabbitMQ → CSL API

  • One for external integrations

This would reduce blast radius and simplify scaling decisions.

3. Formal Event Contracts

Early contracts were implicit. Later, they deserved structure.

A shared schema repo with versioning would:

  • Reduce accidental breakage

  • Improve onboarding

  • Enable better validation

Contracts are how event-driven systems communicate trust.
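Even a minimal contract check pays for itself. The schema below is invented for illustration; the point is the shape: each event type declares a version and its required fields, and the validator rejects anything that breaks them:

```php
<?php
// A registry of event contracts (illustrative contents). In practice
// this would live in a shared, versioned schema repository.
const CONTRACTS = [
    'user.updated' => [
        'version'  => 1,
        'required' => ['event_id', 'user_id', 'occurred_at'],
    ],
];

// Returns a list of violations; an empty array means the event
// honours its contract.
function validateEvent(array $event): array
{
    $contract = CONTRACTS[$event['type'] ?? ''] ?? null;
    if ($contract === null) {
        return ['unknown event type'];
    }
    $errors = [];
    if (($event['version'] ?? null) !== $contract['version']) {
        $errors[] = 'version mismatch';
    }
    foreach ($contract['required'] as $field) {
        if (!array_key_exists($field, $event)) {
            $errors[] = "missing field: $field";
        }
    }
    return $errors;
}
```

Run at publish time, this catches breakage before it fans out; run at consume time, it turns silent corruption into a loud, attributable failure.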

4. Deeper Observability by Default

If rebuilt today:

  • Distributed tracing would be first-class

  • Correlation IDs everywhere

  • Pipeline-level dashboards from day one

Debugging async systems without visibility is archaeology.
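Correlation IDs only work if they are copied, never re-minted, at each hop. A tiny illustrative helper:

```php
<?php
// Attach a correlation ID to an event: reuse the caller's ID when one
// exists, and mint a fresh one only at the origin of the chain.
function withCorrelationId(array $event, ?string $incomingId = null): array
{
    $event['correlation_id'] = $incomingId ?? bin2hex(random_bytes(8));
    return $event;
}

// Origin: no incoming ID, so one is minted.
$first = withCorrelationId(['type' => 'user.updated']);

// Downstream hop: the same ID is carried forward, so logs from every
// service in the chain can be joined on correlation_id.
$second = withCorrelationId(['type' => 'crm.sync'], $first['correlation_id']);
```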


Reflection Is Part of Engineering

The biggest takeaway isn’t about Redis, RabbitMQ, or .NET.

It’s this:

Architecture is not about being right.
It’s about being able to change without fear.

The CSL Datahub worked not because it was perfect, but because it respected:

  • Boundaries

  • Ownership

  • Failure

  • Time

Those principles outlast tools.


Closing the Case Study (And the Series)

This article closes the CSL case study—but not the conversation.

If there’s one thing worth carrying forward, it’s this mental shift:
design systems as conversations, not call graphs.

When systems speak in facts, tolerate delay, and respect ownership, they scale—not just technically, but humanly.

That’s what Datahub architecture is really about.

And that’s the kind of architecture worth building.

Optional Extra Articles

If you’d like to go deeper, the following optional articles explore specific corners of the architecture—contracts, failure handling, observability, tooling trade-offs, and scaling pressure points that deserve their own focused discussion.

🧾 Event Contracts as APIs

♻️ Dead Letter Queues and Retry Strategies

🔍 Observability for Event-Driven Systems

⚖️ Redis Streams vs Kafka: Choosing the Right Event Backbone

🚦 When the Processor Becomes a Bottleneck

Designing a Microservice-Friendly Datahub

Part 6 of 22

A series on microservice-friendly Datahub architecture: event-driven principles, decoupling, and a deep dive into a real-world implementation built on Redis, RabbitMQ, a REST API, and a processor service, showing how distributed systems communicate at scale.
