End-to-End Data Flow Scenarios in Datahub
Tracing real events through a distributed Datahub system

Series: Designing a Microservice-Friendly Datahub
PART III — CASE STUDY: MY CSL DATAHUB IMPLEMENTATION
Previous: RabbitMQ as the Inter-Module Backbone in Datahub
Next: Lessons Learned and Future Improvements for the CSL Datahub
Up to this point, we’ve examined each architectural component in isolation. Now it’s time to do the only thing that really proves an architecture works: watch it move.
This article walks through three real, end-to-end scenarios—triggered by users, schedules, and external systems—to show how data flows through the CSL Datahub. The goal is not to impress with complexity, but to demonstrate predictability: how events are born, buffered, translated, routed, and consumed without tight coupling or fragile coordination.
Disclaimer (Context & NDA)
The CSL Datahub implementation described here was designed and built in 2021. While the architectural principles remain valid, some implementation details could be modernized today. To comply with NDA requirements, business-specific logic, schemas, identifiers, and workflows are intentionally generalized.
The Event Lifecycle (A Quick Primer)
Before diving into scenarios, it helps to establish the shared lifecycle every event follows:
State changes in the CSL Web App
An event is emitted to Redis Streams
The event is buffered
The Processor consumes and translates it
The event is published to RabbitMQ
One or more modules consume and react
Nothing skips steps. Nothing shortcuts the pipeline.

That consistency is what makes the system understandable.
Scenario 1: User-Triggered Update
The Trigger
A user updates their profile through the CSL Web App UI.
Step 1: State Change in the Web App
$user->display_name = $input['display_name'];
$user->updated_at = time();
$user->save();
This is the only place where authoritative state is modified.
Step 2: Emit an Event
$redis->xAdd(
    'csl:events',
    '*',
    [
        'type'        => 'user.updated',
        'user_id'     => $user->id,
        'occurred_at' => time()
    ]
);
This event does not ask for anything. It simply states a fact.
Step 3: Redis Buffers the Event
If downstream systems are slow or unavailable, the event sits safely in Redis Streams. The Web App does not wait.
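The group setup itself is not shown in the original snippets. As a minimal sketch, assuming the Processor uses StackExchange.Redis (the stream and group names match the snippets; the connection details are illustrative), the group is created once, idempotently, and the backlog can be inspected at any time:
using System;
using StackExchange.Redis;

// Sketch: create the consumer group idempotently and inspect the backlog.
var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();

try
{
    // "$" = deliver only entries added after the group is created
    db.StreamCreateConsumerGroup("csl:events", "csl-group", "$");
}
catch (RedisServerException ex) when (ex.Message.StartsWith("BUSYGROUP"))
{
    // Group already exists; nothing to do
}

// Pending = delivered to a consumer but not yet acknowledged
var pending = db.StreamPending("csl:events", "csl-group");
Console.WriteLine($"Backlog: {pending.PendingMessageCount} unacknowledged entries");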
Step 4: Processor Consumes and Translates
var entries = redis.StreamReadGroup(
    "csl:events",    // stream key
    "csl-group",     // consumer group
    "processor-1",   // consumer name
    ">"              // only entries never delivered to this group
);

foreach (var entry in entries)
{
    PublishToRabbitMQ(Map(entry));
    redis.StreamAcknowledge("csl:events", "csl-group", entry.Id);
}
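Map is where translation lives. The real field mappings are generalized for NDA reasons; a minimal sketch of the idea, reusing the field names from the xAdd call above (the eventId passthrough is an assumption, added so consumers can deduplicate later):
using System.Linq;
using System.Text.Json;
using StackExchange.Redis;

// Sketch: translate a Redis stream entry into a JSON payload for
// RabbitMQ. Field names mirror the Web App's xAdd call; the output
// shape is illustrative, not the original contract.
static byte[] Map(StreamEntry entry)
{
    var fields = entry.Values.ToDictionary(
        v => (string)v.Name,
        v => (string)v.Value
    );

    return JsonSerializer.SerializeToUtf8Bytes(new
    {
        type       = fields["type"],
        userId     = fields["user_id"],
        occurredAt = fields["occurred_at"],
        eventId    = (string)entry.Id   // stream ID doubles as a dedup key
    });
}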
Step 5: RabbitMQ Distributes
channel.BasicPublish(
    exchange: "csl.events",
    routingKey: "user.updated",
    body: payload
);
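For routing keys like user.updated to fan out selectively, csl.events needs to be a topic exchange. The declaration is not part of the original snippets; a sketch, continuing with the channel from above (durability is an assumption):
// Assumed setup: a durable topic exchange lets each module bind
// with patterns such as "user.*" or "process.#".
channel.ExchangeDeclare(
    exchange: "csl.events",
    type: ExchangeType.Topic,
    durable: true
);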
Step 6: Modules React Independently
Notification service sends an email
Analytics service updates counters
Search indexer refreshes a document
None of these modules know about each other. None block the user.
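That independence comes from each module owning its own queue and binding. A hypothetical sketch of the notification service's side, continuing with the channel from the snippets above (the queue name, binding pattern, and HandleUserEvent are all illustrative):
// Hypothetical module-side setup: a private durable queue, bound
// only to the routing keys this module cares about.
channel.QueueDeclare(
    queue: "notifications.user-events",
    durable: true,
    exclusive: false,
    autoDelete: false
);
channel.QueueBind(
    queue: "notifications.user-events",
    exchange: "csl.events",
    routingKey: "user.*"
);

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (sender, ea) =>
{
    HandleUserEvent(ea.Body);                      // e.g., send an email
    channel.BasicAck(ea.DeliveryTag, multiple: false);
};
channel.BasicConsume(
    queue: "notifications.user-events",
    autoAck: false,
    consumer: consumer
);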
Scenario 2: Cron-Triggered Update
The Trigger
A scheduled job recalculates account status nightly.
Step 1: Cron Runs Inside the Web App
foreach ($users as $user) {
    $user->status = calculateStatus($user);
    $user->save();

    $redis->xAdd(
        'csl:events',
        '*',
        [
            'type'        => 'user.status_recalculated',
            'user_id'     => $user->id,
            'occurred_at' => time()
        ]
    );
}
Notice what doesn’t change:
Same database
Same event emission
Same pipeline
Cron jobs are just another source of facts.
Step 2: Burst Handling via Redis Streams
Hundreds or thousands of events may be emitted in a short time. Redis absorbs the burst. Consumers catch up at their own pace.
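Concretely, the count parameter turns the Step 4 loop into a throttle: the Processor drains the stream in bounded batches while Redis holds the rest. A sketch, under the same StackExchange.Redis assumption as before:
// Drain the backlog in bounded batches; Redis buffers whatever
// hasn't been read yet.
while (true)
{
    var batch = redis.StreamReadGroup(
        "csl:events", "csl-group", "processor-1", ">",
        count: 100   // at most 100 entries per iteration
    );
    if (batch.Length == 0) break;   // caught up

    foreach (var entry in batch)
    {
        PublishToRabbitMQ(Map(entry));
        redis.StreamAcknowledge("csl:events", "csl-group", entry.Id);
    }
}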
Step 3: Downstream Effects
Billing service recalculates charges
Compliance service flags anomalies
Reporting service aggregates metrics
Some consumers may lag. Others may fail and retry. None affect the cron job’s completion.
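"Fail and retry" has a concrete shape on the consumer side: acknowledge only on success, negatively acknowledge otherwise. A sketch (RecalculateCharges is a hypothetical handler, and a production version would cap retries or dead-letter the message):
consumer.Received += (sender, ea) =>
{
    try
    {
        RecalculateCharges(ea.Body);   // hypothetical billing handler
        channel.BasicAck(ea.DeliveryTag, multiple: false);
    }
    catch (Exception)
    {
        // Requeue for redelivery; idempotent handling (see below)
        // is what makes the retry safe.
        channel.BasicNack(ea.DeliveryTag, multiple: false, requeue: true);
    }
};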
Scenario 3: External Module–Triggered Update
The Trigger
An external module completes a long-running process and reports back.
Step 1: External Module Publishes an Event
channel.publish(
    'external.events',
    'process.completed',
    Buffer.from(JSON.stringify({
        processId: 'abc123',
        status: 'success'
    }))
);
Step 2: Processor Consumes from RabbitMQ and Calls the Web App REST API
consumer.Received += async (sender, ea) =>
{
    var message = Deserialize(ea.Body);

    var response = await httpClient.PostAsync(
        "/api/csl/process-complete",
        new StringContent(Json(message))
    );
    response.EnsureSuccessStatusCode();

    // Acknowledge only after the Web App has accepted the update;
    // on failure the message stays unacknowledged for redelivery.
    channel.BasicAck(ea.DeliveryTag, false);
};
Step 3: CSL Web App Updates State
// $processId is taken from the request the Processor just posted
$model = Process::findOne($processId);
$model->status = 'completed';
$model->save();
Step 4: CSL Emits a New Event
$redis->xAdd(
    'csl:events',
    '*',
    [
        'type'        => 'process.completed',
        'process_id'  => $processId,
        'occurred_at' => time()
    ]
);
The cycle continues. No shortcuts. No special cases.
Idempotency in Practice
In all three scenarios, duplication is possible—and expected.
Consumers defend themselves:
if (processedEventIds.has(event.id)) {
    return;
}
processedEventIds.add(event.id);
Or via database constraints:
INSERT INTO processed_events (event_id)
VALUES (:event_id)
ON DUPLICATE KEY UPDATE event_id = event_id;
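The constraint only pays off if the consumer checks the outcome. A sketch assuming MySQL via the MySqlConnector library (the table name matches the statement above; everything else is illustrative): with the no-op update, MySQL reports one affected row on a first-time insert and zero on a duplicate.
using MySqlConnector;

// Sketch: claim the event before doing any work. Returns false when
// the event was already processed (duplicate delivery).
static bool TryClaimEvent(MySqlConnection conn, string eventId)
{
    using var cmd = new MySqlCommand(
        @"INSERT INTO processed_events (event_id)
          VALUES (@event_id)
          ON DUPLICATE KEY UPDATE event_id = event_id;",
        conn
    );
    cmd.Parameters.AddWithValue("@event_id", eventId);
    return cmd.ExecuteNonQuery() == 1;   // 0 => duplicate, skip processing
}
A consumer calls TryClaimEvent first and simply returns when it gets false; the retry becomes a harmless no-op.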
Idempotency is what turns retries into reliability.
What These Scenarios Prove
Across all flows:
The Web App never waits for downstream systems
Events are facts, not instructions
Buffers absorb time and failure
Modules react independently
The system converges, even under stress

This is not accidental. It’s the result of respecting boundaries.
Seeing the System as Motion, Not Structure
Once you follow real events end-to-end, the architecture stops being a diagram and starts being a machine.
You can reason about:
Where delays occur
Where failures are isolated
Where retries are safe
Where ownership lives

That’s the difference between knowing an architecture and trusting it.
Where We Go Next
Now that we’ve seen the CSL Datahub in motion, the final step is reflection.
In the next and final article, Lessons Learned and Future Improvements, we’ll step back to examine what worked well, what was harder than expected, and how this architecture could evolve over time—technically and organizationally.
Good architectures don’t just make systems work.
They make systems understandable while they work.
