
End-to-End Data Flow Scenarios in Datahub

Tracing real events through a distributed Datahub system


Series: Designing a Microservice-Friendly Datahub
PART III — CASE STUDY: MY CSL DATAHUB IMPLEMENTATION
Previous: RabbitMQ as the Inter-Module Backbone in Datahub
Next: Lessons Learned and Future Improvements for my implemented Datahub

Up to this point, we’ve examined each architectural component in isolation. Now it’s time to do the only thing that really proves an architecture works: watch it move.

This article walks through three real, end-to-end scenarios—triggered by users, schedules, and external systems—to show how data flows through the CSL Datahub. The goal is not to impress with complexity, but to demonstrate predictability: how events are born, buffered, translated, routed, and consumed without tight coupling or fragile coordination.

Disclaimer (Context & NDA)
The CSL Datahub implementation described here was designed and built in 2021. While the architectural principles remain valid, some implementation details could be modernized today. To comply with NDA requirements, business-specific logic, schemas, identifiers, and workflows are intentionally generalized.


The Event Lifecycle (A Quick Primer)

Before diving into scenarios, it helps to establish the shared lifecycle every event follows:

  1. State changes in the CSL Web App

  2. An event is emitted to Redis Streams

  3. The event is buffered

  4. The Processor consumes and translates it

  5. The event is published to RabbitMQ

  6. One or more modules consume and react

Nothing skips steps. Nothing shortcuts the pipeline.

That consistency is what makes the system understandable.
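Concretely, every event in this lifecycle reduces to a small, self-describing record. A minimal sketch in JavaScript (the field names match the examples later in this article; the helper name is my own):

```javascript
// Build the canonical event envelope used throughout the pipeline.
// Events state facts ("user.updated"), never instructions.
function makeEvent(type, payload) {
  return {
    type,                                       // what happened
    ...payload,                                 // who/what it happened to
    occurred_at: Math.floor(Date.now() / 1000)  // when (unix seconds)
  };
}

const event = makeEvent('user.updated', { user_id: 42 });
```

Because the envelope carries no instructions, any number of consumers can interpret the same fact without coordinating with the producer.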


Scenario 1: User-Triggered Update

The Trigger

A user updates their profile through the CSL Web App UI.

Step 1: State Change in the Web App

$user->display_name = $input['display_name'];
$user->updated_at = time();
$user->save();

This is the only place where authoritative state is modified.

Step 2: Emit an Event

$redis->xAdd(
    'csl:events',
    '*',
    [
        'type' => 'user.updated',
        'user_id' => $user->id,
        'occurred_at' => time()
    ]
);

This event does not ask for anything. It simply states a fact.

Step 3: Redis Buffers the Event

If downstream systems are slow or unavailable, the event sits safely in Redis Streams. The Web App does not wait.
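The buffering contract can be modeled in a few lines of JavaScript (an in-memory stand-in for Redis Streams, purely illustrative): appending returns immediately no matter what the consumer is doing.

```javascript
// Minimal in-memory model of the buffering contract: the producer
// appends and returns at once, regardless of consumer health.
class EventBuffer {
  constructor() { this.entries = []; }
  add(event) {                    // analogous to XADD: never blocks
    this.entries.push(event);
    return this.entries.length;   // stream length after append
  }
  read(count) {                   // analogous to a batched group read
    return this.entries.splice(0, count);
  }
}

const stream = new EventBuffer();
// The consumer is "down" while 1000 events arrive; the producer is unaffected.
for (let i = 0; i < 1000; i++) stream.add({ type: 'user.updated', user_id: i });
// The consumer comes back and starts catching up.
const firstBatch = stream.read(100);
```

The real Redis Stream adds persistence and consumer groups on top, but the decoupling property is exactly this: `add` and `read` never wait for each other.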

Step 4: Processor Consumes and Translates

// StackExchange.Redis: key first, then group, consumer, position
var entries = redis.StreamReadGroup(
    "csl:events",
    "csl-group",
    "processor-1",
    ">"
);

foreach (var entry in entries)
{
    PublishToRabbitMQ(Map(entry));
    redis.StreamAcknowledge("csl:events", "csl-group", entry.Id);
}
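The `Map(entry)` call above is where translation happens: a raw stream entry becomes a routable message. A sketch of that mapping in JavaScript (the C# original is equivalent; field and function names here are illustrative):

```javascript
// Translate a raw Redis stream entry into a RabbitMQ publication:
// the event type doubles as the routing key, everything else is the body.
function mapEntry(entry) {
  const { type, ...fields } = entry.values;
  return {
    routingKey: type,  // e.g. 'user.updated'
    body: JSON.stringify({ event_id: entry.id, ...fields })
  };
}

const msg = mapEntry({
  id: '1718000000000-0',
  values: { type: 'user.updated', user_id: '42', occurred_at: '1718000000' }
});
```

Carrying the stream entry id through as `event_id` is what later lets consumers deduplicate redeliveries.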

Step 5: RabbitMQ Distributes

channel.BasicPublish(
    exchange: "csl.events",
    routingKey: "user.updated",
    body: payload
);

Step 6: Modules React Independently

  • Notification service sends an email

  • Analytics service updates counters

  • Search indexer refreshes a document

None of these modules know about each other. None block the user.
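That independence comes from topic routing: each module binds its own queue with a pattern, and the broker fans the event out. The matching rule itself is simple; a sketch following AMQP topic-exchange semantics (`*` matches exactly one word, `#` matches zero or more), with illustrative binding names:

```javascript
// AMQP topic-exchange matching: '*' = one word, '#' = zero or more words.
function topicMatch(pattern, key) {
  const ps = pattern.split('.'), ks = key.split('.');
  function go(i, j) {
    if (i === ps.length) return j === ks.length;
    if (ps[i] === '#') return go(i + 1, j) || (j < ks.length && go(i, j + 1));
    if (j === ks.length) return false;
    return (ps[i] === '*' || ps[i] === ks[j]) && go(i + 1, j + 1);
  }
  return go(0, 0);
}

// Each module's binding, evaluated independently by the broker:
const bindings = {
  notifications: 'user.*',        // emails on any user event
  analytics:     '#',             // counts everything
  search:        'user.updated'   // reindexes on profile changes
};
const interested = Object.keys(bindings)
  .filter(m => topicMatch(bindings[m], 'user.updated'));
```

Adding a fourth module is just one more binding; the publisher never changes.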


Scenario 2: Cron-Triggered Update

The Trigger

A scheduled job recalculates account status nightly.

Step 1: Cron Runs Inside the Web App

foreach ($users as $user) {
    $user->status = calculateStatus($user);
    $user->save();

    $redis->xAdd(
        'csl:events',
        '*',
        [
            'type' => 'user.status_recalculated',
            'user_id' => $user->id,
            'occurred_at' => time()
        ]
    );
}

Notice what doesn’t change:

  • Same database

  • Same event emission

  • Same pipeline

Cron jobs are just another source of facts.

Step 2: Burst Handling via Redis Streams

Hundreds or thousands of events may be emitted in a short time. Redis absorbs the burst. Consumers catch up at their own pace.
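"Catching up at their own pace" means draining the backlog in fixed-size batches, acknowledging each batch before fetching the next. A simulation of that loop (batch size and counts are illustrative):

```javascript
// Drain a burst in fixed-size batches, acknowledging each batch
// before the next read -- the consumer-group consumption pattern.
function drain(backlog, batchSize, handle) {
  let batches = 0;
  while (backlog.length > 0) {
    const batch = backlog.splice(0, batchSize);
    batch.forEach(handle);   // process
    batches++;               // ack, then fetch the next batch
  }
  return batches;
}

// A nightly cron emits 2,500 events; the consumer reads 100 at a time.
const backlog = Array.from({ length: 2500 }, (_, i) => ({ user_id: i }));
const processed = [];
const rounds = drain(backlog, 100, e => processed.push(e.user_id));
```

Whether the drain takes seconds or minutes is invisible to the cron job, which finished the moment the last event was appended.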

Step 3: Downstream Effects

  • Billing service recalculates charges

  • Compliance service flags anomalies

  • Reporting service aggregates metrics

Some consumers may lag. Others may fail and retry. None affect the cron job’s completion.


Scenario 3: External Module–Triggered Update

The Trigger

An external module completes a long-running process and reports back.

Step 1: External Module Publishes an Event

channel.publish(
  'external.events',
  'process.completed',
  Buffer.from(JSON.stringify({
    processId: 'abc123',
    status: 'success'
  }))
);

Step 2: Processor Consumes from RabbitMQ (calling Web App REST API)

consumer.Received += async (sender, ea) =>
{
    var message = Deserialize(ea.Body);

    await httpClient.PostAsync(
        "/api/csl/process-complete",
        new StringContent(Json(message))
    );

    channel.BasicAck(ea.DeliveryTag, false);
};

Step 3: CSL Web App Updates State

$model = Process::findOne($processId);
$model->status = 'completed';
$model->save();

Step 4: CSL Emits a New Event

$redis->xAdd(
    'csl:events',
    '*',
    [
        'type' => 'process.completed',
        'process_id' => $processId,
        'occurred_at' => time()
    ]
);

The cycle continues. No shortcuts. No special cases.


Idempotency in Practice

In all three scenarios, duplication is possible—and expected.

Consumers defend themselves:

if (processedEventIds.has(event.id)) {
  return;
}
processedEventIds.add(event.id);

Or via database constraints:

INSERT INTO processed_events (event_id)
VALUES (:event_id)
ON DUPLICATE KEY UPDATE event_id = event_id;

Idempotency is what turns retries into reliability.
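Both defenses reduce to the same contract: record the event id, then process, and treat a duplicate id as a no-op. A sketch of wrapping any handler with that contract (in-memory here for illustration; the `processed_events` table above plays the same role durably):

```javascript
// Wrap a handler so redelivered events become no-ops.
// 'seen' stands in for the processed_events table.
function idempotent(handler, seen = new Set()) {
  return (event) => {
    if (seen.has(event.id)) return false;  // duplicate: skip silently
    seen.add(event.id);                    // record the id...
    handler(event);                        // ...then process
    return true;
  };
}

let emails = 0;
const sendEmail = idempotent(() => emails++);
sendEmail({ id: 'evt-1' });  // processed
sendEmail({ id: 'evt-1' });  // redelivery: ignored
```

In production the record and the side effect should share a transaction so a crash between them cannot strand a half-processed event.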


What These Scenarios Prove

Across all flows:

  • The Web App never waits for downstream systems

  • Events are facts, not instructions

  • Buffers absorb time and failure

  • Modules react independently

  • The system converges, even under stress

This is not accidental. It’s the result of respecting boundaries.


Seeing the System as Motion, Not Structure

Once you follow real events end-to-end, the architecture stops being a diagram and starts being a machine.

You can reason about:

  • Where delays occur

  • Where failures are isolated

  • Where retries are safe

  • Where ownership lives

That’s the difference between knowing an architecture and trusting it.


Where We Go Next

Now that we’ve seen the CSL Datahub in motion, the final step is reflection.

In the next and final article, Lessons Learned and Future Improvements, we’ll step back to examine what worked well, what was harder than expected, and how this architecture could evolve over time—technically and organizationally.

Good architectures don’t just make systems work.
They make systems understandable while they work.

Designing a Microservice-Friendly Datahub

Part 7 of 22

A series on microservice-friendly Datahub architecture, covering event-driven principles, decoupling, and a real-world implementation with Redis, RabbitMQ, a REST API, and a processor service, showing how distributed systems communicate at scale.
