Caching in the BFF: In-Memory, Redis & Response Caching

Where caching belongs in a BFF architecture, how to avoid stale-data bugs, and cache invalidation patterns.


Caching is one of the most compelling arguments for the BFF pattern — and one of the fastest ways to introduce subtle, hard-to-diagnose bugs if it is implemented without precision. A BFF that aggregates four upstream services on every request has an obvious caching opportunity: responses that are expensive to assemble and infrequently changing can be served from memory rather than recomputed. But "infrequently changing" is doing significant work in that sentence, and the failure modes that emerge when caching is applied too broadly are worse than the latency it was meant to solve.

This article covers where caching belongs in a BFF, the three caching strategies available in .NET Core and when each is appropriate, the stale-data problems that appear in each strategy, and the cache invalidation patterns that keep the system correct under real-world usage. Code examples are grounded in the education platform BFF built throughout this series.


Where caching belongs — and where it does not

Before any implementation, one architectural question determines whether your caching strategy will stay maintainable: is the data being cached user-specific, or is it shared across users?

This distinction divides the caching problem into two fundamentally different problems that require different solutions.

Shared data is the same for all users or all users within an organisation: the list of available courses for an institution, the current academic calendar, lookup tables for status codes and role descriptions. This data can be cached at the BFF level with a single cache entry serving thousands of requests.

User-specific data is different for every authenticated user: a student's enrolled courses, a teacher's upcoming sessions, a user's unread notifications. Caching this data requires a cache key that includes the user's identity. Serving one user's data to another user is a security vulnerability, not a performance optimisation.

The production system this series is based on had both types. Course catalogue data (available courses for an institution) was shared and changed at most once per day. User enrollment status was user-specific and could change within a single session if a student enrolled or withdrew. These two types required different cache strategies with different TTLs and different invalidation approaches.

A third category exists: request-level aggregation results — data that is expensive to compute for a single request but is neither shared across users nor needed beyond the current response. For this, caching is the wrong tool; parallel upstream calls (as implemented in Article 4) are the right tool. Not every performance problem is a caching problem.
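To make the contrast concrete, here is a minimal sketch of the parallel-call approach for request-level aggregation. The `FetchCourseCountAsync`/`FetchSessionCountAsync` stand-ins, their return values, and their latencies are hypothetical, not the series' actual service clients:

```csharp
// Request-level aggregation via parallel upstream calls — no cache involved.
// The two Fetch* methods are hypothetical stand-ins for real service clients.
public static class RequestAggregation
{
    public static async Task<(int CourseCount, int SessionCount)> AggregateAsync()
    {
        // Start both calls before awaiting either, so total latency is roughly
        // the slowest single call rather than the sum of both.
        var coursesTask  = FetchCourseCountAsync();
        var sessionsTask = FetchSessionCountAsync();

        await Task.WhenAll(coursesTask, sessionsTask);

        return (await coursesTask, await sessionsTask);
    }

    private static async Task<int> FetchCourseCountAsync()
    {
        await Task.Delay(50); // simulated upstream latency
        return 12;
    }

    private static async Task<int> FetchSessionCountAsync()
    {
        await Task.Delay(50); // simulated upstream latency
        return 3;
    }
}
```

The result is computed fresh per request and discarded with the response — no cache key, no TTL, no invalidation to reason about.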


The three caching strategies

.NET Core provides three caching mechanisms with distinct characteristics. A production BFF typically uses all three in different parts of the stack.

  1. IMemoryCache — in-process memory cache

  IMemoryCache stores data in the BFF process's heap. It is the fastest cache available — no serialisation, no network hop — but its data is local to a single container instance and is lost on restart.

  Appropriate for: shared reference data with low invalidation requirements (course catalogues, academic year data, role lookup tables) in a single-instance deployment.

  Not appropriate for: user-specific data (cache keys must be per-user, which increases memory pressure significantly at scale), or data that must be consistent across multiple BFF instances.

  2. IDistributedCache with Redis — out-of-process distributed cache

  IDistributedCache backed by Redis stores serialised data outside the BFF process. Data survives container restarts, is shared across multiple BFF instances, and can be inspected and cleared independently of the BFF.

  Appropriate for: user-specific data at scale, shared data in multi-instance deployments, and data that must survive BFF restarts.

  Not appropriate for: data that changes faster than the round-trip cost to Redis justifies caching (for very short TTLs, the Redis latency can approach the upstream call latency).

  3. Response caching middleware — HTTP-level cache headers

  Response caching stores entire HTTP responses — headers and body — and serves them for subsequent requests that match the caching rules. It operates at the HTTP layer and can be combined with a CDN or reverse proxy that respects Cache-Control headers.

  Appropriate for: unauthenticated or publicly accessible BFF endpoints, and endpoints returning shared data where HTTP cache semantics (ETags, If-None-Match) add value.

  Not appropriate for: authenticated endpoints. The middleware does not cache requests that carry an Authorization header, and Cache-Control: private prevents proxy caching. For most BFF endpoints behind authentication, response caching at the HTTP layer adds complexity without benefit over IMemoryCache or Redis.
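The series' BFF does not use response caching, but for completeness, here is a minimal sketch of the middleware wiring for a hypothetical public endpoint. The /public/academic-calendar route and its payload are illustrative only, not part of the education platform:

```csharp
// Program.cs — response caching for a hypothetical public endpoint (sketch).
builder.Services.AddResponseCaching();

var app = builder.Build();

app.UseResponseCaching(); // must run before the endpoints that set Cache-Control

app.MapGet("/public/academic-calendar", (HttpContext ctx) =>
{
    // Mark the response as publicly cacheable for 60 minutes — the middleware
    // (and any CDN or proxy in front of the BFF) honours this header.
    ctx.Response.GetTypedHeaders().CacheControl =
        new Microsoft.Net.Http.Headers.CacheControlHeaderValue
        {
            Public = true,
            MaxAge = TimeSpan.FromMinutes(60)
        };

    return Results.Ok(new { term = "autumn", weeks = 18 }); // placeholder payload
});
```

Because the endpoint is unauthenticated and the data is shared, the same cached response can be served to every caller for the full 60-minute window.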


Implementation: IMemoryCache for shared reference data

The course catalogue for an institution — the list of available courses — changes at most once per day in the education platform. Fetching it on every dashboard request is wasteful. It is shared across all users in the same organisation, making it a natural candidate for IMemoryCache.

// Infrastructure/CourseCache.cs
public sealed class CourseCache(
    IMemoryCache cache,
    CourseServiceClient courseClient,
    ILogger<CourseCache> logger)
{
    private static string CacheKey(string orgId) => $"courses:org:{orgId}";

    private static readonly MemoryCacheEntryOptions CacheOptions = new MemoryCacheEntryOptions()
        .SetAbsoluteExpiration(TimeSpan.FromMinutes(30))
        .SetSlidingExpiration(TimeSpan.FromMinutes(10))
        .SetSize(1); // For size-limited cache — each entry counts as 1 unit

    public async Task<IReadOnlyList<CourseDto>?> GetOrFetchAsync(
        string orgId, CancellationToken ct = default)
    {
        var key = CacheKey(orgId);

        if (cache.TryGetValue(key, out IReadOnlyList<CourseDto>? cached))
        {
            logger.LogDebug(
                "Cache hit for courses. OrgId: {OrgId}, Key: {CacheKey}", orgId, key);
            return cached;
        }

        logger.LogDebug(
            "Cache miss for courses. OrgId: {OrgId}, fetching upstream.", orgId);

        var courses = await courseClient.GetCoursesByOrgAsync(orgId, ct);

        if (courses is not null)
        {
            cache.Set(key, courses, CacheOptions);
            logger.LogInformation(
                "Cached {CourseCount} courses for org {OrgId}. TTL: 30 min.",
                courses.Count, orgId);
        }

        return courses;
    }

    public void Invalidate(string orgId)
    {
        var key = CacheKey(orgId);
        cache.Remove(key);
        logger.LogInformation(
            "Cache invalidated for courses. OrgId: {OrgId}", orgId);
    }
}

Register and size the cache in Program.cs:

// Program.cs
builder.Services.AddMemoryCache(opts =>
{
    // SizeLimit is measured in the units assigned via SetSize — not bytes.
    // Each entry above counts as 1 unit, so this caps the cache at 1024
    // entries and bounds memory growth as the number of institutions grows.
    opts.SizeLimit = 1024;
    opts.CompactionPercentage = 0.25; // Remove 25% of entries when limit is hit
    opts.TrackStatistics = true; // Enables cache hit/miss metrics
});

builder.Services.AddSingleton<CourseCache>();

Combining SetSlidingExpiration with SetAbsoluteExpiration creates a two-tier expiry: entries expire 30 minutes after creation regardless of access frequency, but within that window they also expire 10 minutes after the last access. For course data that changes on a daily cycle, this means a moderately active institution's data stays cached, while an institution with infrequent users (a small school) does not hold stale data indefinitely.

Avoiding the thundering herd

When the cache for a high-traffic organisation expires, multiple concurrent requests will all get cache misses simultaneously and all make the upstream call at the same time. This is the thundering herd problem, and it is particularly acute at startup when the cache is cold.

The fix is a concurrent dictionary as a request coalescer:

// Infrastructure/CourseCache.cs — updated with request coalescing
// (CacheKey, CacheOptions and Invalidate are unchanged from the version above)
public sealed class CourseCache(
    IMemoryCache cache,
    CourseServiceClient courseClient,
    ILogger<CourseCache> logger)
{
    // Track in-flight fetch operations — concurrent requests share one Task
    private readonly ConcurrentDictionary<string, Lazy<Task<IReadOnlyList<CourseDto>?>>>
        _inFlight = new();

    public async Task<IReadOnlyList<CourseDto>?> GetOrFetchAsync(
        string orgId, CancellationToken ct = default)
    {
        var key = CacheKey(orgId);

        if (cache.TryGetValue(key, out IReadOnlyList<CourseDto>? cached))
            return cached;

        // Coalesce concurrent cache misses — only one upstream call per key.
        // Lazy<T> guarantees the fetch delegate runs at most once even when
        // several threads race through GetOrAdd with the same key.
        var lazyFetch = _inFlight.GetOrAdd(key,
            _ => new Lazy<Task<IReadOnlyList<CourseDto>?>>(
                () => FetchAndCacheAsync(orgId, key, ct)));

        try
        {
            return await lazyFetch.Value;
        }
        finally
        {
            // Remove the in-flight entry so a failed fetch is retried on the
            // next miss rather than a faulted Task being served forever.
            _inFlight.TryRemove(key, out _);
        }
    }

    private async Task<IReadOnlyList<CourseDto>?> FetchAndCacheAsync(
        string orgId, string key, CancellationToken ct)
    {
        // Caveat: the CancellationToken of the request that created the Lazy
        // governs the shared fetch — if that request is cancelled, the
        // coalesced callers observe the cancellation too.
        var courses = await courseClient.GetCoursesByOrgAsync(orgId, ct);
        if (courses is not null)
            cache.Set(key, courses, CacheOptions);
        return courses;
    }
}

Implementation: Redis for user-specific data

User enrollment status — which courses a specific student or teacher is actively engaged with — changes within a session. A student who enrolls in a course should see that course on their next dashboard load. This rules out in-memory caching with a long TTL, but does not rule out caching entirely — a short TTL with explicit invalidation on write operations provides freshness guarantees while reducing upstream load.

Install the Redis client:

dotnet add package Microsoft.Extensions.Caching.StackExchangeRedis
dotnet add package StackExchange.Redis

Configure in Program.cs:

builder.Services.AddStackExchangeRedisCache(opts =>
{
    opts.Configuration = builder.Configuration["Redis:ConnectionString"];
    opts.InstanceName  = "bff:"; // Prefix all keys — prevents collision with other services
});

The enrollment cache wrapper:

// Infrastructure/EnrollmentCache.cs
public sealed class EnrollmentCache(
    IDistributedCache cache,
    ILogger<EnrollmentCache> logger)
{
    private static string UserKey(string userId) => $"enrollment:{userId}";

    // Short TTL — enrollment status is user-specific and mutable within a session
    private static readonly DistributedCacheEntryOptions Options =
        new DistributedCacheEntryOptions()
            .SetAbsoluteExpirationRelativeToNow(TimeSpan.FromMinutes(5));

    public async Task<EnrollmentStatusDto?> GetAsync(
        string userId, CancellationToken ct = default)
    {
        var key  = UserKey(userId);
        var data = await cache.GetStringAsync(key, ct);

        if (data is null)
        {
            logger.LogDebug("Redis cache miss for enrollment. UserId: {UserId}", userId);
            return null;
        }

        logger.LogDebug("Redis cache hit for enrollment. UserId: {UserId}", userId);
        return JsonSerializer.Deserialize<EnrollmentStatusDto>(data);
    }

    public async Task SetAsync(
        string userId, EnrollmentStatusDto status, CancellationToken ct = default)
    {
        var key  = UserKey(userId);
        var data = JsonSerializer.Serialize(status);
        await cache.SetStringAsync(key, data, Options, ct);
    }

    public async Task InvalidateAsync(string userId, CancellationToken ct = default)
    {
        var key = UserKey(userId);
        await cache.RemoveAsync(key, ct);
        logger.LogInformation(
            "Redis cache invalidated for enrollment. UserId: {UserId}", userId);
    }
}

Cache-aside pattern in the aggregator

The aggregator uses the cache in a cache-aside pattern: check the cache first; on miss, fetch from upstream and populate the cache; on invalidation events, remove the entry.

// Aggregators/CourseAggregator.cs
public sealed class CourseAggregator(
    CourseCache courseCache,
    EnrollmentCache enrollmentCache,
    CourseServiceClient courseClient,
    ILogger<CourseAggregator> logger)
{
    public async Task<CourseDetailResponse> GetCourseDetailAsync(
        string orgId, string courseId, string userId, CancellationToken ct = default)
    {
        // Try enrollment status from Redis cache first
        var enrollmentStatus = await enrollmentCache.GetAsync(userId, ct);

        if (enrollmentStatus is null)
        {
            logger.LogDebug(
                "Enrollment cache miss for user {UserId}, fetching upstream.", userId);
            var upstreamEnrollment = await courseClient
                .GetEnrollmentStatusAsync(userId, ct);

            if (upstreamEnrollment is not null)
            {
                enrollmentStatus = upstreamEnrollment;
                await enrollmentCache.SetAsync(userId, enrollmentStatus, ct);
            }
        }

        // Course detail from the memory cache. CourseCache is keyed by
        // organisation, so fetch the org's catalogue and select the course
        // from it (CourseDto is assumed to expose an Id).
        var courses = await courseCache.GetOrFetchAsync(orgId, ct);
        var courseDetail = courses?.FirstOrDefault(c => c.Id == courseId);

        return ShapeCourseDetail(courseDetail, enrollmentStatus);
    }
}

Invalidating on write: enrollment changes

When a user enrolls in or withdraws from a course, the BFF handles the POST/DELETE request. The enrollment cache must be invalidated immediately after the upstream write succeeds:

// Endpoints/EnrollmentEndpoints.cs
private static async Task<IResult> EnrollAsync(
    string courseId,
    HttpContext ctx,
    CourseServiceClient courseClient,
    EnrollmentCache enrollmentCache,
    CancellationToken ct)
{
    var userId = ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)!;

    var result = await courseClient.EnrollAsync(courseId, userId, ct);
    if (result is null)
        return Results.Problem(
            detail: "Enrollment request could not be processed.",
            statusCode: StatusCodes.Status502BadGateway);

    // Invalidate cache immediately after successful write
    // The next GET will fetch fresh data from upstream
    await enrollmentCache.InvalidateAsync(userId, ct);

    return Results.Ok(result);
}

The invalidation-on-write pattern is simple and correct for a BFF architecture. It avoids the complexity of cache-through (writing to the cache as part of the write operation) and cache-update (updating the cached value without an upstream round-trip). Both of those patterns require the cache entry to always be in a valid state relative to the upstream — which is hard to guarantee when the upstream can be modified by systems other than the BFF.


The stale-data problems to watch for

1. Multiple BFF instances with in-memory caches

This is the most common caching mistake in a BFF deployed to multiple container instances. Two instances of the BFF each hold their own in-memory cache. Instance A's cache is invalidated; Instance B's cache still holds the stale entry. A user's requests can round-robin between instances — they see fresh data, then stale data, then fresh data again.

The fix is architectural: in-memory caches should only hold data that is either truly immutable (lookup tables, static reference data that never changes within a deployment lifetime) or whose staleness is acceptable and bounded (data with a short TTL where a brief period of inconsistency between instances is acceptable). User-specific data and data with write-based invalidation must use a distributed cache.

In the production system, the single ACI container instance made this a non-issue during initial deployment. The in-memory course cache was fine for one instance. When the architecture was being designed for potential multi-instance scaling, the course cache was moved to Redis — not because the single instance was problematic, but because the coupling between the caching strategy and the deployment topology was too tight.

2. TTL longer than the upstream change frequency

A 30-minute TTL on course catalogue data is reasonable when courses change at most once per day. If the product owner changes a course's enrollment capacity during an active session, users will see the old capacity for up to 30 minutes. This is an acceptable trade-off only if it has been explicitly agreed with the product team.

The conversation that needs to happen: "We cache course data for 30 minutes to reduce load on the course service. This means a course change takes up to 30 minutes to appear for users. Is that acceptable, or do we need to implement a cache invalidation webhook?" Not having this conversation leads to support tickets about "the system not updating."

In the production system, the course catalogue TTL was set to 30 minutes after explicit agreement with the product team. A cache invalidation endpoint (covered below) was added in the second month when an administrator needed to force an immediate refresh.

3. Cache key collisions

A cache key that does not include all dimensions of uniqueness produces collisions. The most dangerous variant in a BFF is a key that omits the user ID or organisation ID.

// ✗ Dangerous — same key for all users
var key = "dashboard:courses";

// ✗ Still dangerous — same key for all users in a session
var key = "dashboard:courses:session";

// ✓ Correct — unique per organisation
var key = $"courses:org:{orgId}";

// ✓ Correct — unique per user
var key = $"enrollment:{userId}";

A cache key collision between two organisations is a data leak. Organisation A's course list is returned to Organisation B's users. In the production system, every cache key was reviewed against the principle: "if two different users made this same request with different identities, would this key produce different results?" If yes, both identities must be in the key.

4. Caching error responses

A cache implementation that stores null results from upstream failures and serves them as cache hits is caching the absence of data as if it were data. The fix is explicit:

// Always check whether upstream returned a valid result before caching
var courses = await courseClient.GetCoursesByOrgAsync(orgId, ct);

if (courses is not null) // Only cache successful responses
{
    cache.Set(key, courses, CacheOptions);
}
// Do not cache null — let the next request try the upstream again
return courses;

This is why the CourseCache above only calls cache.Set inside the if (courses is not null) guard. Caching a null result from a temporarily unavailable service would mean every request for the next 30 minutes is served a null cache hit — extending the upstream outage far beyond its actual duration from the user's perspective.


Cache invalidation patterns

Cache invalidation is famously hard. The BFF has three practical patterns, each appropriate to different scenarios.

Pattern 1: TTL-based expiry (simplest, eventual consistency)

Set a TTL and accept the staleness window. Appropriate for shared reference data where bounded staleness is acceptable and invalidation events cannot be reliably detected.

// 30-minute TTL — course catalogue for an organisation
cache.Set(key, courses, new MemoryCacheEntryOptions()
    .SetAbsoluteExpiration(TimeSpan.FromMinutes(30)));

Advantages: no coordination required, predictable behaviour, simple to reason about. Disadvantages: staleness window can violate product expectations; TTL must be agreed with the product team.

Pattern 2: Write-through invalidation (most reliable for user-specific data)

Invalidate the cache entry immediately after any write operation that changes the cached data. The BFF already handles writes; the invalidation call is a single line after a successful upstream write.

Advantages: immediate consistency after writes, no separate invalidation infrastructure. Disadvantages: only covers writes that go through the BFF. If upstream data changes via a different path (direct API call, admin tool, another service), the BFF cache is not notified.

Pattern 3: Invalidation endpoint (for external events)

Expose a cache invalidation endpoint on the BFF that upstream services or admin tools can call when data changes. This is the pattern for "the upstream can change without going through the BFF."

// Endpoints/CacheEndpoints.cs
public static class CacheEndpoints
{
    public static IEndpointRouteBuilder MapCacheEndpoints(
        this IEndpointRouteBuilder app)
    {
        // Protected by an API key — not exposed to authenticated users
        app.MapPost("/internal/cache/invalidate/org/{orgId}",
            async (string orgId, CourseCache courseCache,
                   HttpContext ctx, IConfiguration config) =>
        {
            // Validate internal API key — this endpoint is not user-facing.
            // (A constant-time comparison such as
            // CryptographicOperations.FixedTimeEquals would harden this
            // against timing attacks.)
            var apiKey = ctx.Request.Headers["X-Internal-Api-Key"].FirstOrDefault();
            if (apiKey is null || apiKey != config["InternalApi:Key"])
                return Results.Unauthorized();

            courseCache.Invalidate(orgId);
            return Results.Ok(new { invalidated = true, orgId });
        })
        .WithName("InvalidateCourseCache")
        .ExcludeFromDescription(); // Do not expose in OpenAPI spec

        return app;
    }
}

In the production system, this endpoint was called by an administrative management tool when an institution administrator updated the course catalogue. The management tool made a POST to /internal/cache/invalidate/org/{orgId} after its upstream write completed. The BFF's cache was cleared; the next user request fetched fresh data.

The endpoint is protected by a static API key rather than the standard Feide authentication — it is an internal system-to-system call, not a user action. The key is injected as a secret environment variable, in the same way as the Feide client secret in Article 7.


Measuring cache effectiveness

Cache effectiveness is not self-evident without metrics. The production system tracked three numbers:

Hit rate by cache layer. For IMemoryCache, the MemoryCache.GetCurrentStatistics() API provides hit count and miss count. Log these periodically:

// Background service — logs cache stats every 5 minutes
public sealed class CacheMetricsService(
    IMemoryCache cache,
    TelemetryClient telemetry,
    ILogger<CacheMetricsService> logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromMinutes(5), ct);

            var stats = (cache as MemoryCache)?.GetCurrentStatistics();
            if (stats is null) continue;

            var hitRate = stats.TotalHits + stats.TotalMisses > 0
                ? (double)stats.TotalHits / (stats.TotalHits + stats.TotalMisses) * 100
                : 0;

            logger.LogInformation(
                "MemoryCache stats — Hits: {Hits}, Misses: {Misses}, " +
                "HitRate: {HitRate:F1}%, Entries: {EntryCount}, " +
                "EstimatedSize: {Size}",
                stats.TotalHits, stats.TotalMisses,
                hitRate, stats.CurrentEntryCount, stats.CurrentEstimatedSize);

            telemetry.GetMetric("Cache.HitRate").TrackValue(hitRate);
            telemetry.GetMetric("Cache.EntryCount").TrackValue(stats.CurrentEntryCount);
        }
    }
}

Upstream call reduction. Compare the number of BFF requests for course data with the number of actual calls the CourseServiceClient makes. The ratio should match the expected cache hit rate. If the BFF receives 500 dashboard requests per hour for an organisation and the course service receives 500 calls, the cache is not working. If it receives 12 calls (one per 30-minute TTL window during business hours), it is.
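The arithmetic behind that ratio can be made explicit. A back-of-envelope helper — hypothetical, not part of the BFF codebase — converts a TTL and an observation window into the expected number of upstream calls for a cache that stays warm:

```csharp
public static class CacheMath
{
    // Expected upstream calls for a TTL-based cache under steady traffic:
    // one fetch per TTL window, regardless of how many BFF requests arrive.
    public static int ExpectedUpstreamCalls(TimeSpan window, TimeSpan ttl)
        => (int)Math.Ceiling(window / ttl); // TimeSpan division yields a double
}

// A 6-hour stretch of business hours with a 30-minute TTL gives 12 expected
// upstream calls — the figure quoted for the course service above.
```

If observed upstream call counts are significantly higher than this estimate, the cache is either being evicted early (size pressure, sliding expiry) or the key space is more fragmented than expected.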

Latency delta between cache hits and misses. Log the cache outcome (hit/miss) alongside the aggregation duration in the DashboardAggregationCompleted telemetry event from Article 9. A Kusto query that splits duration by cache outcome reveals the actual latency benefit the cache provides — which is the number that justifies its complexity.


When to remove caching

Caching is sometimes introduced to mask a performance problem that should be fixed at the source. Before adding a cache layer, consider whether the upstream service is slow because it is under-indexed, under-resourced, or architecturally misdesigned — and whether fixing the source would make the cache unnecessary.

The production system removed the enrollment status cache in its second month. The upstream enrollment service had been slow due to a missing database index. After the index was added, the p95 response time dropped from 400ms to 18ms. At 18ms, the 5-minute Redis TTL on enrollment status introduced more staleness risk than the latency it saved. The cache was removed; the enrollment service was called directly on every request.

This is the correct outcome. A cache that is no longer needed is a cache that is no longer producing subtle bugs. The willingness to remove caching when the underlying problem is solved is as important as the willingness to add it when it is genuinely needed.


Series navigation

The Frontend's Contract: Building Backends for Frontends

Part 3 of 13

A practitioner's guide to the BFF pattern — from architectural rationale to production-grade implementation. Covers when BFF earns its complexity, how to design a clean client-specific API layer, and what it takes to run it reliably on Azure. Stack: Vue 3 · .NET Core 8+ · Azure.

Up next

Observability for BFF: Structured Logging, Distributed Tracing & Azure Application Insights

End-to-end traceability across Vue → BFF → upstream services using Azure Application Insights. Correlation IDs, structured logs with Serilog, custom telemetry, and Application Insights dashboards and alerts.