<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Devpath Traveler]]></title><description><![CDATA[This blog is about building software the right way. A blog about the tales of a traveler on the developing path.
Not just how to write code—but how to design systems that make sense, scale well, and don’t turn into technical debt later. You’ll find practical insights on architecture, real-world trade-offs, and clean implementation using modern tools.
Clear thinking. Real examples. Long-term mindset.]]></description><link>https://devpath-traveler.nguyenviettung.id.vn</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1764903550013/07b702d5-e5d2-4409-8ec4-37670bebea31.png</url><title>Devpath Traveler</title><link>https://devpath-traveler.nguyenviettung.id.vn</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 11 May 2026 18:53:40 GMT</lastBuildDate><atom:link href="https://devpath-traveler.nguyenviettung.id.vn/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Brownfield Migration: The Strangler Fig Approach to BFF Adoption]]></title><description><![CDATA[Every article in this series has described building a BFF on a greenfield system — a clean slate where the architecture is decided upfront and the frontend and BFF are developed in parallel. That is n]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/brownfield-migration-the-strangler-fig-approach-to-bff-adoption</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/brownfield-migration-the-strangler-fig-approach-to-bff-adoption</guid><category><![CDATA[strangler fig pattern]]></category><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[API migration]]></category><category><![CDATA[Incremental migration]]></category><category><![CDATA[Brownfield migration]]></category><category><![CDATA[Legacy API]]></category><category><![CDATA[FrontendArchitecture]]></category><category><![CDATA[migration strategy]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[Coexistence patterns]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sun, 19 Apr 2026 04:52:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/992d1f4e-df9f-41d8-9259-49ecc3118642.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>Every article in this series has described building a BFF on a greenfield system — a clean slate where the architecture is decided upfront and the frontend and BFF are developed in parallel. That is not the situation most engineers face. Most engineers face an existing frontend that talks directly to an existing API, accumulated over years, with real users depending on it every day.</p>
<p>The Strangler Fig pattern is the answer for this situation. Named by Martin Fowler after a species of tree that grows around an existing structure and gradually replaces it, the pattern describes a migration strategy where the new system is built incrementally alongside the old one. Traffic is moved piece by piece — one endpoint at a time, one screen at a time — until the old system is no longer needed and can be removed. At no point is there a cutover where everything changes simultaneously. At no point are users exposed to an untested replacement of the entire system.</p>
<p>This article covers how to apply the Strangler Fig pattern specifically to BFF adoption: how to introduce the BFF as a routing layer in front of an existing API, how to migrate endpoints incrementally, how to manage the coexistence period without duplicating logic, and how to know when the migration is complete and the old API can be retired.</p>
<hr />
<h2>The starting point: what brownfield looks like</h2>
<p>Before the migration begins, a typical brownfield architecture looks like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/13778248-9f17-44db-bb18-1b7e421a3a9d.png" alt="" style="display:block;margin:0 auto" />

<p>The Vue application makes direct HTTP calls to the existing API. The existing API was not designed with the frontend's needs in mind — it exposes domain entities rather than screen-oriented responses, it handles authentication in a way that predates modern security practices, and its response shapes have accumulated inconsistencies over years of development.</p>
<p>The problems this causes are the same ones Article 1 described: overfetching, underfetching, adapter logic in the frontend, and no clean place to add cross-cutting concerns like session management or response caching. The question is how to introduce a BFF to solve these problems without rewriting the entire system or breaking the existing users while doing so.</p>
<hr />
<h2>The Strangler Fig approach, applied</h2>
<p>The migration proceeds in four stages. Each stage is independently deployable and leaves the system in a working state. There is no stage that must be completed before users can continue using the application.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/209afa55-8000-45eb-a305-e1ec47d8d7c2.png" alt="" style="display:block;margin:0 auto" />

<p>The key architectural enabler is that the BFF starts as a pass-through proxy. In Stage 1, every request from the Vue application goes to the BFF, which forwards it to the existing API without modification. No functionality changes. No user-visible behaviour changes. The BFF is in the path but does nothing yet. This is the foundation that makes every subsequent stage safe.</p>
<hr />
<h2>Stage 1: The transparent proxy</h2>
<p>The first deployment of the BFF does not aggregate, shape, or transform anything. It proxies every request to the existing API verbatim. The purpose of this stage is to establish the BFF in the request path, validate that it can handle the traffic without introducing latency or errors, and give the team confidence in the deployment and monitoring setup before any migration work begins.</p>
<h3>The YARP reverse proxy</h3>
<p>.NET has a first-class reverse proxy library — YARP (Yet Another Reverse Proxy) — that makes the transparent proxy stage straightforward to implement:</p>
<pre><code class="language-shell">dotnet add package Yarp.ReverseProxy
</code></pre>
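<pre><code class="language-csharp">// Program.cs — Stage 1: transparent proxy
// Packages assumed: Yarp.ReverseProxy, Serilog.AspNetCore,
// Serilog.Sinks.ApplicationInsights, Microsoft.ApplicationInsights.AspNetCore
using Serilog;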
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

// Observability — even in proxy mode, every request should be traced
builder.Host.UseSerilog((ctx, cfg) =&gt; cfg
    .ReadFrom.Configuration(ctx.Configuration)
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Service", "bff")
    .WriteTo.Console()
    .WriteTo.ApplicationInsights(
        ctx.Configuration["ApplicationInsights:ConnectionString"],
        TelemetryConverter.Traces));

builder.Services.AddApplicationInsightsTelemetry();

var app = builder.Build();

app.UseMiddleware&lt;CorrelationIdMiddleware&gt;();
app.UseSerilogRequestLogging();

// All traffic proxied to existing API
app.MapReverseProxy();

app.Run();
</code></pre>
<p>The YARP configuration in <code>appsettings.json</code>:</p>
<pre><code class="language-json">{
  "ReverseProxy": {
    "Routes": {
      "catch-all": {
        "ClusterId": "existing-api",
        "Match": {
          "Path": "{**catch-all}"
        }
      }
    },
    "Clusters": {
      "existing-api": {
        "Destinations": {
          "primary": {
            "Address": "https://api.existingplatform.no"
          }
        },
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:30",
            "Timeout": "00:00:05",
            "Path": "/health"
          }
        }
      }
    }
  }
}
</code></pre>
<p>The Vue application's API base URL is changed from <code>https://api.existingplatform.no</code> to <code>https://bff.existingplatform.no</code>. Everything else stays the same. The BFF forwards every request to the existing API and returns the response unchanged.</p>
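<p>The Stage 1 <code>Program.cs</code> registers a <code>CorrelationIdMiddleware</code> that the listing does not define. A minimal sketch of one possible implementation follows. The <code>X-Correlation-Id</code> header name is an assumption, so align it with whatever the existing API already logs. YARP copies ordinary request headers to the destination, so writing the ID onto the inbound request should be enough for proxied calls to carry it upstream, and <code>Enrich.FromLogContext()</code> in the Serilog setup picks up the pushed property.</p>
<pre><code class="language-csharp">// Middleware/CorrelationIdMiddleware.cs (hypothetical sketch)
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Serilog.Context;

public sealed class CorrelationIdMiddleware(RequestDelegate next)
{
    // Assumed header name, not prescribed by YARP or ASP.NET Core
    private const string HeaderName = "X-Correlation-Id";

    public async Task InvokeAsync(HttpContext ctx)
    {
        // Reuse the caller's ID when one was sent, otherwise mint a new one
        string correlationId = ctx.Request.Headers[HeaderName].ToString();
        if (string.IsNullOrWhiteSpace(correlationId))
            correlationId = Guid.NewGuid().ToString("N");

        // Put it on the inbound request so YARP copies it to the existing API
        ctx.Request.Headers[HeaderName] = correlationId;

        // Echo it to the client just before the response headers are sent
        ctx.Response.OnStarting(() =&gt;
        {
            ctx.Response.Headers[HeaderName] = correlationId;
            return Task.CompletedTask;
        });

        // Every Serilog event written during this request carries the ID
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            await next(ctx);
        }
    }
}
</code></pre>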
<h3>What to validate in Stage 1</h3>
<p>Before proceeding to Stage 2, validate three things:</p>
<p><strong>Latency.</strong> The BFF adds one network hop. In Application Insights, compare the p95 latency of the same endpoints before and after the BFF was introduced. The acceptable overhead is typically under 10ms for same-region deployments. More than that indicates a deployment topology or network configuration issue that must be resolved before migration work begins — latency introduced by the proxy will compound with latency introduced by aggregation logic.</p>
<p><strong>Error rate.</strong> The error rate for all endpoints should be identical before and after the BFF introduction. Any increase in 4xx or 5xx rates is a bug in the proxy configuration.</p>
<p><strong>Authentication passthrough.</strong> If the existing API uses authentication (bearer tokens, cookies, API keys), verify that the BFF passes the credentials through correctly. YARP preserves request headers by default, but verify this with an authenticated endpoint before proceeding.</p>
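<p>Application Insights can compute the percentiles for the latency check directly, but the comparison itself is simple enough to sketch. The hypothetical helper below uses the nearest-rank definition of a percentile and treats the 10 ms figure above as a configurable budget rather than a fixed rule.</p>
<pre><code class="language-csharp">// Hypothetical helper for the Stage 1 latency check
using System;
using System.Linq;

public static class ProxyOverheadCheck
{
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it
    public static double Percentile(double[] samplesMs, double p)
    {
        if (samplesMs.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samplesMs));

        double[] sorted = samplesMs.OrderBy(x =&gt; x).ToArray();
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
    }

    // True when the p95 added by the proxy stays within the budget (10 ms assumed, same region)
    public static bool WithinBudget(double[] beforeMs, double[] afterMs, double budgetMs = 10) =&gt;
        Percentile(afterMs, 95) - Percentile(beforeMs, 95) &lt;= budgetMs;
}
</code></pre>
<p>With twenty baseline samples of 1 to 20 ms, the p95 is 19 ms; shifting every sample up by 8 ms stays within the budget, while shifting by 15 ms does not.</p>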
<p>Stage 1 can run for days or weeks before Stage 2 begins. There is no urgency to migrate — the system is working, users are unaffected, and the team is learning how the traffic actually looks before beginning to intercept it.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/212367b7-6663-423b-a446-9f975a5f5ad4.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Stage 2: Incremental endpoint migration</h2>
<p>With the BFF in the path and validated as transparent, the migration begins. The approach: introduce BFF-handled routes alongside the YARP catch-all, using specificity to determine which requests the BFF handles and which it forwards.</p>
<h3>The routing strategy</h3>
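<p>YARP and Minimal API routes coexist in the same application. The critical insight is that ASP.NET Core's endpoint routing chooses between matching routes by template specificity, not by registration order: a BFF-handled route for <code>GET /api/dashboard</code>, made entirely of literal segments, takes precedence over the YARP catch-all <code>{**catch-all}</code> (as long as the YARP route keeps its default <code>Order</code>). Routes that have not been migrated yet fall through to the catch-all and continue to be proxied.</p>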
<pre><code class="language-csharp">// Program.cs — Stage 2: BFF routes coexist with catch-all proxy

// Register BFF services
builder.Services.AddHttpContextAccessor();   // required by ExistingApiClient
builder.Services.AddHttpClient&lt;ExistingApiClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["ExistingApi:BaseUrl"]!));
builder.Services.AddScoped&lt;DashboardAggregator&gt;();

// ... other BFF registrations ...

var app = builder.Build();

// BFF middleware
app.UseMiddleware&lt;CorrelationIdMiddleware&gt;();
app.UseSerilogRequestLogging();
app.UseAuthentication();
app.UseAuthorization();

// Migrated BFF endpoints: listed before the catch-all for readability;
// the more specific templates win regardless of registration order
app.MapDashboardEndpoints();   // Handles GET /api/dashboard
app.MapCourseEndpoints();      // Handles GET /api/courses/{id}

// Catch-all: everything not yet migrated goes to the existing API
app.MapReverseProxy();

app.Run();
</code></pre>
<p>This routing structure means:</p>
<ul>
<li><p><code>GET /api/dashboard</code> → handled by the BFF's <code>DashboardAggregator</code></p>
</li>
<li><p><code>GET /api/courses/c-1</code> → handled by the BFF's <code>CourseAggregator</code></p>
</li>
<li><p><code>GET /api/users/profile</code> → forwarded to the existing API (not yet migrated)</p>
</li>
<li><p><code>POST /api/enrollments</code> → forwarded to the existing API (not yet migrated)</p>
</li>
</ul>
<p>The Vue application makes the same requests to the same BFF base URL. Whether a given request is handled by the BFF or forwarded to the existing API is invisible to the client.</p>
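<p>The Stage 2 <code>Program.cs</code> calls <code>MapDashboardEndpoints()</code> without showing it. A sketch of what that extension method might contain follows. It assumes the user ID is carried in the <code>ClaimTypes.NameIdentifier</code> claim, which depends on how authentication is configured. The route template <code>/api/dashboard</code> consists only of literal segments, which is why it outranks <code>{**catch-all}</code>.</p>
<pre><code class="language-csharp">// Endpoints/DashboardEndpoints.cs (hypothetical sketch)
using System.Security.Claims;
using System.Threading;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;

public static class DashboardEndpoints
{
    public static IEndpointRouteBuilder MapDashboardEndpoints(this IEndpointRouteBuilder app)
    {
        // All-literal template: more specific than the YARP catch-all
        app.MapGet("/api/dashboard", async (
                DashboardAggregator aggregator,
                ClaimsPrincipal user,
                CancellationToken ct) =&gt;
            {
                // Assumes the user ID lives in the NameIdentifier claim
                var userId = user.FindFirst(ClaimTypes.NameIdentifier)?.Value;
                if (userId is null)
                    return Results.Unauthorized();

                var dashboard = await aggregator.AggregateAsync(userId, ct);
                return Results.Ok(dashboard);
            })
            .RequireAuthorization();

        return app;
    }
}
</code></pre>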
<h3>The migration wrapper: using the existing API as an upstream</h3>
<p>During the migration, the BFF's aggregators call the existing API as their upstream. This is identical to how Article 4's aggregators call domain microservices — the existing API is just another upstream. An <code>ExistingApiClient</code> typed client replaces the individual service clients where the upstream has not yet been decomposed:</p>
<pre><code class="language-csharp">// Clients/ExistingApiClient.cs
using System.Net.Http.Json;
using System.Text.Json;

public sealed class ExistingApiClient(
    HttpClient http,
    IHttpContextAccessor contextAccessor,   // requires builder.Services.AddHttpContextAccessor()
    ILogger&lt;ExistingApiClient&gt; logger)
{
    // Forward the caller's auth token; the existing API validates it
    private void AttachAuth(HttpRequestMessage request)
    {
        var authHeader = contextAccessor.HttpContext?
            .Request.Headers.Authorization.FirstOrDefault();
        if (!string.IsNullOrEmpty(authHeader))
            request.Headers.TryAddWithoutValidation("Authorization", authHeader);
    }

    // Returns null when the existing API is unreachable or answers with a
    // non-success status, so aggregators can degrade instead of failing outright
    public async Task&lt;JsonElement?&gt; GetAsync(
        string path, CancellationToken ct = default)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, path);
        AttachAuth(request);
        try
        {
            using var response = await http.SendAsync(request, ct);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadFromJsonAsync&lt;JsonElement&gt;(cancellationToken: ct);
        }
        catch (HttpRequestException ex)
        {
            logger.LogWarning(ex,
                "Existing API unavailable. Path: {Path}. Status: {Status}",
                path, ex.StatusCode);
            return null;
        }
    }
}
</code></pre>
<p>Using <code>JsonElement</code> rather than a typed DTO is intentional for migration work. The existing API's response shapes are already known by the Vue application — the initial goal of the BFF aggregator during migration is not to redesign the shape but to compose multiple calls and introduce the architectural boundary. Shape refinement comes after the boundary is established.</p>
<h3>A migration-phase aggregator</h3>
<p>During migration, the dashboard aggregator composes calls to the existing API, performing the aggregation the Vue application previously performed on the client:</p>
<pre><code class="language-csharp">// Aggregators/DashboardAggregator.cs — migration phase
public sealed class DashboardAggregator(
    ExistingApiClient existingApi,
    ILogger&lt;DashboardAggregator&gt; logger)
{
    public async Task&lt;DashboardResponse&gt; AggregateAsync(
        string userId, CancellationToken ct = default)
    {
        var partialFailures = new List&lt;string&gt;();

        // These were previously four separate fetch calls in the Vue application.
        // Profile and notifications are independent, so the BFF fetches them in parallel;
        // courses need the profile's organisation, and sessions need the course IDs.
        var profileTask      = existingApi.GetAsync($"/users/{userId}/profile", ct);
        var notificationTask = existingApi.GetAsync($"/notifications/unread?userId={userId}", ct);

        await Task.WhenAll(profileTask, notificationTask);

        var profile = await profileTask;   // already completed; await is the idiomatic read
        if (profile is null)
            throw new BffAggregationException("User profile unavailable.");

        var orgId   = profile.Value.GetProperty("organisationId").GetString();
        var courses = await existingApi.GetAsync($"/courses?orgId={orgId}", ct);
        if (courses is null) partialFailures.Add("courses");

        JsonElement? sessions = null;
        if (courses.HasValue)
        {
            var courseIds = courses.Value
                .EnumerateArray()
                .Select(c =&gt; c.GetProperty("id").GetString())
                .Where(id =&gt; id is not null)
                .Take(10)
                .ToArray();

            sessions = await existingApi.GetAsync(
                $"/sessions?courseIds={string.Join(",", courseIds)}&amp;limit=3", ct);
            if (sessions is null) partialFailures.Add("sessions");
        }

        // Shape the response — this is where the BFF adds value over the raw proxy
        return new DashboardResponse(
            User: ShapeProfile(profile.Value),
            Courses: ShapeCourses(courses),
            UpcomingSessions: ShapeSessions(sessions),
            Notifications: ShapeNotifications(notificationTask.Result),
            PartialFailures: partialFailures
        );
    }

    private static UserProfileResponse ShapeProfile(JsonElement profile) =&gt; new(
        DisplayName: $"{profile.GetProperty("firstName").GetString()} " +
                     $"{profile.GetProperty("lastName").GetString()}",
        Role: TranslateRole(profile.GetProperty("roleCode").GetString()),
        AvatarUrl: profile.TryGetProperty("avatarPath", out var avatar)
                   &amp;&amp; avatar.ValueKind == JsonValueKind.String
            ? $"/media/avatars/{avatar.GetString()}"
            : null
    );

    // ... additional shape methods ...
}
</code></pre>
<p>The <code>JsonElement</code> approach is verbose but honest about what is happening: the BFF is parsing a response designed for a different consumer and reshaping it. Once the migration is complete and the existing API is replaced by proper upstream services, these <code>JsonElement</code> navigations are replaced by typed DTO deserialisations. Using <code>JsonElement</code> during migration makes the interim state explicit rather than hiding it behind premature typed models that may need to change.</p>
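<p><code>JsonElement.GetProperty</code> throws <code>KeyNotFoundException</code> when a property is missing, and <code>GetString</code> throws <code>InvalidOperationException</code> when the value is neither a string nor null, so one unexpected field in the existing API's response becomes a 500 from the BFF. A small tolerant accessor can contain that during migration; <code>GetStringOrNull</code> below is a hypothetical name, not part of <code>System.Text.Json</code>.</p>
<pre><code class="language-csharp">// Hypothetical helper for migration-phase aggregators
using System.Text.Json;

public static class JsonNav
{
    // Returns the string value of a property, or null when the element is not an object,
    // the property is missing, or its value is not a JSON string (for example null)
    public static string? GetStringOrNull(this JsonElement element, string propertyName) =&gt;
        element.ValueKind == JsonValueKind.Object
        &amp;&amp; element.TryGetProperty(propertyName, out JsonElement value)
        &amp;&amp; value.ValueKind == JsonValueKind.String
            ? value.GetString()
            : null;
}
</code></pre>
<p>In a shape method, <code>profile.GetStringOrNull("firstName") ?? ""</code> can then replace <code>profile.GetProperty("firstName").GetString()</code> wherever a missing value should degrade rather than fail.</p>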
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/bdc2720a-3795-4cca-aabd-5d1b73f72eb6.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Managing the coexistence period</h2>
<p>The coexistence period — where some endpoints are handled by the BFF and others are still proxied — is the riskiest phase of the migration. Three practices keep it manageable.</p>
<h3>Track migration state explicitly</h3>
<p>Maintain a migration status document as part of the repository. It should be updated with every pull request that migrates or retires an endpoint:</p>
<pre><code class="language-markdown">&lt;!-- docs/bff-migration-status.md --&gt;
# BFF Migration Status

Last updated: 2025-04-10

## Migrated endpoints (handled by BFF)
| Endpoint                  | Migrated   | Notes                                  |
|---------------------------|------------|----------------------------------------|
| GET /api/dashboard        | 2025-03-01 | Aggregates profile, courses, sessions  |
| GET /api/courses/{id}     | 2025-03-15 | Includes enrollment status             |
| GET /api/auth/me          | 2025-03-22 | Feide session replaces JWT             |

## Proxied endpoints (still forwarded to existing API)
| Endpoint                  | Owner       | Target migration date                  |
|---------------------------|-------------|----------------------------------------|
| GET /api/users/profile    | Auth team   | 2025-04-20                             |
| POST /api/enrollments     | Course team | 2025-04-27; needs idempotency keys     |
| GET /api/sessions/{id}    | Schedule    | 2025-05-05                             |

## Retired endpoints (no longer in use)
| Endpoint                  | Retired    | Replacement                            |
|---------------------------|------------|----------------------------------------|
| GET /api/home             | 2025-03-01 | GET /api/dashboard                     |
</code></pre>
<p>This document is the migration's source of truth. It prevents the common failure mode where the migration stalls because no one can remember which endpoints have been migrated and which are still proxied.</p>
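<p>The document can also be checked mechanically. The sketch below is one hypothetical way to do it: it reads the "Proxied endpoints" table in the format shown above and returns every endpoint whose target date has passed. A unit test could assert that the result is empty, so a slipping migration shows up as a failing build rather than a stale document.</p>
<pre><code class="language-csharp">// Hypothetical check that reads docs/bff-migration-status.md
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class MigrationStatus
{
    // Returns the proxied endpoints whose target migration date is earlier than 'today'
    public static IReadOnlyList&lt;string&gt; OverdueProxiedEndpoints(string markdown, DateOnly today)
    {
        var overdue = new List&lt;string&gt;();
        var inProxiedSection = false;

        foreach (var rawLine in markdown.Split('\n'))
        {
            var line = rawLine.Trim();
            if (line.StartsWith("## ", StringComparison.Ordinal))
            {
                inProxiedSection = line.StartsWith("## Proxied endpoints", StringComparison.Ordinal);
                continue;
            }
            if (!inProxiedSection || !line.StartsWith('|'))
                continue;

            // Cells: Endpoint | Owner | Target migration date (optionally followed by a note)
            var cells = line.Trim('|').Split('|').Select(c =&gt; c.Trim()).ToArray();
            if (cells.Length &lt; 3)
                continue;

            // Header and separator rows do not start with a yyyy-MM-dd date, so they are skipped
            var datePart = cells[2].Length &gt;= 10 ? cells[2][..10] : cells[2];
            if (DateOnly.TryParseExact(datePart, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out var target)
                &amp;&amp; target &lt; today)
            {
                overdue.Add(cells[0]);
            }
        }

        return overdue;
    }
}
</code></pre>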
<h3>Dual-mode testing</h3>
<p>During the coexistence period, integration tests must cover both proxied and BFF-handled paths. A test that only hits the BFF-handled endpoints will miss regressions introduced by YARP configuration changes that affect the proxied endpoints:</p>
<pre><code class="language-csharp">// EducationPlatform.Bff.IntegrationTests/Migration/ProxyBehaviourTests.cs
public class ProxyBehaviourTests(BffWebApplicationFactory factory)
    : IClassFixture&lt;BffWebApplicationFactory&gt;
{
    [Fact]
    public async Task ProxiedEndpoint_ForwardsToExistingApi_WithAuthHeader()
    {
        // Arrange — existing API stub returns a known response
        factory.ExistingApiServer.Given(
            Request.Create()
                .WithPath("/users/profile")
                .WithHeader("Authorization", "Bearer test-token")
                .UsingGet())
            .RespondWith(
                Response.Create()
                    .WithStatusCode(200)
                    .WithBodyAsJson(new { firstName = "Ingrid", lastName = "Solberg" }));

        var client = factory.CreateClient();
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", "test-token");

        // Act — this endpoint is not yet migrated, should proxy
        var response = await client.GetAsync("/users/profile");

        // Assert
        response.StatusCode.Should().Be(HttpStatusCode.OK);
        factory.ExistingApiServer.LogEntries.Should().Contain(entry =&gt;
            entry.RequestMessage.Path == "/users/profile");
    }

    [Fact]
    public async Task MigratedEndpoint_HandledByBff_IsNotProxied()
    {
        // Arrange: stub the BFF's typed upstream clients so the aggregator runs in-process
        factory.UserClient
            .GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto("Ingrid", "Solberg", "uninett", "TEACHER", null));
        factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(0);
        factory.CourseClient
            .GetCoursesByOrgAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns([]);

        var client = factory.CreateAuthenticatedClient();

        // Act: /api/dashboard is migrated, so it must not be forwarded verbatim to the existing API
        var response = await client.GetAsync("/api/dashboard");

        // Assert
        response.StatusCode.Should().Be(HttpStatusCode.OK);
        factory.ExistingApiServer.LogEntries.Should()
            .NotContain(entry =&gt; entry.RequestMessage.Path == "/api/dashboard");
    }
}
</code></pre>
<p>These tests use WireMock.NET (<code>dotnet add package WireMock.Net</code>) as a stub for the existing API during integration testing. WireMock logs every incoming request, which allows the test to assert that the existing API was or was not called for a given endpoint.</p>
<h3>Feature flags for gradual traffic migration</h3>
<p>For high-risk endpoints — those with complex business logic or high traffic volume — a feature flag allows routing a percentage of traffic to the BFF before full cutover:</p>
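<pre><code class="language-csharp">// Middleware/MigrationRoutingMiddleware.cs
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.AspNetCore.Routing;
using Serilog.Context;

public sealed class MigrationRoutingMiddleware(
    RequestDelegate next,
    IConfiguration config)
{
    public async Task InvokeAsync(HttpContext ctx)
    {
        // WebApplication runs routing before custom middleware, so the matched endpoint
        // is known here. Keying on the endpoint name (e.g. .WithName("api:courses:detail"))
        // gives one config entry per route instead of one per concrete path like /api/courses/c-1
        var routeKey = ctx.GetEndpoint()?
            .Metadata.GetMetadata&lt;IEndpointNameMetadata&gt;()?.EndpointName;

        string? source = null;
        if (routeKey is not null
            &amp;&amp; int.TryParse(config[$"Migration:Routes:{routeKey}:BffPercentage"], out var bffPercent)
            &amp;&amp; bffPercent &lt; 100)
        {
            var toBff = Random.Shared.Next(100) &lt; bffPercent;
            source = toBff ? "bff" : "existing-api";

            // Marking the request does not change the selected endpoint by itself;
            // the migrated BFF endpoint must check this flag and forward the request
            if (!toBff)
                ctx.Items["ForceProxy"] = true;
        }

        if (source is null)
        {
            await next(ctx);
            return;
        }

        // Label the request so Application Insights and the logs can be split by source
        if (ctx.Features.Get&lt;RequestTelemetry&gt;() is { } telemetry)
            telemetry.Properties["Source"] = source;

        using (LogContext.PushProperty("Source", source))
        {
            await next(ctx);
        }
    }
}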
</code></pre>
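<p>Setting <code>ctx.Items["ForceProxy"]</code> does not, by itself, send the request anywhere: by the time middleware runs, routing has already selected the BFF endpoint, so something on that endpoint has to read the flag and forward the request. A sketch of that missing half follows, assuming YARP's <code>IHttpForwarder</code> (which <code>AddReverseProxy()</code> should register; <code>AddHttpForwarder()</code> registers it on its own) and the <code>ExistingApi:BaseUrl</code> setting already used by <code>ExistingApiClient</code>. <code>ForceProxyFilter</code> is a hypothetical name.</p>
<pre><code class="language-csharp">// Filters/ForceProxyFilter.cs (hypothetical sketch)
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;
using Yarp.ReverseProxy.Forwarder;

public sealed class ForceProxyFilter(
    IHttpForwarder forwarder,
    IConfiguration config,
    ILogger&lt;ForceProxyFilter&gt; logger) : IEndpointFilter
{
    // One shared invoker for every forwarded request; these handler settings are a
    // subset of those YARP's direct-forwarding documentation suggests
    private static readonly HttpMessageInvoker Invoker = new(new SocketsHttpHandler
    {
        UseProxy = false,
        AllowAutoRedirect = false,
        AutomaticDecompression = DecompressionMethods.None,
        UseCookies = false
    });

    public async ValueTask&lt;object?&gt; InvokeAsync(
        EndpointFilterInvocationContext context, EndpointFilterDelegate next)
    {
        var http = context.HttpContext;
        if (http.Items.TryGetValue("ForceProxy", out var flag) &amp;&amp; flag is true)
        {
            // Send the original request (method, path, query, headers) to the existing API
            var error = await forwarder.SendAsync(
                http,
                config["ExistingApi:BaseUrl"]!,
                Invoker,
                ForwarderRequestConfig.Empty,
                HttpTransformer.Default);

            if (error != ForwarderError.None)
                logger.LogWarning("Forced proxy to the existing API failed: {Error}", error);

            // The forwarder has already written the response, so nothing more to send
            return Results.Empty;
        }

        return await next(context);
    }
}
</code></pre>
<p>The filter would be attached to each migrated route that has a percentage configured, for example with <code>.AddEndpointFilter&lt;ForceProxyFilter&gt;()</code> on the route inside <code>MapCourseEndpoints()</code>. Endpoint filters run after parameter binding, so this works cleanly for GET routes; a route that binds a request body would need the body buffered before it could be forwarded.</p>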
<p>Configure the migration percentage in <code>appsettings.json</code>:</p>
<pre><code class="language-json">{
  "Migration": {
    "Routes": {
      "api:courses:detail": {
        "BffPercentage": 10
      }
    }
  }
}
</code></pre>
<p>Start at 10% of traffic to the BFF for a newly migrated endpoint. Monitor error rates and latency in Application Insights, split by the <code>Source: bff</code> vs <code>Source: existing-api</code> label set in the middleware. Increase the percentage incrementally — 10%, 25%, 50%, 100% — over days or weeks depending on confidence. If errors appear at any stage, rollback is a configuration change, not a code deployment.</p>
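<p><code>Random.Shared.Next(100)</code> makes a fresh decision on every request, so one user can hop between the BFF and the existing API from one page load to the next, which makes errors harder to tie to a session. If a stable user ID is available when the routing decision is made, a sticky split is an alternative. The sketch below is hypothetical: it hashes the route key and user ID into a bucket from 0 to 99 with SHA-256, because <code>string.GetHashCode()</code> is randomised per process in .NET and would give different answers on different BFF instances.</p>
<pre><code class="language-csharp">// Hypothetical sticky traffic split
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

public static class TrafficSplit
{
    // Stable bucket in [0, 100) for a user/route pair, identical on every instance
    public static int Bucket(string userId, string routeKey)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes($"{routeKey}:{userId}"));
        return (int)(BinaryPrimitives.ReadUInt32LittleEndian(hash) % 100);
    }

    // Same decision on every request for a given user while the percentage is unchanged
    public static bool RouteToBff(string userId, string routeKey, int bffPercent) =&gt;
        Bucket(userId, routeKey) &lt; bffPercent;
}
</code></pre>
<p>Because buckets are fixed, raising the percentage from 10 to 25 keeps everyone already on the BFF there and only adds new users, which keeps before-and-after comparisons cleaner.</p>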
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/3ce3f917-3a9e-45ef-9c78-6a628be1b01b.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Stage 3: Authentication boundary migration</h2>
<p>Authentication is typically the most complex aspect of brownfield BFF migration. The existing system usually uses a pattern established years ago — often bearer tokens stored in <code>localStorage</code>, JWT validation in the frontend, and a stateless API. The BFF introduces a fundamentally different model: server-side sessions, <code>HttpOnly</code> cookies, and Feide OIDC.</p>
<p>These two authentication models cannot coexist in the same session. A user authenticated with the old model cannot seamlessly transition to the new model without a re-authentication event. The migration strategy must account for this.</p>
<h3>The dual-auth window</h3>
<p>During the migration, the BFF accepts both authentication models simultaneously:</p>
<pre><code class="language-csharp">// Authentication configuration during migration
var config = builder.Configuration;
builder.Services
    .AddAuthentication(options =&gt;
    {
        // Default to cookie (new model) — falls back to JWT (old model)
        options.DefaultAuthenticateScheme = "cookie-or-jwt";
        options.DefaultChallengeScheme    = OpenIdConnectDefaults.AuthenticationScheme;
    })
    .AddCookie(CookieAuthenticationDefaults.AuthenticationScheme, opts =&gt;
    {
        opts.Cookie.Name     = "__bff_session";
        opts.Cookie.HttpOnly = true;
        opts.Cookie.Secure   = true;
        opts.Cookie.SameSite = SameSiteMode.Strict;
        opts.Cookie.MaxAge   = TimeSpan.FromHours(8);
    })
    .AddJwtBearer("legacy-jwt", opts =&gt;
    {
        // Existing API's JWT configuration — same issuer, same audience
        opts.Authority = config["LegacyAuth:Authority"];
        opts.Audience  = config["LegacyAuth:Audience"];
        opts.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer   = true,
            ValidateAudience = true,
            ValidateLifetime = true
        };
    })
    .AddOpenIdConnect(OpenIdConnectDefaults.AuthenticationScheme, opts =&gt;
    {
        // Feide configuration from Article 6
        opts.Authority    = config["Feide:Authority"];
        opts.ClientId     = config["Feide:ClientId"];
        opts.ClientSecret = config["Feide:ClientSecret"];
        // ...
    })
    .AddPolicyScheme("cookie-or-jwt", "Cookie or JWT", opts =&gt;
    {
        opts.ForwardDefaultSelector = ctx =&gt;
        {
            // If a session cookie is present — use cookie auth (new model)
            if (ctx.Request.Cookies.ContainsKey("__bff_session"))
                return CookieAuthenticationDefaults.AuthenticationScheme;

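            // If a bearer token is present, use JWT (legacy model)
            string? authorization = ctx.Request.Headers.Authorization;
            if (authorization?.StartsWith("Bearer ", StringComparison.OrdinalIgnoreCase) == true)
                return "legacy-jwt";

            // Neither: authenticate with the cookie handler, which reports "no user".
            // Challenges still go to Feide through DefaultChallengeScheme.
            return CookieAuthenticationDefaults.AuthenticationScheme;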
        };
    });
</code></pre>
<p>With this configuration, a user with an existing JWT session continues to function. A user who re-authenticates through the BFF gets a session cookie. Both are valid during the migration window.</p>
<h3>Forcing re-authentication</h3>
<p>The migration window should have a defined end date. After that date, the <code>legacy-jwt</code> scheme is removed and only the cookie-based session is accepted. Users still holding a JWT are redirected to the Feide login page on their next request. This is a deliberate, communicated breaking change — not a silent failure.</p>
<p>Communicate it to users as a security improvement ("we've updated how authentication works to improve security") and choose a date that lines up with the natural expiry of existing JWTs. Measure from the moment the old login flow stops issuing JWTs: if JWTs expire after 24 hours, removing the legacy scheme 25 hours after the last one was issued ensures no still-valid JWT is rejected, and the extra hour comfortably covers the JwtBearer handler's default five-minute clock skew. Users whose tokens expire naturally simply authenticate through the new model on their next login.</p>
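<p>The same reasoning as a hypothetical helper: the earliest removal time is the last moment a legacy JWT could have been issued, plus the token lifetime, the JwtBearer clock skew (<code>TokenValidationParameters.ClockSkew</code>, five minutes by default), and whatever safety margin the team chooses.</p>
<pre><code class="language-csharp">// Hypothetical helper for choosing the dual-auth cutoff
using System;

public static class LegacyJwtCutoff
{
    // Earliest moment the legacy-jwt scheme can be removed without rejecting a token
    // that the JwtBearer handler would still accept
    public static DateTimeOffset EarliestRemoval(
        DateTimeOffset lastLegacyTokenIssuedAt,
        TimeSpan tokenLifetime,
        TimeSpan clockSkew,
        TimeSpan safetyMargin) =&gt;
        lastLegacyTokenIssuedAt + tokenLifetime + clockSkew + safetyMargin;
}
</code></pre>
<p>For a last legacy token issued at 12:00 UTC on 1 April 2025, a 24-hour lifetime, the default 5-minute skew and a 55-minute margin, the result is 13:00 UTC on 2 April: 25 hours after the last issue.</p>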
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/d48f0765-37a9-413a-bc5f-14a6c894c672.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Stage 4: Retiring the existing API connection</h2>
<p>The migration is complete when every endpoint in the migration status document has been moved from "proxied" to "migrated." At this point, the YARP catch-all route is dead code — no requests reach it.</p>
<p>Verify this with a query in Application Insights before removing it:</p>
<pre><code class="language-plaintext">// Check for any requests hitting the proxy catch-all in the last 7 days
requests
| where timestamp &gt; ago(7d)
| where cloud_RoleName == "bff"
| where name contains "catch-all"    // assumes request names include the matched YARP route; its id and template both contain "catch-all" here
| summarize count() by name, bin(timestamp, 1d)
</code></pre>
<p>If this query returns results, there are endpoints that were missed in the migration. Find them, migrate them, and run the query again.</p>
<p>Once the query returns no results for 7 days, the YARP catch-all can be removed:</p>
<pre><code class="language-csharp">// Program.cs — Stage 4: proxy removed, BFF-only
// app.MapReverseProxy(); ← removed

app.MapDashboardEndpoints();
app.MapCourseEndpoints();
app.MapAuthEndpoints();
app.MapSessionEndpoints();
app.MapEnrollmentEndpoints();

app.Run();
</code></pre>
<p>And the <code>Yarp.ReverseProxy</code> package can be removed from the project:</p>
<pre><code class="language-shell">dotnet remove package Yarp.ReverseProxy
</code></pre>
<p>The removal of the proxy dependency is the signal that the Strangler Fig has completed its work. The old structure has been replaced; the BFF stands on its own.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/d1eb3c76-97b2-4d19-8163-2eb4ae5bed8a.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>The migration timeline in practice</h2>
<p>The education platform BFF migration was completed over nine weeks. The timeline was driven by three factors: the number of distinct endpoint groups that needed migrating, the complexity of the authentication transition, and the team's capacity alongside their ongoing feature work.</p>
<pre><code class="language-plaintext">Week 1–2:  Stage 1 — proxy deployed, validated, traffic baseline established
Week 3–4:  Stage 2 — dashboard and course list endpoints migrated
Week 5:    Stage 3 — authentication boundary migrated, dual-auth window opened
Week 6–7:  Stage 2 — remaining read endpoints migrated
Week 8:    Stage 2 — write endpoints migrated with idempotency keys
Week 9:    Stage 3 — legacy JWT support removed, dual-auth window closed
           Stage 4 — proxy removed, YARP dependency removed
</code></pre>
<p>The migration was done entirely without a feature freeze. Feature development continued on the existing API endpoints while the BFF migration progressed. The coexistence routing made this possible — new features added to the existing API continued to work through the proxy until those endpoints were migrated.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/f7831db1-fc36-4764-b44f-1a3bd2a121de.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>What goes wrong in brownfield migrations</h2>
<p>Three failure modes appear consistently in BFF migrations that stall or regress.</p>
<p><strong>The migration loses momentum after the easy endpoints.</strong> The first three or four endpoints migrated are typically the most straightforward — read endpoints with simple response shapes that the Vue application already knows how to consume. Then the migration reaches the endpoints with complex business logic, non-idempotent writes, or dependencies on authentication state that differs between the old and new model. These endpoints take longer. The team's capacity is consumed by feature work. The migration status document stops being updated. Six months later, 60% of endpoints are migrated and no one has the context to finish.</p>
<p>The fix is a defined migration end date, agreed at the start of the project, with engineering management visibility. "We will complete the migration by [date]" is a commitment that changes how the team prioritises the remaining work. An open-ended migration is a migration that will not be completed.</p>
<p><strong>The proxy surfaces bugs that the existing API already had.</strong> When the BFF is introduced as a proxy, errors that were already present in the existing API become visible through the BFF's logs and Application Insights. The instinct is to fix them in the BFF. Resist it: fixing existing API bugs in the BFF couples the proxy to the API's broken behaviour, and the fixes must be redone when the endpoint is eventually migrated to native BFF handling. Log the bugs, report them to the teams that own the existing API, and let them be fixed at the source.</p>
<p><strong>The Vue application is changed ahead of the BFF.</strong> If the Vue application is updated to consume the new BFF response shapes while some of the same data still arrives through the legacy proxy, the frontend code must handle two different shapes for the same data simultaneously: one from the BFF, one from the legacy proxy. This is the most common source of migration-phase bugs. The correct sequencing: migrate the BFF endpoint first, validate it in staging, then update the Vue application's composables to use the new shape. Never the reverse.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/807da55c-8073-4a08-987f-32af8fbae0ed.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>When the Strangler Fig is not the right approach</h2>
<p>The Strangler Fig is the right migration strategy for most brownfield BFF adoptions, but not all.</p>
<p>If the existing API and the new BFF will share the same domain name and the same path namespace, the routing differentiation that YARP provides requires careful configuration to avoid conflicts. In this situation, it may be simpler to accept a brief maintenance window and do a cutover rather than managing the coexistence routing.</p>
<p>If the Vue application is being rewritten simultaneously — a common scenario in brownfield projects where technical debt has accumulated enough to justify both a frontend rewrite and a BFF introduction — the Strangler Fig adds complexity without benefit. A clean-slate Vue 3 application and a new BFF can be developed together against a shared API contract, with the existing frontend kept running until the new system is validated. This is a parallel-run strategy rather than a Strangler Fig, and it is the correct choice when the frontend is being replaced rather than migrated.</p>
<p>The Strangler Fig is for preserving the existing frontend while migrating the backend layer beneath it. When both are changing at once, a different approach is warranted.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/47516830-6a09-4b15-9c09-6c15fca48597.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="#">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="#">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="#">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="#">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="#">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="#">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="#">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="#">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="#">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="#">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li>→ <a href="#">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="#">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[BFF Resilience Patterns: Circuit Breakers, Retries & Timeouts with Polly]]></title><description><![CDATA[A BFF that aggregates four upstream services inherits four independent failure modes. Any one of them can be unavailable, slow, or intermittently returning errors at any time. The question is not whet]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly</guid><category><![CDATA[bff]]></category><category><![CDATA[polly]]></category><category><![CDATA[dotnet]]></category><category><![CDATA[Resilience]]></category><category><![CDATA[fault tolerance]]></category><category><![CDATA[circuit breaker]]></category><category><![CDATA[retries]]></category><category><![CDATA[timeout]]></category><category><![CDATA[Bulkhead]]></category><category><![CDATA[httpclient]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sun, 19 Apr 2026 02:54:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/545ce4aa-9811-4bb6-b380-87cdf8a2a506.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>A BFF that aggregates four upstream services inherits four independent failure modes. Any one of them can be unavailable, slow, or intermittently returning errors at any time. The question is not whether an upstream service will fail — it will — but whether that failure propagates to the user as a broken screen or is absorbed by the BFF and handled gracefully.</p>
<p>Polly is the .NET resilience library that provides the building blocks to absorb those failures: retries for transient errors, timeouts for slow upstreams, circuit breakers for services that are systematically down, and bulkheads for isolating one upstream's failure from another's. Used correctly, these patterns make the BFF fault-tolerant. Used incorrectly — retrying too aggressively, timing out too generously, failing to isolate failure domains — they amplify the problems they were meant to solve.</p>
<p>This article covers the correct application of each pattern to a BFF, the specific failure modes each one addresses, and how they compose into a production-grade resilience strategy. Code examples use the education platform BFF built throughout this series and <code>Microsoft.Extensions.Http.Resilience</code>, the .NET 8 integration layer that wires Polly into the <code>HttpClient</code> pipeline.</p>
<hr />
<h2>The resilience problem, stated precisely</h2>
<p>In Article 4, every typed HTTP client was configured with <code>AddStandardResilienceHandler()</code>:</p>
<pre><code class="language-csharp">builder.Services.AddHttpClient&lt;CourseServiceClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["Services:CourseService:BaseUrl"]!))
    .AddStandardResilienceHandler();
</code></pre>
<p>The standard handler is a reasonable starting point — it wires retry, circuit breaker, and timeout with sensible defaults. But it is a generic solution, and a BFF has specific requirements that the defaults do not address:</p>
<ul>
<li><p>Different upstream services have different acceptable latency budgets. A user profile lookup should time out faster than a course session export.</p>
</li>
<li><p>Retrying a user profile call three times is reasonable. Retrying an enrollment mutation three times could create three enrollments.</p>
</li>
<li><p>A circuit breaker that opens for 30 seconds on the notification service should not affect the circuit breaker state of the course service.</p>
</li>
<li><p>The aggregator's partial failure handling (from Article 4) depends on the resilience layer returning a specific failure signal — not throwing an exception that crashes the entire aggregation.</p>
</li>
</ul>
<p>These requirements mean the standard handler needs to be replaced with per-client custom configuration for any BFF that runs in production under real conditions.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/cd780fa8-1b98-429d-97db-b654f3ed12c6.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Installing the right packages</h2>
<pre><code class="language-shell">dotnet add package Microsoft.Extensions.Http.Resilience
dotnet add package Polly
dotnet add package Polly.Extensions
</code></pre>
<p><code>Microsoft.Extensions.Http.Resilience</code> is the preferred integration layer in .NET 8. It uses Polly 8 under the hood and integrates with the host's <code>IHttpClientFactory</code>, <code>ILogger</code>, and metrics infrastructure. The raw <code>Polly</code> package is used for building custom strategies; <code>Polly.Extensions</code> adds telemetry and dependency-injection integration for <code>ResiliencePipelineBuilder</code>.</p>
<hr />
<h2>Understanding the strategy execution order</h2>
<p>Before configuring individual strategies, the order in which they wrap each request matters significantly. The standard pipeline executes strategies from outermost to innermost:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/52c70df9-9608-4f08-bff8-189abc6479ec.png" alt="" style="display:block;margin:0 auto" />

<p>The total timeout is the hard wall — no matter how many retries are attempted, the entire operation cannot exceed this duration. The retry wraps the circuit breaker, which means the circuit breaker sees individual attempt outcomes. The attempt timeout is per-attempt — if a single upstream call takes longer than the attempt timeout, it is cancelled and the retry policy fires.</p>
<p>This ordering is not arbitrary. If the circuit breaker wrapped the retry instead, it would see each fully retried operation as a single outcome: three failed attempts would count as one failure, and the breaker would react more slowly than intended. Understanding this ordering is a prerequisite for understanding why the custom configuration below is shaped the way it is.</p>
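<p>The effect of the ordering can be seen with two toy strategies wrapped around a call that always fails. The <code>CountingBreaker</code> and <code>Retry</code> helpers below are hypothetical stand-ins written for illustration, not Polly types; the breaker simply counts the failures that pass through it, as a real breaker's health metrics would.</p>
<pre><code class="language-csharp">using System;

// Hypothetical toy strategies (not Polly types) showing why the retry
// must wrap the circuit breaker rather than the other way round.
public static class OrderingDemo
{
    // Counts every failure that passes through it, like a breaker's health metrics
    public sealed class CountingBreaker
    {
        public int ObservedFailures { get; private set; }

        public void Execute(Action action)
        {
            try { action(); }
            catch { ObservedFailures++; throw; }
        }
    }

    // Runs the action, retrying up to maxRetries more times on failure
    public static void Retry(int maxRetries, Action action)
    {
        for (var attempt = 0; ; attempt++)
        {
            try { action(); return; }
            catch when (attempt &lt; maxRetries) { /* swallow and try again */ }
        }
    }

    // Retry outermost (the order used in this article): the breaker sees every attempt
    public static int RetryWrapsBreaker(Action failingCall)
    {
        var breaker = new CountingBreaker();
        try { Retry(2, () =&gt; breaker.Execute(failingCall)); } catch { /* all attempts failed */ }
        return breaker.ObservedFailures;
    }

    // Breaker outermost: three failed attempts collapse into one outcome
    public static int BreakerWrapsRetry(Action failingCall)
    {
        var breaker = new CountingBreaker();
        try { breaker.Execute(() =&gt; Retry(2, failingCall)); } catch { /* all attempts failed */ }
        return breaker.ObservedFailures;
    }
}
</code></pre>
<p>With the retry on the outside, one logical request that fails three times feeds three failures into the breaker's sampling window; with the breaker on the outside, it feeds one.</p>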
<hr />
<h2>Per-client resilience configuration</h2>
<p>The correct approach for a BFF is to define a resilience pipeline per upstream client, tuned to that service's characteristics. A helper method keeps the configuration readable:</p>
<pre><code class="language-csharp">// Infrastructure/Resilience/ResiliencePipelineFactory.cs
public static class ResiliencePipelineFactory
{
    /// &lt;summary&gt;
    /// Standard read pipeline — safe to retry, moderate timeout.
    /// Use for GET requests to stable internal services.
    /// &lt;/summary&gt;
    public static Action&lt;ResiliencePipelineBuilder&lt;HttpResponseMessage&gt;&gt;
        ReadPipeline(
            string serviceName,
            TimeSpan attemptTimeout,
            TimeSpan totalTimeout) =&gt; pipeline =&gt;
    {
        pipeline
            // 1. Total timeout — hard limit on the whole operation including retries
            .AddTimeout(new TimeoutStrategyOptions
            {
                Timeout = totalTimeout,
                OnTimeout = args =&gt;
                {
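                    // GetLogger() is assumed to be a project-local extension method;
                    // Polly's ResilienceContext has no built-in logger accessor.
                    Log.TotalTimeout(args.Context.GetLogger(), serviceName, totalTimeout);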
                    return ValueTask.CompletedTask;
                }
            })

            // 2. Retry — exponential backoff with jitter, read-safe
            .AddRetry(new RetryStrategyOptions&lt;HttpResponseMessage&gt;
            {
                MaxRetryAttempts = 2,
                Delay             = TimeSpan.FromMilliseconds(200),
                BackoffType       = DelayBackoffType.Exponential,
                UseJitter         = true,
                ShouldHandle      = args =&gt; ValueTask.FromResult(
                    ShouldRetry(args.Outcome)),
                OnRetry = args =&gt;
                {
                    Log.Retrying(args.Context.GetLogger(), serviceName,
                        args.AttemptNumber + 1, args.RetryDelay);
                    return ValueTask.CompletedTask;
                }
            })

            // 3. Circuit breaker — opens after sustained failures
            .AddCircuitBreaker(new CircuitBreakerStrategyOptions&lt;HttpResponseMessage&gt;
            {
                FailureRatio            = 0.5,   // Open when 50% of requests fail
                SamplingDuration        = TimeSpan.FromSeconds(30),
                MinimumThroughput       = 5,     // Minimum requests before ratio applies
                BreakDuration           = TimeSpan.FromSeconds(20),
                ShouldHandle            = args =&gt; ValueTask.FromResult(
                    ShouldHandle(args.Outcome)),
                OnOpened = args =&gt;
                {
                    Log.CircuitOpened(args.Context.GetLogger(), serviceName,
                        args.BreakDuration);
                    return ValueTask.CompletedTask;
                },
                OnClosed = args =&gt;
                {
                    Log.CircuitClosed(args.Context.GetLogger(), serviceName);
                    return ValueTask.CompletedTask;
                },
                OnHalfOpened = args =&gt;
                {
                    Log.CircuitHalfOpened(args.Context.GetLogger(), serviceName);
                    return ValueTask.CompletedTask;
                }
            })

            // 4. Per-attempt timeout — cancels a single slow call before retry fires
            .AddTimeout(new TimeoutStrategyOptions
            {
                Timeout = attemptTimeout
            });
    };

    /// &lt;summary&gt;
    /// Write pipeline — NOT safe to retry on most failures.
    /// Use for POST/PUT/DELETE requests where idempotency cannot be guaranteed.
    /// &lt;/summary&gt;
    public static Action&lt;ResiliencePipelineBuilder&lt;HttpResponseMessage&gt;&gt;
        WritePipeline(string serviceName, TimeSpan attemptTimeout) =&gt; pipeline =&gt;
    {
        pipeline
            // Total timeout only — no retry for non-idempotent operations
            .AddTimeout(new TimeoutStrategyOptions { Timeout = attemptTimeout })

            // Circuit breaker — still needed to fail-fast when service is down
            .AddCircuitBreaker(new CircuitBreakerStrategyOptions&lt;HttpResponseMessage&gt;
            {
                FailureRatio      = 0.5,
                SamplingDuration  = TimeSpan.FromSeconds(30),
                MinimumThroughput = 3,
                BreakDuration     = TimeSpan.FromSeconds(20),
                ShouldHandle      = args =&gt; ValueTask.FromResult(
                    ShouldHandle(args.Outcome))
            });
    };

    // Which outcomes warrant a retry
    private static bool ShouldRetry(Outcome&lt;HttpResponseMessage&gt; outcome)
    {
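        // TimeoutRejectedException is what the per-attempt timeout throws;
        // without it, a timed-out attempt would never be retried.
        if (outcome.Exception is HttpRequestException
            or TaskCanceledException
            or TimeoutRejectedException)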
            return true;

        if (outcome.Result is { } response)
            return response.StatusCode is
                HttpStatusCode.RequestTimeout or      // 408
                HttpStatusCode.TooManyRequests or     // 429
                HttpStatusCode.InternalServerError or // 500
                HttpStatusCode.BadGateway or          // 502
                HttpStatusCode.ServiceUnavailable or  // 503
                HttpStatusCode.GatewayTimeout;        // 504

        return false;
    }

    // Which outcomes the circuit breaker counts as failures
    private static bool ShouldHandle(Outcome&lt;HttpResponseMessage&gt; outcome)
    {
        if (outcome.Exception is not null) return true;
        if (outcome.Result is { } response)
            return (int)response.StatusCode &gt;= 500;
        return false;
    }

    // Structured log messages — static for performance
    private static class Log
    {
        public static void TotalTimeout(ILogger? logger, string service, TimeSpan timeout) =&gt;
            logger?.LogWarning(
                "Total timeout exceeded for {Service}. Timeout: {Timeout}ms",
                service, timeout.TotalMilliseconds);

        public static void Retrying(ILogger? logger, string service,
            int attempt, TimeSpan delay) =&gt;
            logger?.LogWarning(
                "Retrying {Service}. Attempt: {Attempt}, Delay: {DelayMs}ms",
                service, attempt, delay.TotalMilliseconds);

        public static void CircuitOpened(ILogger? logger, string service,
            TimeSpan breakDuration) =&gt;
            logger?.LogError(
                "Circuit breaker OPENED for {Service}. " +
                "Break duration: {BreakDuration}s. Upstream calls suspended.",
                service, breakDuration.TotalSeconds);

        public static void CircuitClosed(ILogger? logger, string service) =&gt;
            logger?.LogInformation(
                "Circuit breaker CLOSED for {Service}. Upstream calls resumed.", service);

        public static void CircuitHalfOpened(ILogger? logger, string service) =&gt;
            logger?.LogInformation(
                "Circuit breaker HALF-OPEN for {Service}. Probing upstream.", service);
    }
}
</code></pre>
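<p>The <code>FailureRatio</code> and <code>MinimumThroughput</code> settings work together: a burst of failures on a quiet service should not open the circuit on its own. The following is a simplified model of the open decision, assuming the breaker compares the failure ratio within the current sampling window against an inclusive threshold once the minimum throughput is reached. The <code>BreakerRule</code> helper is illustrative, not part of Polly, and it takes the window's counts as input instead of tracking time.</p>
<pre><code class="language-csharp">// Simplified, hypothetical model of a ratio-based circuit breaker's open decision.
// Polly evaluates this over a rolling SamplingDuration window; here the window's
// counts are passed in directly.
public static class BreakerRule
{
    public static bool ShouldOpen(
        int failures, int total, double failureRatio, int minimumThroughput)
    {
        if (total &lt; minimumThroughput)
            return false;   // not enough traffic in the window to judge

        var observedRatio = (double)failures / total;
        return observedRatio &gt;= failureRatio;   // assumed inclusive threshold
    }
}
</code></pre>
<p>With the read pipeline's settings, four failures out of four calls in a 30-second window leave the circuit closed, because throughput is below five; three failures out of five open it.</p>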
<hr />
<h2>Wiring per-client pipelines in Program.cs</h2>
<p>Each upstream client receives a pipeline tuned to its characteristics. The latency budget for each service was derived from the p95 response times observed in Application Insights during the first month of production operation:</p>
<pre><code class="language-csharp">// Program.cs
// User Service — small, fast lookups; tight timeout; retryable
builder.Services
    .AddHttpClient&lt;UserServiceClient&gt;(client =&gt;
        client.BaseAddress = new Uri(config["Services:UserService:BaseUrl"]!))
    .AddResilienceHandler("user-service",
        ResiliencePipelineFactory.ReadPipeline(
            serviceName:    "UserService",
            attemptTimeout: TimeSpan.FromMilliseconds(400),
            totalTimeout:   TimeSpan.FromMilliseconds(1200)))
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;();

// Course Service — dataset can be larger; slightly longer timeout
builder.Services
    .AddHttpClient&lt;CourseServiceClient&gt;(client =&gt;
        client.BaseAddress = new Uri(config["Services:CourseService:BaseUrl"]!))
    .AddResilienceHandler("course-service",
        ResiliencePipelineFactory.ReadPipeline(
            serviceName:    "CourseService",
            attemptTimeout: TimeSpan.FromMilliseconds(600),
            totalTimeout:   TimeSpan.FromMilliseconds(2000)))
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;();

// Session Service — potentially heavier queries; more generous total timeout
builder.Services
    .AddHttpClient&lt;SessionServiceClient&gt;(client =&gt;
        client.BaseAddress = new Uri(config["Services:SessionService:BaseUrl"]!))
    .AddResilienceHandler("session-service",
        ResiliencePipelineFactory.ReadPipeline(
            serviceName:    "SessionService",
            attemptTimeout: TimeSpan.FromMilliseconds(800),
            totalTimeout:   TimeSpan.FromMilliseconds(2500)))
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;();

// Notification Service — low criticality; tight budget
builder.Services
    .AddHttpClient&lt;NotificationServiceClient&gt;(client =&gt;
        client.BaseAddress = new Uri(config["Services:NotificationService:BaseUrl"]!))
    .AddResilienceHandler("notification-service",
        ResiliencePipelineFactory.ReadPipeline(
            serviceName:    "NotificationService",
            attemptTimeout: TimeSpan.FromMilliseconds(300),
            totalTimeout:   TimeSpan.FromMilliseconds(900)))
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;();

// Enrollment — write operation; no retry; circuit breaker only
builder.Services
    .AddHttpClient&lt;EnrollmentServiceClient&gt;(client =&gt;
        client.BaseAddress = new Uri(config["Services:CourseService:BaseUrl"]!))
    .AddResilienceHandler("enrollment-write",
        ResiliencePipelineFactory.WritePipeline(
            serviceName:    "EnrollmentService",
            attemptTimeout: TimeSpan.FromSeconds(5)))
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;();
</code></pre>
<p>The notification service has the tightest budget — 300ms per attempt, 900ms total — because it is the least critical upstream in the aggregation. If notifications are slow, the partial failure path (from Article 4) handles the absence gracefully. Spending 2.5 seconds waiting for a notification count is a worse user experience than returning a count of zero with a <code>partialFailures: ["notifications"]</code> marker.</p>
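<p>Because the total timeout wraps the retries, it is worth checking each client's numbers against the worst case. The sketch below (the <code>LatencyBudget</code> helper is hypothetical) computes the nominal duration of the read pipeline if every attempt hit its per-attempt timeout. It ignores jitter and assumes the exponential backoff doubles the 200ms base delay on each retry.</p>
<pre><code class="language-csharp">using System;

// Hypothetical helper for sanity-checking a read pipeline's timeouts.
// Nominal figures only: jitter is ignored, and the exponential backoff is
// assumed to double the base delay on each retry (200ms, 400ms, ...).
public static class LatencyBudget
{
    // Duration if every attempt hits its per-attempt timeout and no total timeout applies
    public static TimeSpan UncappedWorstCase(
        TimeSpan attemptTimeout, TimeSpan baseDelay, int maxRetries)
    {
        var total = attemptTimeout;                     // first attempt times out
        for (var retry = 0; retry &lt; maxRetries; retry++)
        {
            total += baseDelay * Math.Pow(2, retry);    // backoff before this retry
            total += attemptTimeout;                    // the retry times out as well
        }
        return total;
    }

    // The total timeout is the hard wall, so the effective worst case is the smaller value
    public static TimeSpan EffectiveWorstCase(
        TimeSpan attemptTimeout, TimeSpan totalTimeout, TimeSpan baseDelay, int maxRetries)
    {
        var uncapped = UncappedWorstCase(attemptTimeout, baseDelay, maxRetries);
        return uncapped &lt; totalTimeout ? uncapped : totalTimeout;
    }
}
</code></pre>
<p>With the values configured above, the uncapped worst case is 1,800ms for UserService (total budget 1,200ms), 2,400ms for CourseService (2,000ms), 3,000ms for SessionService (2,500ms) and 1,500ms for NotificationService (900ms). For every read client, a consistently slow upstream is therefore stopped by the total timeout rather than by the retry count; for UserService and NotificationService it nominally fires before the second retry even starts.</p>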
<hr />
<h2>Handling resilience exceptions in typed clients</h2>
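<p>The resilience pipeline throws specific exceptions when it gives up: <code>BrokenCircuitException</code> when the circuit is open, and <code>TimeoutRejectedException</code> when a timeout fires. When retries are exhausted on an error response, the pipeline returns that last response instead of throwing, which is why the client still calls <code>EnsureSuccessStatusCode</code>. The typed clients must catch these exceptions and return <code>null</code> so the aggregator's partial failure logic can handle them:</p>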
<pre><code class="language-csharp">// Clients/CourseServiceClient.cs
public sealed class CourseServiceClient(
    HttpClient http,
    IHttpContextAccessor contextAccessor,
    ILogger&lt;CourseServiceClient&gt; logger)
{
    public async Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt; GetCoursesByOrgAsync(
        string orgId, CancellationToken ct = default)
    {
        try
        {
            // Dispose request and response; escape the query value
            using var request = new HttpRequestMessage(
                HttpMethod.Get, $"courses?orgId={Uri.EscapeDataString(orgId)}");
            AttachCorrelationId(request);

            using var response = await http.SendAsync(request, ct);
            response.EnsureSuccessStatusCode();
            return await response.Content
                .ReadFromJsonAsync&lt;IReadOnlyList&lt;CourseDto&gt;&gt;(ct);
        }
        catch (BrokenCircuitException ex)
        {
            // Circuit is open — upstream is known-bad, skip immediately
            logger.LogWarning(
                "Circuit open for CourseService. Skipping upstream call. " +
                "OrgId: {OrgId}. Message: {Message}", orgId, ex.Message);
            return null;
        }
        catch (TimeoutRejectedException ex)
        {
            // A timeout strategy fired (total timeout, or the last attempt's timeout)
            logger.LogWarning(
                "Timeout exhausted for CourseService. OrgId: {OrgId}. " +
                "Timeout: {TimeoutMs}ms", orgId, ex.Timeout.TotalMilliseconds);
            return null;
        }
        catch (HttpRequestException ex)
        {
            logger.LogWarning(ex,
                "HTTP error from CourseService. OrgId: {OrgId}. Status: {Status}",
                orgId, ex.StatusCode);
            return null;
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
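            // Cancelled by something other than the caller's token, for example
            // HttpClient.Timeout; Polly reports its own timeouts as TimeoutRejectedException
            logger.LogWarning(
                "CourseService call cancelled without caller cancellation. OrgId: {OrgId}", orgId);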
            return null;
        }
    }

    private void AttachCorrelationId(HttpRequestMessage request)
    {
        var correlationId = contextAccessor.HttpContext?
            .Response.Headers["X-Correlation-Id"].FirstOrDefault();
        if (correlationId is not null)
            request.Headers.TryAddWithoutValidation("X-Correlation-Id", correlationId);
    }
}
</code></pre>
<p><code>BrokenCircuitException</code> is the most important case. When the circuit is open, Polly throws this exception immediately — no upstream call is made. The client catches it and returns <code>null</code>, which the aggregator records as a partial failure. A screen that would have waited 2.5 seconds for a timing-out upstream now fails in microseconds. This is the circuit breaker's primary value: fail fast rather than fail slow.</p>
<hr />
<h2>The aggregator: partial failure as a first-class outcome</h2>
<p>The aggregator receives <code>null</code> from clients whose upstream calls failed, regardless of which resilience strategy triggered the failure. The distinction between a <code>BrokenCircuitException</code> and a <code>TimeoutRejectedException</code> is logged at the client level; the aggregator only sees the null result and decides what to do with it.</p>
<pre><code class="language-csharp">// Aggregators/DashboardAggregator.cs
public async Task&lt;DashboardResponse&gt; AggregateAsync(
    string userId, CancellationToken ct = default)
{
    var partialFailures = new List&lt;string&gt;();

    // Phase 1: parallel — both can fail independently
    var profileTask      = _userClient.GetProfileAsync(userId, ct);
    var notificationTask = _notificationClient.GetUnreadCountAsync(userId, ct);
    await Task.WhenAll(profileTask, notificationTask);

    var profile = profileTask.Result;

    // Profile is required — its absence is not a partial failure, it is a hard stop
    if (profile is null)
        throw new BffAggregationException("User profile service unavailable.");

    // Notification is optional — absence is gracefully degraded
    var notificationCount = notificationTask.Result ?? 0;
    if (notificationTask.Result is null)
        partialFailures.Add("notifications");

    // Phase 2: courses — absence degrades but does not fail the response
    var courses = await _courseClient.GetCoursesByOrgAsync(profile.OrgId, ct);
    if (courses is null)
        partialFailures.Add("courses");

    // Phase 3: sessions — only attempted if courses succeeded
    IReadOnlyList&lt;SessionDto&gt;? sessions = null;
    if (courses is { Count: &gt; 0 })
    {
        sessions = await _sessionClient.GetUpcomingAsync(
            courses.Select(c =&gt; c.Id).ToArray(), 3, ct);
        if (sessions is null)
            partialFailures.Add("sessions");
    }

    return new DashboardResponse(
        User:             ShapeUserProfile(profile),
        Courses:          courses?.Select(ShapeCourse).ToList() ?? [],
        UpcomingSessions: sessions?.Select(ShapeSession).ToList() ?? [],
        Notifications:    new NotificationSummary(notificationCount),
        PartialFailures:  partialFailures
    );
}
</code></pre>
<p>The aggregator does not know whether <code>courses</code> is <code>null</code> because the circuit breaker opened, because a retry was exhausted, or because the service returned a 500. That distinction belongs in the client's log entry, which carries the correlation ID. The aggregator concerns itself only with the outcome: data was available or it was not.</p>
<hr />
<h2>The retry problem: when not to retry</h2>
<p>The <code>ShouldRetry</code> predicate above deliberately excludes certain status codes. This is the most consequential decision in retry configuration.</p>
<p><strong>Do not retry 4xx errors (except 408 and 429).</strong> A 400 Bad Request means the request itself is malformed — retrying the same request will produce the same 400. A 401 or 403 means the caller is not authorised — retrying will not change that. A 404 means the resource does not exist — retrying will not create it. The only 4xx codes worth retrying are 408 (request timeout, which may have been a transient infrastructure issue) and 429 (too many requests, which should be retried after the <code>Retry-After</code> header's delay).</p>
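<p>The <code>ShouldRetry</code> predicate above treats 429 as retryable, but the <code>RetryStrategyOptions</code> in <code>ReadPipeline</code> derive every delay from the exponential backoff settings, so a <code>Retry-After</code> header is not honoured as written. A minimal sketch of reading the header using only <code>System.Net.Http</code> types (the <code>RetryAfterDelay</code> helper is hypothetical):</p>
<pre><code class="language-csharp">using System;
using System.Net.Http;

// Hypothetical helper: the delay a 429 or 503 response asks for via Retry-After.
// The header carries either a number of seconds (Delta) or an HTTP date (Date).
public static class RetryAfterDelay
{
    public static TimeSpan? FromResponse(HttpResponseMessage response, DateTimeOffset now)
    {
        var header = response.Headers.RetryAfter;
        if (header is null)
            return null;                        // no hint: fall back to backoff

        if (header.Delta is { } delta)
            return delta;

        if (header.Date is { } date)
        {
            var wait = date - now;
            return wait &gt; TimeSpan.Zero ? wait : TimeSpan.Zero;   // a past date means "now"
        }

        return null;
    }
}
</code></pre>
<p>One way to wire this in is Polly's <code>DelayGenerator</code> callback on <code>RetryStrategyOptions&lt;HttpResponseMessage&gt;</code>, returning the header's delay when present and <code>null</code> otherwise, assuming <code>null</code> falls back to the configured backoff. A long <code>Retry-After</code> is still bounded by the total timeout, which wraps the retry.</p>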
<p><strong>Do not retry non-idempotent operations.</strong> The write pipeline above has no retry. A POST to <code>/courses/{id}/enrollments</code> that creates an enrollment and then returns a 500 due to a response serialisation error has still created the enrollment. Retrying creates a duplicate. The service must be designed to be idempotent, or the client must not retry.</p>
<p>In the production system, this distinction caused one production incident before the write pipeline was separated. A retry on a 500 from the enrollment service — which had successfully created the enrollment before encountering a downstream notification error — created duplicate enrollments for four students. The fix was the dedicated write pipeline with no retry and explicit idempotency keys added to the enrollment POST.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/f29711af-9855-4ca6-9044-bedb7384e4bc.png" alt="" style="display:block;margin:0 auto" />

<h3>Idempotency keys for safe write retries</h3>
<p>If retrying writes is genuinely required, idempotency keys are the mechanism. The BFF generates a key for the operation, sends it with the request, and the upstream service uses it to deduplicate:</p>
<pre><code class="language-csharp">// Clients/EnrollmentServiceClient.cs
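public async Task&lt;EnrollmentResultDto?&gt; EnrollAsync(
    string courseId, string userId,
    string idempotencyKey, // Supplied by the endpoint; stable for the same logical operation
    CancellationToken ct = default)
{
    try
    {
        using var request = new HttpRequestMessage(
            HttpMethod.Post, $"courses/{Uri.EscapeDataString(courseId)}/enrollments");
        request.Headers.TryAddWithoutValidation("Idempotency-Key", idempotencyKey);
        request.Content = JsonContent.Create(new { UserId = userId });

        // With an idempotency key the upstream can deduplicate, so a retry would be safe
        using var response = await http.SendAsync(request, ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadFromJsonAsync&lt;EnrollmentResultDto&gt;(ct);
    }
    catch (Exception ex) when (ex is BrokenCircuitException
                                  or TimeoutRejectedException
                                  or HttpRequestException)
    {
        // Return null so the endpoint maps the failure to a 502, as it expects.
        // Assumes a primary-constructor logger, as in CourseServiceClient.
        logger.LogWarning(ex, "Enrollment failed. CourseId: {CourseId}", courseId);
        return null;
    }
}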
</code></pre>
<p>The idempotency key is generated in the BFF endpoint from the user ID, the course ID and the current UTC date, making it stable for the same logical operation regardless of how many times it is submitted:</p>
<pre><code class="language-csharp">// Endpoints/EnrollmentEndpoints.cs
private static async Task&lt;IResult&gt; EnrollAsync(
    string courseId, HttpContext ctx,
    EnrollmentServiceClient enrollmentClient,
    EnrollmentCache enrollmentCache,
    CancellationToken ct)
{
    var userId = ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)!;

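    // Deterministic key: same user + course on the same UTC day produces the same key.
    // Invariant culture keeps the date format independent of the server's locale.
    var idempotencyKey = Convert.ToHexString(
        SHA256.HashData(
            Encoding.UTF8.GetBytes(string.Create(
                CultureInfo.InvariantCulture, $"{userId}:{courseId}:{DateTime.UtcNow:yyyy-MM-dd}"))));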

    var result = await enrollmentClient.EnrollAsync(courseId, userId, idempotencyKey, ct);

    if (result is null)
        return Results.Problem(
            detail: "Enrollment could not be processed.",
            statusCode: StatusCodes.Status502BadGateway);

    await enrollmentCache.InvalidateAsync(userId, ct);
    return Results.Ok(result);
}
</code></pre>
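<p>The date component in the key ensures that the same enrollment attempt on different days produces different keys. This is correct, since a student might legitimately withdraw and re-enroll in the same course across days. Within a day, the trade-off runs the other way. A withdraw and re-enroll on the same UTC date reuses the key, and a retry that straddles midnight UTC gets a new one. The upstream's key retention window should be chosen with both cases in mind.</p>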
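<p>The derivation becomes easy to check once it is pulled out into a pure function that takes the clock as a parameter instead of reading <code>DateTime.UtcNow</code>. The sketch below uses the same hashing scheme as the endpoint above. The <code>IdempotencyKeys</code> class is hypothetical. It also formats the date with the invariant culture, so the key cannot vary with the server's culture settings:</p>
<pre><code class="language-csharp">using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class IdempotencyKeys
{
    // SHA-256 of "userId:courseId:yyyy-MM-dd": stable within one UTC day, different across days
    public static string ForEnrollment(string userId, string courseId, DateTime utcNow)
    {
        var day = utcNow.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        var bytes = Encoding.UTF8.GetBytes($"{userId}:{courseId}:{day}");
        return Convert.ToHexString(SHA256.HashData(bytes));
    }
}
</code></pre>
<p>Each key is 64 uppercase hexadecimal characters (32 SHA-256 bytes). Every call with the same user, course and UTC date produces the same key.</p>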
<hr />
<h2>Bulkhead isolation: containing failure domains</h2>
<p>A bulkhead limits the number of concurrent calls to a specific upstream service. Without bulkheads, a slow upstream service can accumulate an unbounded number of in-flight calls from the BFF, each holding a connection, memory and a pending aggregation. As these pile up, the BFF's overall capacity degrades, and requests that depend only on healthy upstreams slow down with them.</p>
<p>Bulkhead support in <code>Microsoft.Extensions.Http.Resilience</code> is provided through the <code>AddConcurrencyLimiter</code> extension:</p>
<pre><code class="language-csharp">// For services that are particularly prone to slowdowns under load
builder.Services
    .AddHttpClient&lt;SessionServiceClient&gt;(client =&gt;
        client.BaseAddress = new Uri(config["Services:SessionService:BaseUrl"]!))
    .AddResilienceHandler("session-service", pipeline =&gt;
    {
        // Add bulkhead before the read pipeline strategies
        pipeline.AddConcurrencyLimiter(new ConcurrencyLimiterOptions
        {
            PermitLimit = 20,  // Max 20 concurrent calls to SessionService
            QueueLimit  = 5    // Queue up to 5 more — reject beyond that
        });

        // Then the standard read pipeline strategies
        ResiliencePipelineFactory.ReadPipeline(
            "SessionService",
            TimeSpan.FromMilliseconds(800),
            TimeSpan.FromMilliseconds(2500))(pipeline);
    })
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;();
</code></pre>
<p>When the session service is slow and 20 concurrent BFF requests are already waiting for it, the 21st through 25th requests queue. The 26th is rejected immediately with a <code>RateLimiterRejectedException</code>, which the client catches and returns as <code>null</code>. This is a partial failure for sessions, not a hard error: the user profile and course data still load, and only upcoming sessions are absent.</p>
<p>Without the bulkhead, the 26th request would simply join the calls already waiting on the session service. At sufficient load, those pending calls tie up the BFF's connections, memory and request capacity. Requests to the user service, which might be perfectly healthy, are dragged down with them. The bulkhead contains the blast radius.</p>
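<p>Polly handles this inside the pipeline, but the mechanism is small enough to sketch by hand. The hypothetical <code>SimpleBulkhead</code> below is not the Polly implementation. It uses a <code>SemaphoreSlim</code> for the 20 permits and a counter for the 5 queue slots, and it returns <code>null</code> on rejection to mirror the client behaviour described above:</p>
<pre><code class="language-csharp">using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical hand-rolled bulkhead: at most permitLimit calls run at once,
// at most queueLimit more wait for a permit, and anything beyond is rejected.
public sealed class SimpleBulkhead
{
    private readonly SemaphoreSlim _permits;
    private readonly int _maxInSystem; // permits + queue slots
    private int _inSystem;             // calls currently running or waiting

    public SimpleBulkhead(int permitLimit, int queueLimit)
    {
        _permits = new SemaphoreSlim(permitLimit, permitLimit);
        _maxInSystem = permitLimit + queueLimit;
    }

    public async Task&lt;T?&gt; TryExecuteAsync&lt;T&gt;(Func&lt;Task&lt;T&gt;&gt; action) where T : class
    {
        // Claim a slot; if every permit and queue slot is taken, reject without waiting
        if (Interlocked.Increment(ref _inSystem) &gt; _maxInSystem)
        {
            Interlocked.Decrement(ref _inSystem);
            return null; // mirrors the typed client turning a rejection into null
        }

        try
        {
            await _permits.WaitAsync(); // queued callers wait here without blocking a thread
            try
            {
                return await action();
            }
            finally
            {
                _permits.Release();
            }
        }
        finally
        {
            Interlocked.Decrement(ref _inSystem);
        }
    }
}
</code></pre>
<p>With <code>new SimpleBulkhead(20, 5)</code>, twenty calls run, five wait asynchronously for a permit, and the twenty-sixth completes immediately with <code>null</code> without ever touching the upstream.</p>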
<hr />
<h2>Testing resilience behaviour</h2>
<p>Resilience strategies are only trustworthy if they are tested. The integration test factory from Article 8 provides the mechanism. Configure each substitute client to return what the real client returns once the resilience layer has given up (<code>null</code>, after the client catches a <code>BrokenCircuitException</code> or a timeout), then verify the aggregator's response.</p>
<pre><code class="language-csharp">// EducationPlatform.Bff.IntegrationTests/Resilience/CircuitBreakerTests.cs
public class CircuitBreakerTests(BffWebApplicationFactory factory)
    : IClassFixture&lt;BffWebApplicationFactory&gt;
{
    [Fact]
    public async Task Dashboard_CourseServiceCircuitOpen_ReturnsDegradedResponse()
    {
        // Arrange — profile and notifications available; courses circuit open
        factory.UserClient
            .GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto("Ingrid", "Solberg", "uninett", "TEACHER", null));

        factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(0);

        // Simulate the client returning null (as it would after catching BrokenCircuitException)
        factory.CourseClient
            .GetCoursesByOrgAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns((IReadOnlyList&lt;CourseDto&gt;?)null);

        var client = factory.CreateAuthenticatedClient();

        // Act
        var response = await client.GetAsync("/api/dashboard");

        // Assert — 200 with partial failure, not a 503
        response.StatusCode.Should().Be(HttpStatusCode.OK);
        var body = await response.Content.ReadFromJsonAsync&lt;DashboardResponse&gt;();
        body!.Courses.Should().BeEmpty();
        body.PartialFailures.Should().Contain("courses");
        body.User.DisplayName.Should().Be("Ingrid Solberg"); // User data unaffected
    }

    [Fact]
    public async Task Dashboard_AllNonCriticalServicesUnavailable_ReturnsMinimalResponse()
    {
        // Arrange — only profile available
        factory.UserClient
            .GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto("Ingrid", "Solberg", "uninett", "TEACHER", null));

        factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns((int?)null);

        factory.CourseClient
            .GetCoursesByOrgAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns((IReadOnlyList&lt;CourseDto&gt;?)null);

        var client = factory.CreateAuthenticatedClient();

        // Act
        var response = await client.GetAsync("/api/dashboard");

        // Assert — still a valid response, just minimally populated
        response.StatusCode.Should().Be(HttpStatusCode.OK);
        var body = await response.Content.ReadFromJsonAsync&lt;DashboardResponse&gt;();
        body!.User.Should().NotBeNull(); // The one thing that always works
        body.Courses.Should().BeEmpty();
        body.UpcomingSessions.Should().BeEmpty();
        body.Notifications.Count.Should().Be(0);
        body.PartialFailures.Should().HaveCount(2)
            .And.Contain("courses")
            .And.Contain("notifications");
    }

    [Fact]
    public async Task Dashboard_ProfileServiceUnavailable_Returns503()
    {
        // Profile is required — its absence cannot be partially failed
        factory.UserClient
            .GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns((UserProfileDto?)null);

        factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(0);

        var client = factory.CreateAuthenticatedClient();

        var response = await client.GetAsync("/api/dashboard");

        response.StatusCode.Should().Be(HttpStatusCode.ServiceUnavailable);
    }
}
</code></pre>
<p>These tests verify the aggregator's partial failure behaviour under resilience-layer outcomes. They test the outcomes — what the Vue application receives — not the Polly strategies themselves. Testing Polly's internal behaviour is Polly's job; testing that your aggregator responds correctly to the signals Polly produces is yours.</p>
<hr />
<h2>Observing resilience behaviour in production</h2>
<p>Resilience events — retries, timeouts, circuit breaker state changes — must be visible in Application Insights. The <code>OnRetry</code>, <code>OnOpened</code>, <code>OnClosed</code>, and <code>OnHalfOpened</code> callbacks in the pipeline configuration (shown above) emit structured log entries that Serilog writes to Application Insights.</p>
<p>A KQL query that surfaces retry activity:</p>
<pre><code class="language-kusto">traces
| where timestamp &gt; ago(1h)
| where message contains "Retrying"
| extend
    Service = tostring(customDimensions["Service"]),
    Attempt = toint(customDimensions["Attempt"])
| summarize RetryCount = count() by Service, bin(timestamp, 5m)
| render timechart
</code></pre>
<p>And circuit breaker openings:</p>
<pre><code class="language-kusto">traces
| where timestamp &gt; ago(24h)
| where message contains "Circuit breaker OPENED"
| extend Service = tostring(customDimensions["Service"])
| project timestamp, Service, message
| order by timestamp desc
</code></pre>
<p>In the production system, this query was part of the daily operational review. A circuit breaker opening is always a signal worth investigating. With the configuration used here, it means that within a 30-second sampling window the BFF sent at least 5 requests to an upstream service and at least half of them failed. That is not a transient blip; it is an upstream service in trouble. The circuit breaker opening is often the first observable signal of an upstream incident, arriving before alerts from the upstream team's own monitoring.</p>
<hr />
<h2>The settings that required tuning in production</h2>
<p>The initial resilience configuration used the standard handler defaults for all services. Three settings were tuned after observing production behaviour:</p>
<p><strong>Attempt timeout for the notification service was lowered from 1 second to 300ms.</strong> The notification service was consistently the slowest upstream, with a p95 latency of 280ms. With a 1-second attempt timeout and two retries, a slow notification call could hold a BFF aggregation for up to 3 seconds (three attempts, before counting backoff delays) before returning null. At 300ms, each slow attempt times out quickly. If all three attempts time out, the operation gives up after roughly 900ms plus backoff, which is within the acceptable budget for a non-critical service.</p>
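<p>The arithmetic behind both numbers fits in a few lines. The sketch below is a hypothetical helper, not part of the BFF. The text does not state the backoff delays, so they are passed in as a parameter. The helper also ignores any outer total-request timeout, which would cap the result:</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;

public static class RetryBudget
{
    // Worst case when every attempt runs until its attempt timeout:
    // (retries + 1) attempts, plus the backoff delay before each retry.
    public static TimeSpan WorstCase(
        TimeSpan attemptTimeout, int maxRetries, IReadOnlyList&lt;TimeSpan&gt; retryDelays)
    {
        if (retryDelays.Count &lt; maxRetries)
            throw new ArgumentException("Expected one delay per retry.", nameof(retryDelays));

        var total = attemptTimeout * (maxRetries + 1);
        for (int i = 0; i &lt; maxRetries; i++)
            total += retryDelays[i];
        return total;
    }
}
</code></pre>
<p>With zero delays, a 1-second attempt timeout and two retries give the 3 seconds above, and a 300ms timeout gives 900ms. Real backoff delays add to both figures.</p>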
<p><strong>Circuit breaker</strong> <code>MinimumThroughput</code> <strong>was raised from 3 to 5 for the user service.</strong> At 3 requests, a brief burst of three 500 errors — which occurred during a weekly maintenance window on the user service — opened the circuit breaker and blocked all dashboard loads for 20 seconds. Five requests provide a more stable signal that distinguishes a sustained failure from a brief transient event.</p>
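<p>A simplified model of the open decision shows why the change helps. The hypothetical <code>FailureRatioWindow</code> below is not Polly's implementation; Polly aggregates its sampling window differently. It only captures two rules. First, the breaker cannot open until the window holds at least <code>MinimumThroughput</code> requests. Second, once that threshold is met, it opens when the failure ratio reaches the configured value:</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;

public sealed class FailureRatioWindow
{
    private readonly Queue&lt;(DateTimeOffset At, bool Failed)&gt; _outcomes = new();
    private readonly TimeSpan _samplingDuration;
    private readonly int _minimumThroughput;
    private readonly double _failureRatio;

    public FailureRatioWindow(TimeSpan samplingDuration, int minimumThroughput, double failureRatio)
    {
        _samplingDuration = samplingDuration;
        _minimumThroughput = minimumThroughput;
        _failureRatio = failureRatio;
    }

    // Records one outcome and reports whether the breaker would open now
    public bool RecordAndCheck(DateTimeOffset now, bool failed)
    {
        _outcomes.Enqueue((now, failed));

        // Forget outcomes older than the sampling window
        while (_outcomes.Count &gt; 0 &amp;&amp; now - _outcomes.Peek().At &gt; _samplingDuration)
            _outcomes.Dequeue();

        // Too few requests: never open, however bad the ratio looks
        if (_outcomes.Count &lt; _minimumThroughput)
            return false;

        int failures = 0;
        foreach (var outcome in _outcomes)
            if (outcome.Failed) failures++;

        return (double)failures / _outcomes.Count &gt;= _failureRatio;
    }
}
</code></pre>
<p>Take a 30-second window and a 0.5 ratio. Three consecutive 500s open a breaker configured with a minimum throughput of 3 but leave one configured with 5 closed. This is the difference the maintenance-window burst exposed.</p>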
<p><strong>Retry jitter was essential under load.</strong> The initial configuration used linear backoff without jitter. During a load test, every request to the course service that hit a 503 retried after exactly 200ms and then 400ms. This produced a coordinated retry storm that overwhelmed the course service while it was recovering. Adding <code>UseJitter = true</code> spread the retries across the delay window and eliminated the storm pattern.</p>
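<p>A short sketch makes the difference between the two configurations visible. When jitter is enabled, the hypothetical <code>Backoff.LinearDelay</code> below scales each delay by a random factor between 0.75 and 1.25. That factor is an illustrative choice, not Polly's exact jitter formula:</p>
<pre><code class="language-csharp">using System;

public static class Backoff
{
    // Linear backoff: retry n (1-based) waits n * baseDelay.
    // Without jitter every client computes the same delay; with jitter the
    // delay is scaled by a random factor in [0.75, 1.25), spreading retries out.
    public static TimeSpan LinearDelay(int attempt, TimeSpan baseDelay, Random? jitter = null)
    {
        var delay = baseDelay * attempt;
        if (jitter is null)
            return delay;

        double factor = 0.75 + jitter.NextDouble() * 0.5;
        return delay * factor;
    }
}
</code></pre>
<p>Without a <code>Random</code>, every client retrying its first attempt waits exactly 200ms. With one, the same retries land anywhere between 150ms and 250ms.</p>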
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/eaaa1c61-8bee-45f8-9518-c7aa246d7c63.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Resilience as a design conversation, not a configuration detail</h2>
<p>The configuration decisions above — which services get retries, what the timeout budgets are, where the circuit breaker thresholds sit — are not technical decisions made in isolation. They represent a negotiation between what the BFF can tolerate, what the upstream services can withstand, and what the user experience requires.</p>
<p>A retry that is safe from the BFF's perspective may be harmful from the upstream's perspective if it doubles the load during a partial outage. A timeout that is acceptable for a background operation is unacceptable for a user-facing request on the critical path. A circuit breaker threshold that is appropriate for a high-traffic service is too sensitive for a low-traffic service where three failures in 30 seconds is statistically insignificant.</p>
<p>These thresholds should be reviewed with the teams that own the upstream services, set against measured p95 latencies from production telemetry, and revisited when service characteristics change. A resilience configuration that has not been reviewed in six months is a configuration that no longer reflects the system it is protecting.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/a924692b-fd44-4327-9246-476c88f28966.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="#">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="#">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="#">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="#">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="#">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="#">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="#">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="#">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="#">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="#">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="#">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li>→ <a href="#">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Caching in the BFF: In-Memory, Redis & Response Caching]]></title><description><![CDATA[Caching is one of the most compelling arguments for the BFF pattern — and one of the fastest ways to introduce subtle, hard-to-diagnose bugs if it is implemented without precision. A BFF that aggregat]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/caching-in-the-bff-in-memory-redis-response-caching</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/caching-in-the-bff-in-memory-redis-response-caching</guid><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[caching]]></category><category><![CDATA[.net core]]></category><category><![CDATA[Aspnetcore]]></category><category><![CDATA[Redis]]></category><category><![CDATA[in-memory-caching]]></category><category><![CDATA[response-caching]]></category><category><![CDATA[Cache Invalidation]]></category><category><![CDATA[performance]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Microservices]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sat, 18 Apr 2026 03:57:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/cc76e0b0-2973-461e-bb31-674d8d0f925a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>Caching is one of the most compelling arguments for the BFF pattern — and one of the fastest ways to introduce subtle, hard-to-diagnose bugs if it is implemented without precision. A BFF that aggregates four upstream services on every request has an obvious caching opportunity: responses that are expensive to assemble and infrequently changing can be served from memory rather than recomputed. But "infrequently changing" is doing significant work in that sentence, and the failure modes that emerge when caching is applied too broadly are worse than the latency it was meant to solve.</p>
<p>This article covers where caching belongs in a BFF, the three caching strategies available in .NET Core and when each is appropriate, the stale-data problems that appear in each strategy, and the cache invalidation patterns that keep the system correct under real-world usage. Code examples are grounded in the education platform BFF built throughout this series.</p>
<hr />
<h2>Where caching belongs — and where it does not</h2>
<p>Before any implementation, one architectural question determines whether your caching strategy will stay maintainable: <strong>is the data being cached user-specific, or is it shared across users?</strong></p>
<p>This distinction divides the caching problem into two fundamentally different problems that require different solutions.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/fd7fa5dc-0af7-4663-bfce-2676e2316b56.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Shared data</strong> is the same for all users or all users within an organisation: the list of available courses for an institution, the current academic calendar, lookup tables for status codes and role descriptions. This data can be cached at the BFF level with a single cache entry serving thousands of requests.</p>
<p><strong>User-specific data</strong> is different for every authenticated user: a student's enrolled courses, a teacher's upcoming sessions, a user's unread notifications. Caching this data requires a cache key that includes the user's identity. Serving one user's data to another user is a security vulnerability, not a performance optimisation.</p>
<p>The production system this series is based on had both types. Course catalogue data (available courses for an institution) was shared and changed at most once per day. User enrollment status was user-specific and could change within a single session if a student enrolled or withdrew. These two types required different cache strategies with different TTLs and different invalidation approaches.</p>
<p>A third category exists: <strong>request-level aggregation results</strong> — data that is expensive to compute for a single request but is neither shared across users nor needed beyond the current response. For this, caching is the wrong tool; parallel upstream calls (as implemented in Article 4) are the right tool. Not every performance problem is a caching problem.</p>
<hr />
<h2>The three caching strategies</h2>
<p>.NET Core provides three caching mechanisms with distinct characteristics. A production BFF may combine them in different parts of the stack.</p>
<ol>
<li><p><strong><code>IMemoryCache</code>: in-process memory cache.</strong> <code>IMemoryCache</code> stores data in the BFF process's heap. It is the fastest cache available (no serialisation, no network hop), but its data is local to a single container instance and is lost on restart.</p><p>Appropriate for: shared reference data with low invalidation requirements (course catalogues, academic year data, role lookup tables) in a single-instance deployment.</p><p>Not appropriate for: user-specific data (cache keys must be per-user, which increases memory pressure significantly at scale) and data that must be consistent across multiple BFF instances.</p>
</li>
<li><p><strong><code>IDistributedCache</code> with Redis: out-of-process distributed cache.</strong> <code>IDistributedCache</code> backed by Redis stores serialised data outside the BFF process. Data survives container restarts, is shared across multiple BFF instances, and can be inspected and cleared independently of the BFF.</p><p>Appropriate for: user-specific data at scale, shared data in multi-instance deployments, and data that must survive BFF restarts.</p><p>Not appropriate for: data that changes so quickly that only a very short TTL is safe. At that point, the Redis round trip can cost almost as much as the upstream call it replaces.</p>
</li>
<li><p><strong>Response caching middleware: HTTP-level cache headers.</strong> Response caching stores entire HTTP responses (headers and body) and serves them for subsequent requests that match the caching rules. It operates at the HTTP layer and can be combined with a CDN or reverse proxy that respects <code>Cache-Control</code> headers.</p><p>Appropriate for: unauthenticated or publicly accessible BFF endpoints, and endpoints returning shared data where HTTP cache semantics (ETags, <code>If-None-Match</code>) add value.</p><p>Not appropriate for: authenticated endpoints. The ASP.NET Core middleware only stores responses marked <code>Cache-Control: public</code> and skips requests that carry an <code>Authorization</code> header. Marking a response <code>private</code> rules out proxy caching as well. For most BFF endpoints behind authentication, response caching at the HTTP layer adds complexity without benefit over <code>IMemoryCache</code> or Redis.</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/1c2fd1cd-b398-463e-b9df-fdbdbd2e04bd.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Implementation: <code>IMemoryCache</code> for shared reference data</h2>
<p>The course catalogue for an institution — the list of available courses — changes at most once per day in the education platform. Fetching it on every dashboard request is wasteful. It is shared across all users in the same organisation, making it a natural candidate for <code>IMemoryCache</code>.</p>
<pre><code class="language-csharp">// Infrastructure/CourseCache.cs
public sealed class CourseCache(
    IMemoryCache cache,
    CourseServiceClient courseClient,
    ILogger&lt;CourseCache&gt; logger)
{
    private static string CacheKey(string orgId) =&gt; $"courses:org:{orgId}";

    private static readonly MemoryCacheEntryOptions CacheOptions = new MemoryCacheEntryOptions()
        .SetAbsoluteExpiration(TimeSpan.FromMinutes(30))
        .SetSlidingExpiration(TimeSpan.FromMinutes(10))
        .SetSize(1); // For size-limited cache — each entry counts as 1 unit

    public async Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt; GetOrFetchAsync(
        string orgId, CancellationToken ct = default)
    {
        var key = CacheKey(orgId);

        if (cache.TryGetValue(key, out IReadOnlyList&lt;CourseDto&gt;? cached))
        {
            logger.LogDebug(
                "Cache hit for courses. OrgId: {OrgId}, Key: {CacheKey}", orgId, key);
            return cached;
        }

        logger.LogDebug(
            "Cache miss for courses. OrgId: {OrgId}, fetching upstream.", orgId);

        var courses = await courseClient.GetCoursesByOrgAsync(orgId, ct);

        if (courses is not null)
        {
            cache.Set(key, courses, CacheOptions);
            logger.LogInformation(
                "Cached {CourseCount} courses for org {OrgId}. TTL: 30 min.",
                courses.Count, orgId);
        }

        return courses;
    }

    public void Invalidate(string orgId)
    {
        var key = CacheKey(orgId);
        cache.Remove(key);
        logger.LogInformation(
            "Cache invalidated for courses. OrgId: {OrgId}", orgId);
    }
}
</code></pre>
<p>Register and size the cache in <code>Program.cs</code>:</p>
<pre><code class="language-csharp">// Program.cs
builder.Services.AddMemoryCache(opts =&gt;
{
    // SizeLimit is unitless. Each CourseCache entry declares Size = 1, so this caps the
    // cache at 1024 entries and prevents unbounded growth as more institutions are served.
    // Once SizeLimit is set, every entry must declare a Size, or Set() throws.
    opts.SizeLimit = 1024;
    opts.CompactionPercentage = 0.25; // Compact by 25% when the size limit is exceeded
    opts.TrackStatistics = true; // Enables cache hit/miss metrics
});

builder.Services.AddSingleton&lt;CourseCache&gt;();
</code></pre>
<p>Combining <code>SetSlidingExpiration</code> with <code>SetAbsoluteExpiration</code> gives each entry two deadlines, and the entry expires at whichever comes first. The first deadline is 30 minutes after creation, regardless of access frequency. The second is 10 minutes after the last access. For course data that changes on a daily cycle, a frequently visited institution's data stays cached for the full 30 minutes. An institution with infrequent users (a small school) is evicted after 10 idle minutes instead of holding memory for the full half hour.</p>
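<p>The rule can be written down as a tiny function. The hypothetical <code>CacheExpiry.ExpiresAt</code> below is a model, not the <code>MemoryCache</code> implementation. <code>MemoryCache</code> checks expiration lazily, on access and during periodic scans, so the computed moment is when an entry becomes eligible for eviction:</p>
<pre><code class="language-csharp">using System;

public static class CacheExpiry
{
    // With both settings, an entry expires at whichever deadline comes first:
    // the fixed absolute deadline, or the sliding window after the last access.
    public static DateTimeOffset ExpiresAt(
        DateTimeOffset created, DateTimeOffset lastAccess, TimeSpan absolute, TimeSpan sliding)
    {
        var absoluteDeadline = created + absolute;
        var slidingDeadline = lastAccess + sliding;
        return slidingDeadline &lt; absoluteDeadline ? slidingDeadline : absoluteDeadline;
    }
}
</code></pre>
<p>With a 30-minute absolute and a 10-minute sliding expiration, an entry last read at minute 25 expires at minute 30, because the absolute cap wins. An entry last read at minute 5 expires at minute 15.</p>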
<h3>Avoiding the thundering herd</h3>
<p>When the cache for a high-traffic organisation expires, multiple concurrent requests will all get cache misses simultaneously and all make the upstream call at the same time. This is the thundering herd problem, and it is particularly acute at startup when the cache is cold.</p>
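<p>One fix is to coalesce concurrent misses: a <code>ConcurrentDictionary</code> of in-flight fetches lets every caller for the same key await a single upstream call.</p>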
<pre><code class="language-csharp">// Infrastructure/CourseCache.cs — updated with request coalescing
public sealed class CourseCache(
    IMemoryCache cache,
    CourseServiceClient courseClient,
    ILogger&lt;CourseCache&gt; logger)
{
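    // Track in-flight fetch operations — concurrent requests share one Task
    private readonly ConcurrentDictionary&lt;string, Lazy&lt;Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt;&gt;&gt;
        _inFlight = new();

    public async Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt; GetOrFetchAsync(
        string orgId, CancellationToken ct = default)
    {
        var key = CacheKey(orgId); // CacheKey and CacheOptions as in the previous listing

        if (cache.TryGetValue(key, out IReadOnlyList&lt;CourseDto&gt;? cached))
            return cached;

        // Coalesce concurrent cache misses — only one upstream call per key.
        // The shared fetch deliberately ignores the first caller's token, so one
        // cancelled request cannot cancel the fetch for every other waiter.
        var lazyFetch = _inFlight.GetOrAdd(key,
            _ =&gt; new Lazy&lt;Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt;&gt;(
                () =&gt; FetchAndCacheAsync(orgId, key, CancellationToken.None)));

        try
        {
            // Each caller still stops waiting when its own request is cancelled (.NET 6+)
            return await lazyFetch.Value.WaitAsync(ct);
        }
        finally
        {
            // Remove only the entry this caller awaited, never a newer one for the same key
            _inFlight.TryRemove(KeyValuePair.Create(key, lazyFetch));
        }
    }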

    private async Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt; FetchAndCacheAsync(
        string orgId, string key, CancellationToken ct)
    {
        var courses = await courseClient.GetCoursesByOrgAsync(orgId, ct);
        if (courses is not null)
            cache.Set(key, courses, CacheOptions);
        return courses;
    }
}
</code></pre>
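<p>The coalescing idea does not depend on <code>IMemoryCache</code>. Stripped down to a hypothetical generic <code>RequestCoalescer&lt;T&gt;</code>, it is easy to check that concurrent callers for one key trigger a single fetch:</p>
<pre><code class="language-csharp">using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class RequestCoalescer&lt;T&gt;
{
    // One Lazy per key: the first caller creates it, later callers reuse it
    private readonly ConcurrentDictionary&lt;string, Lazy&lt;Task&lt;T&gt;&gt;&gt; _inFlight = new();

    public async Task&lt;T&gt; GetAsync(string key, Func&lt;Task&lt;T&gt;&gt; fetch)
    {
        var lazy = _inFlight.GetOrAdd(key, _ =&gt; new Lazy&lt;Task&lt;T&gt;&gt;(fetch));
        try
        {
            return await lazy.Value;
        }
        finally
        {
            // Remove only the entry this caller awaited, so a newer fetch is not evicted
            _inFlight.TryRemove(KeyValuePair.Create(key, lazy));
        }
    }
}
</code></pre>
<p>Ten callers that arrive while the first fetch is still pending all await the same <code>Task</code>, so the upstream sees one request instead of ten.</p>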
<hr />
<h2>Implementation: Redis for user-specific data</h2>
<p>User enrollment status — which courses a specific student or teacher is actively engaged with — changes within a session. A student who enrolls in a course should see that course on their next dashboard load. This rules out in-memory caching with a long TTL, but does not rule out caching entirely — a short TTL with explicit invalidation on write operations provides freshness guarantees while reducing upstream load.</p>
<p>Install the Redis client:</p>
<pre><code class="language-shell">dotnet add package Microsoft.Extensions.Caching.StackExchangeRedis
dotnet add package StackExchange.Redis
</code></pre>
<p>Configure in <code>Program.cs</code>:</p>
<pre><code class="language-csharp">builder.Services.AddStackExchangeRedisCache(opts =&gt;
{
    opts.Configuration = builder.Configuration["Redis:ConnectionString"];
    opts.InstanceName  = "bff:"; // Prefix all keys — prevents collision with other services
});
</code></pre>
<p>The enrollment cache wrapper:</p>
<pre><code class="language-csharp">// Infrastructure/EnrollmentCache.cs
public sealed class EnrollmentCache(
    IDistributedCache cache,
    ILogger&lt;EnrollmentCache&gt; logger)
{
    private static string UserKey(string userId) =&gt; $"enrollment:{userId}";

    // Short TTL — enrollment status is user-specific and mutable within a session
    private static readonly DistributedCacheEntryOptions Options =
        new DistributedCacheEntryOptions()
            .SetAbsoluteExpirationRelativeToNow(TimeSpan.FromMinutes(5));

    public async Task&lt;EnrollmentStatusDto?&gt; GetAsync(
        string userId, CancellationToken ct = default)
    {
        var key  = UserKey(userId);
        var data = await cache.GetStringAsync(key, ct);

        if (data is null)
        {
            logger.LogDebug("Redis cache miss for enrollment. UserId: {UserId}", userId);
            return null;
        }

        logger.LogDebug("Redis cache hit for enrollment. UserId: {UserId}", userId);
        return JsonSerializer.Deserialize&lt;EnrollmentStatusDto&gt;(data);
    }

    public async Task SetAsync(
        string userId, EnrollmentStatusDto status, CancellationToken ct = default)
    {
        var key  = UserKey(userId);
        var data = JsonSerializer.Serialize(status);
        await cache.SetStringAsync(key, data, Options, ct);
    }

    public async Task InvalidateAsync(string userId, CancellationToken ct = default)
    {
        var key = UserKey(userId);
        await cache.RemoveAsync(key, ct);
        logger.LogInformation(
            "Redis cache invalidated for enrollment. UserId: {UserId}", userId);
    }
}
</code></pre>
<h3>Cache-aside pattern in the aggregator</h3>
<p>The aggregator uses the cache in a cache-aside pattern: check the cache first; on miss, fetch from upstream and populate the cache; on invalidation events, remove the entry.</p>
<pre><code class="language-csharp">// Aggregators/CourseAggregator.cs
public sealed class CourseAggregator(
    CourseCache courseCache,
    EnrollmentCache enrollmentCache,
    CourseServiceClient courseClient,
    ILogger&lt;CourseAggregator&gt; logger)
{
    public async Task&lt;CourseDetailResponse&gt; GetCourseDetailAsync(
        string courseId, string orgId, string userId, CancellationToken ct = default)
    {
        // Try enrollment status from Redis cache first
        var enrollmentStatus = await enrollmentCache.GetAsync(userId, ct);

        if (enrollmentStatus is null)
        {
            logger.LogDebug(
                "Enrollment cache miss for user {UserId}, fetching upstream.", userId);
            var upstreamEnrollment = await courseClient
                .GetEnrollmentStatusAsync(userId, ct);

            if (upstreamEnrollment is not null)
            {
                enrollmentStatus = upstreamEnrollment;
                await enrollmentCache.SetAsync(userId, enrollmentStatus, ct);
            }
        }

        // Course catalogue from the memory cache. CourseCache is keyed by
        // organisation, so it is looked up by orgId rather than by courseId.
        var catalogue = await courseCache.GetOrFetchAsync(orgId, ct);

        // ShapeCourseDetail (not shown) selects the requested course from the catalogue
        return ShapeCourseDetail(courseId, catalogue, enrollmentStatus);
    }
}
</code></pre>
<h3>Invalidating on write: enrollment changes</h3>
<p>When a user enrolls in or withdraws from a course, the BFF handles the POST/DELETE request. The enrollment cache must be invalidated immediately after the upstream write succeeds:</p>
<pre><code class="language-csharp">// Endpoints/EnrollmentEndpoints.cs
private static async Task&lt;IResult&gt; EnrollAsync(
    string courseId,
    HttpContext ctx,
    CourseServiceClient courseClient,
    EnrollmentCache enrollmentCache,
    CancellationToken ct)
{
    var userId = ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)!;

    var result = await courseClient.EnrollAsync(courseId, userId, ct);
    if (result is null)
        return Results.Problem(
            detail: "Enrollment request could not be processed.",
            statusCode: StatusCodes.Status502BadGateway);

    // Invalidate cache immediately after successful write
    // The next GET will fetch fresh data from upstream
    await enrollmentCache.InvalidateAsync(userId, ct);

    return Results.Ok(result);
}
</code></pre>
<p>The invalidation-on-write pattern is simple and correct for a BFF architecture. It avoids the complexity of write-through (writing the new value to the cache as part of the write operation) and cache-update (updating the cached value without an upstream round-trip). Both of those patterns require the cache entry to always be in a valid state relative to the upstream, which is hard to guarantee when the upstream can be modified by systems other than the BFF.</p>
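<p>The ordering that makes invalidate-on-write work can be sketched without Redis: the upstream write happens first, the cached copy is removed next, and the following read repopulates it. In the hypothetical <code>CacheAsideModel</code> below, two dictionaries stand in for the cache and the upstream service:</p>
<pre><code class="language-csharp">using System.Collections.Generic;

public sealed class CacheAsideModel
{
    private readonly Dictionary&lt;string, string&gt; _cache = new();
    private readonly Dictionary&lt;string, string&gt; _upstream = new();

    public int UpstreamReads { get; private set; }

    // Cache-aside read: serve a hit; on a miss, read upstream and populate the cache
    public string? Read(string userId)
    {
        if (_cache.TryGetValue(userId, out var cached))
            return cached;

        UpstreamReads++;
        if (!_upstream.TryGetValue(userId, out var fresh))
            return null; // nothing to cache

        _cache[userId] = fresh;
        return fresh;
    }

    // Invalidate-on-write: update the source of truth, then drop the cached copy
    public void Write(string userId, string value)
    {
        _upstream[userId] = value;
        _cache.Remove(userId);
    }
}
</code></pre>
<p>After a write, the next read is a guaranteed miss and fetches the new value from upstream. The cache only ever holds values that were read from the upstream.</p>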
<hr />
<h2>The stale-data problems to watch for</h2>
<h3>1. Multiple BFF instances with in-memory caches</h3>
<p>This is the most common caching mistake in a BFF deployed to multiple container instances. Two instances of the BFF each hold their own in-memory cache. Instance A's cache is invalidated; Instance B's cache still holds the stale entry. A user's requests can round-robin between instances — they see fresh data, then stale data, then fresh data again.</p>
<p>The fix is architectural: in-memory caches should only hold data that is either truly immutable (lookup tables, static reference data that never changes within a deployment lifetime) or whose staleness is acceptable and bounded (data with a short TTL where a brief period of inconsistency between instances is acceptable). User-specific data and data with write-based invalidation must use a distributed cache.</p>
<p>In the production system, the single ACI container instance made this a non-issue during initial deployment. The in-memory course cache was fine for one instance. When the architecture was being designed for potential multi-instance scaling, the course cache was moved to Redis — not because the single instance was problematic, but because the coupling between the caching strategy and the deployment topology was too tight.</p>
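<p>The divergence is easy to reproduce without any infrastructure. The sketch below is a hypothetical simulation, not the production code. Two dictionaries stand in for the in-memory caches of two BFF instances, a third stands in for the upstream course service, and reads follow a plain cache-aside lookup. Invalidating on instance A after a write leaves instance B serving the old value until its own entry expires.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;

// Hypothetical stand-ins: one upstream store and two instance-local caches
var upstream  = new Dictionary&lt;string, string&gt; { ["courses:org:42"] = "v1" };
var instanceA = new Dictionary&lt;string, string&gt;();
var instanceB = new Dictionary&lt;string, string&gt;();

// Cache-aside read: serve from the local cache, otherwise fetch and store
string Read(Dictionary&lt;string, string&gt; localCache, string key)
{
    if (localCache.TryGetValue(key, out var cached))
        return cached;
    var fresh = upstream[key];
    localCache[key] = fresh;
    return fresh;
}

// Both instances warm their caches with v1
Read(instanceA, "courses:org:42");
Read(instanceB, "courses:org:42");

// A write is handled by instance A: update upstream, invalidate A's cache only
upstream["courses:org:42"] = "v2";
instanceA.Remove("courses:org:42");

var fromA = Read(instanceA, "courses:org:42"); // "v2": A refetched from upstream
var fromB = Read(instanceB, "courses:org:42"); // "v1": B never saw the invalidation

Console.WriteLine($"A={fromA}, B={fromB}"); // A=v2, B=v1
</code></pre>
<p>A distributed cache holds only one copy of the entry, so a single invalidation is visible to every instance.</p>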
<h3>2. TTL longer than the upstream change frequency</h3>
<p>A 30-minute TTL on course catalogue data is reasonable when courses change at most once per day. If the product owner changes a course's enrollment capacity during an active session, users will see the old capacity for up to 30 minutes. This is an acceptable trade-off only if it has been explicitly agreed with the product team.</p>
<p>The conversation that needs to happen: "We cache course data for 30 minutes to reduce load on the course service. This means a course change takes up to 30 minutes to appear for users. Is that acceptable, or do we need to implement a cache invalidation webhook?" Not having this conversation leads to support tickets about "the system not updating."</p>
<p>In the production system, the course catalogue TTL was set to 30 minutes after explicit agreement with the product team. A cache invalidation endpoint (covered below) was added in the second month when an administrator needed to force an immediate refresh.</p>
<h3>3. Cache key collisions</h3>
<p>A cache key that does not include all dimensions of uniqueness produces collisions. The most dangerous variant in a BFF is a key that omits the user ID or organisation ID.</p>
<pre><code class="language-csharp">// ✗ Dangerous — same key for all users
var key = $"dashboard:courses";

// ✗ Still dangerous: the literal "session" suffix is identical for every user
var key = $"dashboard:courses:session";

// ✓ Correct — unique per organisation
var key = $"courses:org:{orgId}";

// ✓ Correct — unique per user
var key = $"enrollment:{userId}";
</code></pre>
<p>A cache key collision between two organisations is a data leak. Organisation A's course list is returned to Organisation B's users. In the production system, every cache key was reviewed against the principle: "if two different users made this same request with different identities, would this key produce different results?" If yes, both identities must be in the key.</p>
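<p>One way to make that review mechanical is to build keys only through helpers whose parameters are the identity dimensions. A key then cannot be constructed without them. The helpers below are hypothetical: their names are illustrative, and the key prefixes mirror the examples above. They reject empty identifiers rather than silently producing a key that every caller would share.</p>
<pre><code class="language-csharp">using System;

// Hypothetical key builders: every identity dimension is a required parameter
static string CourseKey(string orgId) =&gt;
    $"courses:org:{Require(orgId, nameof(orgId))}";

static string EnrollmentKey(string userId) =&gt;
    $"enrollment:{Require(userId, nameof(userId))}";

// An empty identifier would collapse all callers onto one key, so fail loudly
static string Require(string value, string name) =&gt;
    string.IsNullOrWhiteSpace(value)
        ? throw new ArgumentException("Cache key dimension must not be empty.", name)
        : value;

var orgKey  = CourseKey("org-a");       // "courses:org:org-a"
var userKey = EnrollmentKey("user-17"); // "enrollment:user-17"

var emptyRejected = false;
try { CourseKey(""); }
catch (ArgumentException) { emptyRejected = true; }

Console.WriteLine($"{orgKey} | {userKey} | empty rejected: {emptyRejected}");
</code></pre>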
<h3>4. Caching error responses</h3>
<p>A cache implementation that stores <code>null</code> results from upstream failures and serves them as cache hits is caching the absence of data as if it were data. The fix is explicit:</p>
<pre><code class="language-csharp">// Always check whether upstream returned a valid result before caching
var courses = await courseClient.GetCoursesByOrgAsync(orgId, ct);

if (courses is not null) // Only cache successful responses
{
    cache.Set(key, courses, CacheOptions);
}
// Do not cache null — let the next request try the upstream again
return courses;
</code></pre>
<p>This is why the <code>CourseCache</code> above only calls <code>cache.Set</code> inside the <code>if (courses is not null)</code> guard. Caching a <code>null</code> result from a temporarily unavailable service would mean every request for the next 30 minutes is served a null cache hit — extending the upstream outage far beyond its actual duration from the user's perspective.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/f0445429-7b5e-4969-abbd-83c76c5a3482.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Cache invalidation patterns</h2>
<p>Cache invalidation is famously hard. The BFF has three practical patterns, each appropriate to different scenarios.</p>
<h3>Pattern 1: TTL-based expiry (simplest, eventual consistency)</h3>
<p>Set a TTL and accept the staleness window. Appropriate for shared reference data where bounded staleness is acceptable and invalidation events cannot be reliably detected.</p>
<pre><code class="language-csharp">// 30-minute TTL — course catalogue for an organisation
cache.Set(key, courses, new MemoryCacheEntryOptions()
    .SetAbsoluteExpiration(TimeSpan.FromMinutes(30)));
</code></pre>
<p>Advantages: no coordination required, predictable behaviour, simple to reason about. Disadvantages: staleness window can violate product expectations; TTL must be agreed with the product team.</p>
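<p>The staleness window is easiest to see with a controllable clock. The sketch below is a hypothetical, dependency-free stand-in for an <code>IMemoryCache</code> entry with absolute expiration. The clock is a plain variable advanced by hand, and the course key and capacity values are made up. The 30-minute TTL matches the course catalogue setting. A change made upstream one minute after the entry was cached stays invisible until the entry expires.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;

var ttl = TimeSpan.FromMinutes(30);
var now = new DateTimeOffset(2026, 1, 5, 9, 0, 0, TimeSpan.Zero); // simulated clock
var upstreamCapacity = 25;                                          // simulated upstream value
var cache = new Dictionary&lt;string, (int Value, DateTimeOffset ExpiresAt)&gt;();

// Absolute expiration: serve the entry until ExpiresAt, then refetch upstream
int GetCapacity(string key)
{
    if (cache.TryGetValue(key, out var entry) &amp;&amp; now &lt; entry.ExpiresAt)
        return entry.Value;
    cache[key] = (upstreamCapacity, now + ttl);
    return upstreamCapacity;
}

var first = GetCapacity("course:101");      // 09:00, cached until 09:30
now = now.AddMinutes(1);
upstreamCapacity = 40;                      // 09:01, capacity changed upstream
now = now.AddMinutes(28);
var stillStale = GetCapacity("course:101"); // 09:29, still served 25
now = now.AddMinutes(2);
var refreshed = GetCapacity("course:101");  // 09:31, expired and refetched: 40

Console.WriteLine($"{first} {stillStale} {refreshed}"); // 25 25 40
</code></pre>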
<h3>Pattern 2: Invalidation on write (most reliable for user-specific data)</h3>
<p>Invalidate the cache entry immediately after any write operation that changes the cached data. The BFF already handles writes; the invalidation call is a single line after a successful upstream write.</p>
<p>Advantages: immediate consistency after writes, no separate invalidation infrastructure. Disadvantages: only covers writes that go through the BFF. If upstream data changes via a different path (direct API call, admin tool, another service), the BFF cache is not notified.</p>
<h3>Pattern 3: Invalidation endpoint (for external events)</h3>
<p>Expose a cache invalidation endpoint on the BFF that upstream services or admin tools can call when data changes. This is the pattern for "the upstream can change without going through the BFF."</p>
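<pre><code class="language-csharp">// Endpoints/CacheEndpoints.cs
using System.Security.Cryptography;
using System.Text;

public static class CacheEndpoints
{
    public static IEndpointRouteBuilder MapCacheEndpoints(
        this IEndpointRouteBuilder app)
    {
        // Protected by an internal API key, not by user authentication
        app.MapPost("/internal/cache/invalidate/org/{orgId}",
            async (string orgId, CourseCache courseCache,
                   HttpContext ctx, IConfiguration config) =&gt;
        {
            // Validate the internal API key: this endpoint is not user-facing.
            // Fail closed when no key is configured (otherwise a missing header and a
            // missing setting would compare as equal), and compare in constant time.
            var expectedKey = config["InternalApi:Key"];
            var apiKey = ctx.Request.Headers["X-Internal-Api-Key"].FirstOrDefault();
            if (string.IsNullOrEmpty(expectedKey) || apiKey is null ||
                !CryptographicOperations.FixedTimeEquals(
                    Encoding.UTF8.GetBytes(apiKey),
                    Encoding.UTF8.GetBytes(expectedKey)))
                return Results.Unauthorized();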

            courseCache.Invalidate(orgId);
            return Results.Ok(new { invalidated = true, orgId });
        })
        .WithName("InvalidateCourseCache")
        .ExcludeFromDescription(); // Do not expose in OpenAPI spec

        return app;
    }
}
</code></pre>
<p>In the production system, this endpoint was called by an administrative management tool when an institution administrator updated the course catalogue. The management tool made a POST to <code>/internal/cache/invalidate/org/{orgId}</code> after its upstream write completed. The BFF's cache was cleared; the next user request fetched fresh data.</p>
<p>The endpoint is protected by a static API key rather than the standard Feide authentication because it is an internal system-to-system call, not a user action. The key is injected as a secret environment variable, in the same way as the Feide client secret in Article 7.</p>
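<p>On the calling side, the management tool only has to send a POST carrying the shared key once its own upstream write has succeeded. The sketch below is hypothetical. The base address and the <code>INTERNAL_API_KEY</code> environment variable are placeholders. The sketch builds the request without sending it, so its shape can be inspected. In the real tool the message would be sent through an <code>HttpClient</code>, ideally one obtained from <code>IHttpClientFactory</code>.</p>
<pre><code class="language-csharp">using System;
using System.Net.Http;

// Hypothetical values: a real tool would read both from its own configuration
var bffBaseAddress = new Uri("https://bff.example.internal/");
var internalApiKey = Environment.GetEnvironmentVariable("INTERNAL_API_KEY") ?? "dev-only-key";

// Build the invalidation request for one organisation's course cache
HttpRequestMessage BuildInvalidateRequest(string orgId)
{
    var path = $"internal/cache/invalidate/org/{Uri.EscapeDataString(orgId)}";
    var request = new HttpRequestMessage(HttpMethod.Post, new Uri(bffBaseAddress, path));
    request.Headers.Add("X-Internal-Api-Key", internalApiKey);
    return request;
}

using var invalidate = BuildInvalidateRequest("org-a");
Console.WriteLine($"{invalidate.Method} {invalidate.RequestUri}");

// Sending it after the tool's own upstream write has completed:
// using var http = new HttpClient();
// using var response = await http.SendAsync(invalidate);
// response.EnsureSuccessStatusCode();
</code></pre>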
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/81fa345e-2870-4f62-ba6c-13db32908093.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Measuring cache effectiveness</h2>
<p>Cache effectiveness is not self-evident without metrics. The production system tracked three numbers:</p>
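<p><strong>Hit rate by cache layer.</strong> For <code>IMemoryCache</code>, the <code>MemoryCache.GetCurrentStatistics()</code> API (available from .NET 7) provides hit and miss counts. It returns <code>null</code> unless <code>MemoryCacheOptions.TrackStatistics</code> is enabled, for example with <code>services.AddMemoryCache(o =&gt; o.TrackStatistics = true)</code>. The counters are cumulative since the process started, so the logged hit rate is a running figure rather than a per-interval one. Log these periodically:</p>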
<pre><code class="language-csharp">// Background service — logs cache stats every 5 minutes
public sealed class CacheMetricsService(
    IMemoryCache cache,
    TelemetryClient telemetry,
    ILogger&lt;CacheMetricsService&gt; logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromMinutes(5), ct);

            var stats = (cache as MemoryCache)?.GetCurrentStatistics();
            if (stats is null) continue;

            var hitRate = stats.TotalHits + stats.TotalMisses &gt; 0
                ? (double)stats.TotalHits / (stats.TotalHits + stats.TotalMisses) * 100
                : 0;

            logger.LogInformation(
                "MemoryCache stats — Hits: {Hits}, Misses: {Misses}, " +
                "HitRate: {HitRate:F1}%, Entries: {EntryCount}, " +
                "EstimatedSize: {Size}",
                stats.TotalHits, stats.TotalMisses,
                hitRate, stats.CurrentEntryCount, stats.CurrentEstimatedSize);

            telemetry.GetMetric("Cache.HitRate").TrackValue(hitRate);
            telemetry.GetMetric("Cache.EntryCount").TrackValue(stats.CurrentEntryCount);
        }
    }
}
</code></pre>
<p><strong>Upstream call reduction.</strong> Compare the number of BFF requests for course data with the number of calls the <code>CourseServiceClient</code> actually makes. The ratio of upstream calls to BFF requests should approximate the expected miss rate (one minus the hit rate). Suppose the BFF receives 500 dashboard requests in an hour for an organisation. If the course service also receives 500 calls, the cache is not working. If it receives roughly one call per 30-minute TTL window (about 12 across six hours of business traffic), it is.</p>
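<p>The arithmetic behind that check fits in a few lines. A minimal sketch, assuming one cache key per organisation and a 30-minute absolute TTL. Each key misses roughly once per TTL window, so the expected number of upstream calls is about the number of windows in the period. The observed ratio of upstream calls to BFF requests approximates the miss rate. The function names and traffic figures are hypothetical.</p>
<pre><code class="language-csharp">using System;

// Rough upper bound on upstream calls for one cache key over a period:
// one miss per TTL window (ignoring evictions and process restarts)
static int ExpectedUpstreamCalls(TimeSpan period, TimeSpan ttl) =&gt;
    (int)Math.Ceiling(period / ttl);

// Observed miss rate: upstream calls divided by BFF requests for the same data
static double ObservedMissRate(int bffRequests, int upstreamCalls) =&gt;
    bffRequests == 0 ? 0 : (double)upstreamCalls / bffRequests;

var expected = ExpectedUpstreamCalls(TimeSpan.FromHours(6), TimeSpan.FromMinutes(30)); // 12
var broken   = ObservedMissRate(bffRequests: 500, upstreamCalls: 500); // 1.0: every request misses
var healthy  = ObservedMissRate(bffRequests: 3000, upstreamCalls: 12); // 0.004: 99.6% hit rate

Console.WriteLine($"expected={expected}, broken={broken:P0}, healthy={healthy:P1}");
</code></pre>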
<p><strong>Latency delta between cache hits and misses.</strong> Log the cache outcome (hit/miss) alongside the aggregation duration in the <code>DashboardAggregationCompleted</code> telemetry event from Article 9. A Kusto query that splits duration by cache outcome reveals the actual latency benefit the cache provides — which is the number that justifies its complexity.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/db349ced-4f7f-4308-8284-f72e759641eb.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>When to remove caching</h2>
<p>Caching is sometimes introduced to mask a performance problem that should be fixed at the source. Before adding a cache layer, consider whether the upstream service is slow because it is under-indexed, under-resourced, or architecturally misdesigned — and whether fixing the source would make the cache unnecessary.</p>
<p>The production system removed the enrollment status cache in its second month. The upstream enrollment service had been slow due to a missing database index. After the index was added, the p95 response time dropped from 400ms to 18ms. At 18ms, the 5-minute Redis TTL on enrollment status introduced more staleness risk than the latency it saved. The cache was removed; the enrollment service was called directly on every request.</p>
<p>This is the correct outcome. A cache that is no longer needed is a cache that is no longer producing subtle bugs. The willingness to remove caching when the underlying problem is solved is as important as the willingness to add it when it is genuinely needed.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/1f8367ec-1adb-4beb-a104-4bbbee002f71.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="#">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="#">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="#">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="#">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="#">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="#">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="#">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="#">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="#">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li>→ <a href="#">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="#">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="#">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Observability for BFF: Structured Logging, Distributed Tracing & Azure Application Insights]]></title><description><![CDATA[A note on the code in this article. The observability setup shown here is derived from a production BFF built for a Norwegian enterprise education platform. Resource names, workspace identifiers, aler]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights</guid><category><![CDATA[observability]]></category><category><![CDATA[ #StructuredLogging ]]></category><category><![CDATA[serilog]]></category><category><![CDATA[distributed tracing]]></category><category><![CDATA[correlation-id]]></category><category><![CDATA[Azure Application Insights]]></category><category><![CDATA[Application insights]]></category><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[dotnet]]></category><category><![CDATA[telemetry]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[alerting]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Thu, 16 Apr 2026 14:33:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/18a3abea-37ee-4ba8-a259-4531d07e0d13.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<blockquote>
<p><strong>A note on the code in this article.</strong> The observability setup shown here is derived from a production BFF built for a Norwegian enterprise education platform. Resource names, workspace identifiers, alert thresholds, and certain query specifics have been generalised to meet NDA obligations. The Serilog configuration, Application Insights integration, custom telemetry patterns, Kusto queries, and the specific operational decisions each choice addresses are drawn directly from what was deployed and monitored in production.</p>
</blockquote>
<hr />
<p>A BFF that aggregates multiple upstream services has an observability problem that a single-service system does not. When a request to <code>GET /api/dashboard</code> returns a 503, four upstream services are potential failure points. When it takes 2.8 seconds instead of the expected 300ms, any one of the three sequential aggregation phases might be the culprit. Without end-to-end traceability — a single thread of correlation that follows a request from the Vue application's <code>fetch</code> call through every BFF aggregator method and every upstream HTTP call — diagnosing production incidents means guessing.</p>
<p>This article builds that traceability: structured logging with Serilog, distributed tracing with Activity and correlation IDs, and the Application Insights configuration that ties them together into a queryable, alertable observability layer. It then covers the dashboard and alert setup that makes the difference between discovering an incident from a user report and discovering it from a monitor.</p>
<hr />
<h2>What observability means for this architecture</h2>
<p>The request path for a dashboard load spans the browser, the BFF, and four upstream services:</p>
<pre><code class="language-plaintext">Vue app (browser)
  └── fetch /api/dashboard
        └── BFF (.NET Core on ACI)
              ├── UserServiceClient     → User Service
              ├── NotificationClient    → Notification Service
              ├── CourseServiceClient   → Course Service
              └── SessionServiceClient  → Session Service
</code></pre>
<p>Full observability means being able to answer these questions from a single tool:</p>
<ul>
<li><p>Which upstream service caused this request to fail or slow down?</p>
</li>
<li><p>What was the exact sequence of events for request <code>X-Correlation-Id: abc-123</code>?</p>
</li>
<li><p>Is the BFF's p95 latency within the defined budget for this week?</p>
</li>
<li><p>How many requests returned partial failures in the last 24 hours?</p>
</li>
<li><p>Did the 2am deployment degrade response times compared to before?</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/3443afe1-9f66-4111-8f90-2ea7c1150c76.png" alt="" style="display:block;margin:0 auto" />

<p>Application Insights can answer all of these — but only if the telemetry is structured correctly from the start. Unstructured logs and missing correlation IDs produce a tool that has data but cannot connect it.</p>
<hr />
<h2>Serilog: structured logging foundation</h2>
<p>Install the packages used by Article 4's <code>Program.cs</code>, plus the Serilog enricher packages:</p>
<pre><code class="language-shell">dotnet add package Serilog.AspNetCore
dotnet add package Serilog.Sinks.ApplicationInsights
dotnet add package Serilog.Enrichers.Environment
dotnet add package Serilog.Enrichers.Process
dotnet add package Serilog.Enrichers.Thread
</code></pre>
<h3>Full Serilog configuration</h3>
<p>The Serilog configuration in <code>Program.cs</code> merges static enrichers, dynamic log context enrichers, and the Application Insights sink:</p>
<pre><code class="language-csharp">// Program.cs
builder.Host.UseSerilog((ctx, services, cfg) =&gt; cfg
    .ReadFrom.Configuration(ctx.Configuration)
    .ReadFrom.Services(services)
    .Enrich.FromLogContext()
    .Enrich.WithMachineName()
    .Enrich.WithEnvironmentName()
    .Enrich.WithProperty("Service",     "bff")
    .Enrich.WithProperty("Version",     ctx.Configuration["AppVersion"] ?? "unknown")
    .Enrich.WithProperty("Environment", ctx.HostingEnvironment.EnvironmentName)
    .WriteTo.Console(new RenderedCompactJsonFormatter())
    .WriteTo.ApplicationInsights(
        services.GetRequiredService&lt;TelemetryConfiguration&gt;(),
        TelemetryConverter.Traces,
        restrictedToMinimumLevel: LogEventLevel.Information));
</code></pre>
<p>The <code>Version</code> property is read from the <code>AppVersion</code> setting. The ACI deployment injects that setting as an environment variable containing the deployed image tag (the Git commit SHA from Article 7). Every log entry carries it. When a regression appears in Application Insights, filtering by <code>Version</code> immediately shows whether the regression started with a specific deployment.</p>
<h3>appsettings.json: log level configuration</h3>
<pre><code class="language-json">{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft": "Warning",
        "Microsoft.Hosting.Lifetime": "Information",
        "System": "Warning",
        "System.Net.Http": "Warning"
      }
    }
  }
}
</code></pre>
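<p>The <code>System.Net.Http</code> override is important enough to state explicitly, even though the broader <code>System</code> override already covers it (Serilog matches overrides by source-context prefix). Without a <code>Warning</code> level for this namespace, every outgoing request the typed clients make through <code>IHttpClientFactory</code> emits several <code>Information</code>-level entries: request start, response headers received, and request end. Most of these are noise in production. Keeping the namespace at <code>Warning</code> keeps log volume manageable and the signal-to-noise ratio high.</p>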
<hr />
<h2>Structured log messages: writing for queryability</h2>
<p>The difference between a log entry that helps and one that does not is almost entirely in whether its properties can be queried independently. Serilog's message template syntax — curly braces with named properties — is the mechanism.</p>
<pre><code class="language-csharp">// ✗ Unstructured — cannot be queried by orgId or courseCount
_logger.LogInformation($"Fetched {courses.Count} courses for org {orgId}");

// ✓ Structured — orgId and courseCount are queryable properties
_logger.LogInformation(
    "Fetched {CourseCount} courses for organisation {OrgId}",
    courses.Count, orgId);
</code></pre>
<p>In Application Insights, the structured version produces a <code>customDimensions</code> object with <code>CourseCount</code> and <code>OrgId</code> as named fields. A Kusto query can then aggregate course fetch counts by organisation, find organisations with unusually low course counts, or identify correlation between slow responses and specific organisations.</p>
<h3>Aggregator-level logging</h3>
<p>The aggregator logs the complete outcome of each aggregation — duration, upstream call results, partial failures — as a single structured entry:</p>
<pre><code class="language-csharp">// Aggregators/DashboardAggregator.cs
public async Task&lt;DashboardResponse&gt; AggregateAsync(
    string userId, CancellationToken ct = default)
{
    var sw = Stopwatch.StartNew();
    var partialFailures = new List&lt;string&gt;();

    using var _ = _logger.BeginScope(new Dictionary&lt;string, object&gt;
    {
        ["UserId"] = userId,
        ["AggregationType"] = "Dashboard"
    });

    _logger.LogInformation("Dashboard aggregation started for user {UserId}", userId);

    // ... aggregation logic from Article 4 (records partialFailures, builds response) ...

    sw.Stop();
    _logger.LogInformation(
        "Dashboard aggregation completed. " +
        "Duration: {DurationMs}ms, " +
        "CourseCount: {CourseCount}, " +
        "SessionCount: {SessionCount}, " +
        "NotificationCount: {NotificationCount}, " +
        "PartialFailures: {PartialFailureCount}, " +
        "FailedServices: {FailedServices}",
        sw.ElapsedMilliseconds,
        response.Courses.Count,
        response.UpcomingSessions.Count,
        response.Notifications.Count,
        partialFailures.Count,
        string.Join(",", partialFailures));

    return response;
}
</code></pre>
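<p>Every aggregation that runs to completion produces exactly one completion log entry. An aggregation that throws before reaching it produces none, so hard failures must be counted separately, for example from request or exception telemetry. In Application Insights, a query over <code>DurationMs</code> for these entries gives an accurate latency distribution for completed dashboard aggregations across every upstream combination. No APM agent or custom metric is required: the structured log is the metric.</p>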
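<p>Latency budgets are usually phrased as percentiles, so it helps to be precise about what that number means. Application Insights estimates percentiles for you in Kusto queries. The sketch below only illustrates the nearest-rank definition over twenty hypothetical <code>DurationMs</code> values, so a figure on a dashboard can be sanity-checked by hand.</p>
<pre><code class="language-csharp">using System;
using System.Linq;

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it
static long Percentile(long[] samples, int p)
{
    if (samples.Length == 0)
        throw new ArgumentException("At least one sample is required.", nameof(samples));
    var sorted = samples.OrderBy(x =&gt; x).ToArray();
    var rank = (int)Math.Ceiling(p * sorted.Length / 100.0); // 1-based rank
    return sorted[Math.Max(rank, 1) - 1];
}

// Hypothetical DurationMs values from twenty completion log entries
long[] durations =
{
    180, 210, 190, 250, 230, 205, 220, 260, 240, 195,
    215, 225, 235, 245, 200, 185, 270, 280, 900, 2800
};

var p50 = Percentile(durations, 50); // 225
var p95 = Percentile(durations, 95); // 900
Console.WriteLine($"p50={p50}ms, p95={p95}ms");
</code></pre>
<p>The 2,800ms maximum could be any larger value without changing p95. Percentiles describe the bulk of requests, so the very slowest requests still need their own query.</p>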
<hr />
<h2>Correlation IDs: threading the trace</h2>
<p>The <code>CorrelationIdMiddleware</code> from Article 4 ensures every request has a correlation ID. The missing piece in that implementation was propagating the ID through to Application Insights so log entries and dependency calls are all linked under the same operation.</p>
<h3>Wiring the correlation ID to Application Insights operation ID</h3>
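<p>Application Insights uses <code>Activity</code> from <code>System.Diagnostics</code> as its distributed tracing primitive. <code>Activity.Current.TraceId</code> is the operation ID (<code>operation_Id</code>) that links all telemetry for a single request. <code>Activity.Current.Id</code> is the W3C <code>traceparent</code>-style identifier that contains the trace ID together with the current span ID. When no correlation ID arrives with the request, the middleware should fall back to the trace ID rather than generating its own:</p>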
<pre><code class="language-csharp">// Middleware/CorrelationIdMiddleware.cs (updated)
using System.Diagnostics;
using Serilog.Context;

public sealed class CorrelationIdMiddleware(RequestDelegate next)
{
    private const string CorrelationIdHeader = "X-Correlation-Id";

    public async Task InvokeAsync(HttpContext ctx)
    {
        // Prefer the inbound header (set by Front Door or API client).
        // Fall back to the W3C trace ID of the Activity ASP.NET Core created,
        // which is the value Application Insights records as operation_Id.
        var correlationId =
            ctx.Request.Headers[CorrelationIdHeader].FirstOrDefault()
            ?? Activity.Current?.TraceId.ToHexString()
            ?? ctx.TraceIdentifier;

        // Set it on the current Activity so Application Insights picks it up
        Activity.Current?.SetTag("correlation.id", correlationId);

        // Add to the Serilog log context for every log entry in this request
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            ctx.Response.Headers[CorrelationIdHeader] = correlationId;
            await next(ctx);
        }
    }
}
</code></pre>
<p>With this wiring, searching Application Insights for a specific correlation ID surfaces:</p>
<ul>
<li><p>The incoming BFF request (as a <code>request</code> telemetry item)</p>
</li>
<li><p>Every <code>LogInformation</code> / <code>LogWarning</code> entry during that request (as <code>trace</code> items)</p>
</li>
<li><p>Every outgoing HTTP call to an upstream service (as <code>dependency</code> items)</p>
</li>
</ul>
<p>All linked under the same <code>operation_Id</code>. This is the end-to-end trace.</p>
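<p>The link between <code>Activity.Id</code>, the trace ID, and <code>operation_Id</code> can be checked in isolation with <code>System.Diagnostics.Activity</code> from the base class library. The sketch below starts a parent activity and a child activity in W3C format (the default since .NET 5), with no listener or Application Insights SDK involved. The activity names are hypothetical. Both activities share one trace ID, which is the value Application Insights records as <code>operation_Id</code>. Each <code>Id</code> carries its own span ID.</p>
<pre><code class="language-csharp">using System;
using System.Diagnostics;

Activity.DefaultIdFormat = ActivityIdFormat.W3C;
Activity.ForceDefaultIdFormat = true;

// Parent stands in for the incoming BFF request, child for an upstream call
var request  = new Activity("GET /api/dashboard").Start();
var upstream = new Activity("GET course-service").Start(); // parent is Activity.Current

var traceId = request.TraceId.ToHexString();

// Id has the traceparent shape: 00-{trace-id}-{span-id}-{flags}
Console.WriteLine(request.Id);
Console.WriteLine(upstream.Id);

var sameTrace     = upstream.TraceId == request.TraceId;     // one operation_Id
var idEmbedsTrace = request.Id!.Contains(traceId);           // Id contains the trace ID
var parentLinked  = upstream.ParentSpanId == request.SpanId; // child points at parent span

upstream.Stop();
request.Stop();
</code></pre>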
<hr />
<h2>Application Insights: SDK configuration</h2>
<p>The Application Insights SDK auto-collects request telemetry, dependency calls, and exceptions. Configure it in <code>Program.cs</code>:</p>
<pre><code class="language-csharp">// Program.cs
builder.Services.AddApplicationInsightsTelemetry(options =&gt;
{
    options.ConnectionString =
        builder.Configuration["ApplicationInsights:ConnectionString"];
    options.EnableAdaptiveSampling = false; // Disable sampling in production BFF
    options.EnableDependencyTrackingTelemetryModule = true;
    options.EnableRequestTrackingTelemetryModule    = true;
});

// Add a telemetry initialiser to enrich every telemetry item
// with the same properties Serilog adds to log entries
builder.Services.AddSingleton&lt;ITelemetryInitializer, BffTelemetryInitializer&gt;();
</code></pre>
<pre><code class="language-csharp">// Telemetry/BffTelemetryInitializer.cs
public sealed class BffTelemetryInitializer(
    IConfiguration config, IHostEnvironment env) : ITelemetryInitializer
{
    private readonly string _version     = config["AppVersion"] ?? "unknown";
    // Same source as the Serilog "Environment" property, regardless of how it was set
    private readonly string _environment = env.EnvironmentName;

    public void Initialize(ITelemetry telemetry)
    {
        telemetry.Context.Cloud.RoleName    = "bff";
        telemetry.Context.Component.Version = _version;

        if (telemetry is ISupportProperties props)
        {
            props.Properties["Service"]     = "bff";
            props.Properties["Version"]     = _version;
            props.Properties["Environment"] = _environment;
        }
    }
}
</code></pre>
<p><code>EnableAdaptiveSampling = false</code> is a deliberate choice for a BFF. Adaptive sampling reduces telemetry volume by dropping a percentage of requests — which is appropriate for high-volume services where cost is a concern. A BFF serving an education platform with a bounded user base generates manageable telemetry volume. Disabling sampling means every request, every dependency call, and every exception is recorded — which is the correct trade-off when the primary goal is incident diagnosis rather than cost management.</p>
<p><code>RoleName = "bff"</code> ensures that in Application Insights' Application Map, the BFF node is labelled correctly and distinct from the upstream services. Without this, every service in the map appears as a generic unnamed cloud role.</p>
<hr />
<h2>Custom telemetry: tracking aggregation outcomes</h2>
<p>The SDK auto-tracks requests and dependencies. What it cannot track automatically is the business-level outcome of an aggregation — how many partial failures occurred, how long each upstream phase took, which upstream service was the slowest on a given request. Custom metrics fill this gap.</p>
<pre><code class="language-csharp">// Telemetry/AggregationTelemetry.cs
public sealed class AggregationTelemetryService(TelemetryClient telemetryClient)
{
    public void TrackDashboardAggregation(
        string userId,
        long durationMs,
        int courseCount,
        int sessionCount,
        IReadOnlyList&lt;string&gt; partialFailures)
    {
        // Custom event — queryable by name in Application Insights
        var evt = new EventTelemetry("DashboardAggregationCompleted");
        evt.Properties["UserId"]             = userId;
        evt.Properties["PartialFailures"]    = string.Join(",", partialFailures);
        evt.Properties["HasPartialFailure"]  = (partialFailures.Count &gt; 0).ToString();
        evt.Metrics["DurationMs"]            = durationMs;
        evt.Metrics["CourseCount"]           = courseCount;
        evt.Metrics["SessionCount"]          = sessionCount;
        evt.Metrics["PartialFailureCount"]   = partialFailures.Count;
        telemetryClient.TrackEvent(evt);

        // Custom metric — appears in Metrics Explorer for trending
        telemetryClient.GetMetric("DashboardAggregation.DurationMs")
            .TrackValue(durationMs);

        if (partialFailures.Count &gt; 0)
        {
            telemetryClient.GetMetric("DashboardAggregation.PartialFailures")
                .TrackValue(partialFailures.Count);

            foreach (var service in partialFailures)
            {
                var failureEvt = new EventTelemetry("UpstreamServiceFailure");
                failureEvt.Properties["Service"]    = service;
                failureEvt.Properties["Endpoint"]   = "Dashboard";
                failureEvt.Properties["UserId"]     = userId;
                telemetryClient.TrackEvent(failureEvt);
            }
        }
    }

    public IOperationHolder&lt;DependencyTelemetry&gt; TrackUpstreamPhase(
        string phaseName, string upstreamService)
    {
        var dependency = new DependencyTelemetry
        {
            Name   = $"{upstreamService} - {phaseName}",
            Type   = "BFF Aggregation Phase",
            Target = upstreamService
        };
        return telemetryClient.StartOperation(dependency);
    }
}
</code></pre>
<p>Inject and use in the aggregator:</p>
<pre><code class="language-csharp">// Aggregators/DashboardAggregator.cs — updated with custom telemetry
public async Task&lt;DashboardResponse&gt; AggregateAsync(
    string userId, CancellationToken ct = default)
{
    var sw = Stopwatch.StartNew();
    var partialFailures = new List&lt;string&gt;();

    // Phase 1: independent calls, timed together as one named phase.
    // Each phase operation is stopped as soon as its phase ends: a `using var` declaration
    // would keep it open (and Activity.Current pointing at it) until the method returns,
    // inflating the phase duration and nesting the later phases under it.
    var phase1           = _telemetry.TrackUpstreamPhase("Phase1-Parallel", "User+Notification");
    var profileTask      = _userClient.GetProfileAsync(userId, ct);
    var notificationTask = _notificationClient.GetUnreadCountAsync(userId, ct);
    try
    {
        await Task.WhenAll(profileTask, notificationTask);
        phase1.Telemetry.Success = profileTask.Result is not null;
    }
    finally
    {
        phase1.Dispose(); // stops the timer and tracks the dependency item
    }

    var profile = profileTask.Result;
    if (profile is null)
    {
        _telemetry.TrackEvent("DashboardAggregationFailed",
            new Dictionary&lt;string, string&gt;
            {
                ["Reason"]  = "ProfileServiceUnavailable",
                ["UserId"]  = userId
            });
        throw new BffAggregationException("User profile service unavailable.");
    }

    // Phase 2: courses for the user's organisation
    IReadOnlyList&lt;CourseDto&gt;? courses;
    using (var phase2 = _telemetry.TrackUpstreamPhase("Phase2-Courses", "CourseService"))
    {
        courses = await _courseClient.GetCoursesByOrgAsync(profile.OrgId, ct);
        phase2.Telemetry.Success = courses is not null;
    }
    if (courses is null) partialFailures.Add("courses");

    // Phase 3: upcoming sessions, only when there are course IDs to query with
    IReadOnlyList&lt;SessionDto&gt;? sessions = null;
    if (courses is not null &amp;&amp; courses.Count &gt; 0)
    {
        // Disposed at the end of this if-block, so the phase ends with the call
        using var phase3 = _telemetry.TrackUpstreamPhase("Phase3-Sessions", "SessionService");
        sessions = await _sessionClient.GetUpcomingAsync(
            courses.Select(c =&gt; c.Id).ToArray(), 3, ct);
        phase3.Telemetry.Success = sessions is not null;
        if (sessions is null) partialFailures.Add("sessions");
    }

    var response = BuildResponse(profile, courses, sessions,
        notificationTask.Result, partialFailures);

    sw.Stop();
    _telemetry.TrackDashboardAggregation(
        userId, sw.ElapsedMilliseconds,
        response.Courses.Count, response.UpcomingSessions.Count,
        partialFailures);

    return response;
}
</code></pre>
<p>Each aggregation phase is now a named dependency item in Application Insights. The Application Map groups dependencies by their <code>Target</code>, so it shows the BFF node connected to <code>User+Notification</code>, <code>CourseService</code>, and <code>SessionService</code>; the end-to-end transaction view lists each phase by name (for example <code>CourseService - Phase2-Courses</code>) together with its duration for that request. When a dashboard load is slow, these phase timings immediately identify which phase, and therefore which upstream service, is responsible.</p>
<hr />
<h2>Request logging: the Serilog request pipeline</h2>
<p><code>UseSerilogRequestLogging()</code> replaces ASP.NET Core's default request logging with Serilog's structured equivalent. Configure it to attach the correlation ID, the authenticated user ID, and basic request metadata to each request log entry:</p>
<pre><code class="language-csharp">// Program.cs
app.UseSerilogRequestLogging(opts =&gt;
{
    opts.EnrichDiagnosticContext = (diagCtx, httpCtx) =&gt;
    {
        diagCtx.Set("RequestHost",     httpCtx.Request.Host.Value);
        diagCtx.Set("RequestScheme",   httpCtx.Request.Scheme);
        diagCtx.Set("UserAgent",       httpCtx.Request.Headers.UserAgent.ToString());
        diagCtx.Set("CorrelationId",
            httpCtx.Response.Headers["X-Correlation-Id"].FirstOrDefault() ?? "none");

        if (httpCtx.User.Identity?.IsAuthenticated == true)
            diagCtx.Set("UserId",
                httpCtx.User.FindFirstValue(ClaimTypes.NameIdentifier));
    };

    // Suppress health probe logs — they are noise at 30s intervals
    opts.GetLevel = (httpCtx, elapsed, ex) =&gt;
    {
        if (httpCtx.Request.Path.StartsWithSegments("/health"))
            return LogEventLevel.Verbose; // Verbose is below minimum level — effectively suppressed
        if (ex is not null || httpCtx.Response.StatusCode &gt;= 500)
            return LogEventLevel.Error;
        if (httpCtx.Response.StatusCode &gt;= 400)
            return LogEventLevel.Warning;
        return LogEventLevel.Information;
    };
});
</code></pre>
<p>The health probe suppression is not cosmetic. At 30-second intervals, health probes generate 2,880 log entries per day per instance — entries that contain no operational information and inflate the Application Insights ingestion cost. Suppressing them by setting their level to <code>Verbose</code> (below the <code>Information</code> minimum) keeps the log volume meaningful.</p>
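<p>Because the level rule is a small decision table, it is easy to restate and exercise on its own. The sketch below is an illustrative TypeScript mirror of the <code>GetLevel</code> lambda above (the function and type names are hypothetical, not part of the BFF), together with the probe arithmetic: 86,400 seconds per day divided by a 30-second interval.</p>
<pre><code class="language-typescript">// Illustrative mirror of the Serilog GetLevel override; not the BFF's actual code
type LogLevel = 'Verbose' | 'Information' | 'Warning' | 'Error'

function requestLogLevel(path: string, statusCode: number, hasException: boolean): LogLevel {
  // StartsWithSegments("/health") ignores case and matches whole segments only,
  // so "/health" and "/health/ready" match but "/healthz" does not
  const p = path.toLowerCase()
  if (p === '/health' || p.startsWith('/health/')) return 'Verbose' // below the Information minimum
  if (hasException || statusCode &gt;= 500) return 'Error'
  if (statusCode &gt;= 400) return 'Warning'
  return 'Information'
}

// Log entries produced per instance per day by a probe at a fixed interval
function probesPerDay(intervalSeconds: number): number {
  return Math.floor((24 * 60 * 60) / intervalSeconds)
}

console.log(probesPerDay(30))                              // 2880
console.log(requestLogLevel('/health/ready', 200, false))  // Verbose
console.log(requestLogLevel('/api/dashboard', 503, false)) // Error
</code></pre>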
<hr />
<h2>Tracking the Vue application: browser telemetry</h2>
<p>Application Insights has a JavaScript SDK that tracks client-side page loads, AJAX requests, and exceptions. Installing it in the Vue application completes the end-to-end trace — a slow page load can now be correlated with the specific BFF request that served it.</p>
<pre><code class="language-shell">npm install @microsoft/applicationinsights-web
</code></pre>
<pre><code class="language-typescript">// src/telemetry/appInsights.ts
import { ApplicationInsights } from '@microsoft/applicationinsights-web'

export const appInsights = new ApplicationInsights({
  config: {
    connectionString: import.meta.env.VITE_APPINSIGHTS_CONNECTION_STRING,
    enableAutoRouteTracking: true,    // Track Vue Router navigations as page views
    enableCorsCorrelation: true,      // Propagate correlation headers on fetch calls
    correlationHeaderExcludedDomains: ['*.dataporten.no'], // Don't add headers to Feide
    disableFetchTracking: false,      // Track all fetch calls (BFF API calls)
    enableRequestHeaderTracking: true,
    enableResponseHeaderTracking: true
  }
})

appInsights.loadAppInsights()
</code></pre>
<pre><code class="language-typescript">// src/main.ts
import { createApp } from 'vue'
import { createPinia } from 'pinia'
import App from './App.vue'
import router from './router'
import { appInsights } from './telemetry/appInsights'
import { useSessionStore } from './stores/session'

// Record the initial page load; later Vue Router navigations are tracked
// automatically because enableAutoRouteTracking is on
appInsights.trackPageView()

const app = createApp(App)
const pinia = createPinia()
app.use(pinia)
app.use(router)

// Set the authenticated user context once the session is known.
// This links all subsequent telemetry from this browser session to the user.
const session = useSessionStore()
session.initialise().then(() =&gt; {
  if (session.profile) {
    appInsights.setAuthenticatedUserContext(
      session.profile.principalName,
      session.profile.orgId,
      true // Store in cookie for cross-session correlation
    )
  }
})

app.mount('#app')
</code></pre>
<p>Correlation headers are what link the two sides. For same-origin <code>fetch</code> calls, the Application Insights SDK adds correlation headers automatically (<code>Request-Id</code> and <code>Request-Context</code>, plus the W3C <code>traceparent</code> header in its default tracing mode); <code>enableCorsCorrelation: true</code> extends this to cross-origin calls, which makes it the key setting whenever the BFF is served from a different origin than the Vue application. The BFF receives these headers and links its server-side telemetry to the same operation ID as the browser-side telemetry. In Application Insights' end-to-end transaction view, a single operation shows the browser page load, the <code>fetch /api/dashboard</code> call, and every upstream dependency the BFF triggered, all as one unified trace.</p>
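<p>When the SDK sends W3C Trace Context (its <code>distributedTracingMode</code> setting selects the header formats), the <code>traceparent</code> header has the form <code>version-traceId-parentId-flags</code>, and Application Insights records the 32-hex-digit trace ID as <code>operation_Id</code>. A hypothetical helper like the one below, which is not part of the SDK, can parse the header when checking that the browser and the BFF agree on the operation ID. It handles only the version-00 layout.</p>
<pre><code class="language-typescript">// Hypothetical helper, not part of the Application Insights SDK:
// parse a W3C Trace Context `traceparent` header value (version-00 layout)
interface TraceParent {
  version: string
  traceId: string  // 32 lowercase hex chars; recorded by Application Insights as operation_Id
  parentId: string // 16 lowercase hex chars; the caller's span ID
  sampled: boolean // lowest bit of the trace-flags byte
}

const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT.exec(header.trim())
  if (!match) return null
  const [, version, traceId, parentId, flags] = match
  // Version ff and all-zero IDs are invalid per the specification
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) &amp; 1) === 1 }
}

const parsed = parseTraceParent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01')
console.log(parsed?.traceId) // 4bf92f3577b34da6a3ce929d0e0e4736
</code></pre>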
<hr />
<h2>Kusto queries: turning telemetry into answers</h2>
<p>The Application Insights data model is queried with Kusto Query Language (KQL). The following queries are the ones that were actually pinned to the production dashboard and alerted on.</p>
<h3>BFF request latency by endpoint</h3>
<pre><code class="language-kusto">requests
| where timestamp &gt; ago(24h)
| where cloud_RoleName == "bff"
| where name !contains "health"
| summarize
    p50  = percentile(duration, 50),
    p95  = percentile(duration, 95),
    p99  = percentile(duration, 99),
    count = count()
    by name
| order by p95 desc
</code></pre>
<p>This query ranks BFF endpoints by p95 latency. The p95 is the level that reflects real user experience; an average would hide the tail latency that users actually notice.</p>
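<p>A small worked example shows how the average hides the tail. The sketch below uses the nearest-rank percentile definition; Kusto's <code>percentile()</code> returns an estimate, so its results can differ slightly on real data. The latency samples are invented for illustration, not production measurements.</p>
<pre><code class="language-typescript">// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) =&gt; a - b)
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)) // 1-based rank
  return sorted[rank - 1]
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) =&gt; sum + x, 0) / samples.length
}

// Hypothetical latencies (ms): 90 fast requests and 10 slow ones
const latencies = [...new Array&lt;number&gt;(90).fill(120), ...new Array&lt;number&gt;(10).fill(2400)]

console.log(mean(latencies))           // 348
console.log(percentile(latencies, 50)) // 120
console.log(percentile(latencies, 95)) // 2400
</code></pre>
<p>The mean of 348 ms describes no request that actually happened, while p95 exposes the 2.4-second experience that one user in ten had.</p>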
<h3>Partial failure rate over time</h3>
<pre><code class="language-kusto">customEvents
| where timestamp &gt; ago(24h)
| where name == "UpstreamServiceFailure"
| summarize failureCount = count() by
    Service = tostring(customDimensions["Service"]),
    bin(timestamp, 1h)
| render timechart
</code></pre>
<p>This is the alert query. When partial failures for <code>courses</code> (the <code>Service</code> value the aggregator records when the course service fails) spike from 0 to 40 in an hour, the chart shows the exact moment the upstream service degraded, before any user report arrives.</p>
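<p>For readers new to Kusto, <code>bin(timestamp, 1h)</code> floors each timestamp to the start of its hour, and <code>summarize ... by Service, bin(timestamp, 1h)</code> counts events per service per hour. A rough TypeScript equivalent, with a hypothetical <code>FailureEvent</code> shape standing in for the <code>customEvents</code> rows:</p>
<pre><code class="language-typescript">// Hypothetical shape of one UpstreamServiceFailure row
interface FailureEvent {
  service: string // customDimensions["Service"], e.g. "courses"
  timestamp: Date
}

const HOUR_MS = 60 * 60 * 1000

// Rough equivalent of: summarize failureCount = count() by Service, bin(timestamp, 1h)
function failuresPerHour(events: FailureEvent[]): Map&lt;string, number&gt; {
  const counts = new Map&lt;string, number&gt;()
  for (const e of events) {
    // bin(): floor the timestamp to the start of its one-hour bucket (UTC)
    const bucket = new Date(Math.floor(e.timestamp.getTime() / HOUR_MS) * HOUR_MS)
    const key = `${e.service}|${bucket.toISOString()}`
    counts.set(key, (counts.get(key) ?? 0) + 1)
  }
  return counts
}

const example = failuresPerHour([
  { service: 'courses', timestamp: new Date('2025-04-08T09:05:00Z') },
  { service: 'courses', timestamp: new Date('2025-04-08T09:55:00Z') },
  { service: 'sessions', timestamp: new Date('2025-04-08T10:01:00Z') },
])
console.log(example.get('courses|2025-04-08T09:00:00.000Z')) // 2
</code></pre>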
<h3>Aggregation duration distribution</h3>
<pre><code class="language-kusto">customEvents
| where timestamp &gt; ago(24h)
| where name == "DashboardAggregationCompleted"
| extend
    durationMs   = todouble(customMeasurements["DurationMs"]),
    hasFailure   = tobool(customDimensions["HasPartialFailure"])
| summarize
    p50  = percentile(durationMs, 50),
    p95  = percentile(durationMs, 95),
    count = count()
    by hasFailure
</code></pre>
<p>This reveals something the standard latency query does not: whether partial failures (degraded responses) are faster or slower than fully successful responses. In the production system, partial failures were consistently faster, because the failing upstream call was cut short by its timeout rather than left to return slowly, and a failed course call also skips Phase 3 entirely. The timeout was the signal, not the duration. This query made that visible.</p>
<h3>Error rate by status code</h3>
<pre><code class="language-kusto">requests
| where timestamp &gt; ago(1h)
| where cloud_RoleName == "bff"
| summarize count() by resultCode
| render piechart
</code></pre>
<p>A simple query, but critical for incident triage. A spike in 503s identifies a BFF-level failure (upstream services down). A spike in 401s identifies an authentication issue. A spike in 500s identifies an unhandled exception in the BFF itself.</p>
<h3>End-to-end trace for a specific correlation ID</h3>
<pre><code class="language-kusto">let correlationId = "abc-123-def-456";
union requests, dependencies, traces, exceptions
| where timestamp &gt; ago(24h)
| where operation_Id contains correlationId
    or tostring(customDimensions["CorrelationId"]) == correlationId
| order by timestamp asc
| project timestamp, itemType, name, duration, success,
          message, customDimensions
</code></pre>
<p>This is the incident diagnosis query. A user reports an error at 14:32 and provides the correlation ID from the UI (the <code>traceId</code> in the <code>ErrorDisplay</code> component from Article 5). This query returns every telemetry item — request, dependency calls, log entries, exceptions — for that specific request, in chronological order.</p>
<hr />
<h2>Dashboard configuration</h2>
<p>The production Application Insights workbook had four panels pinned for daily review and incident response:</p>
<p><strong>Panel 1: Request volume and error rate (30-minute bins)</strong></p>
<pre><code class="language-kusto">requests
| where cloud_RoleName == "bff"
| where name !contains "health"
| summarize
    total     = count(),
    errors    = countif(success == false),
    errorRate = round(100.0 * countif(success == false) / count(), 2)
    by bin(timestamp, 30m)
| render timechart with (ycolumns = errorRate)
</code></pre>
<p><strong>Panel 2: Upstream service availability</strong></p>
<pre><code class="language-kusto">dependencies
| where cloud_RoleName == "bff"
| where type == "Http"
| summarize
    total   = count(),
    failed  = countif(success == false),
    failPct = round(100.0 * countif(success == false) / count(), 2)
    by target, bin(timestamp, 1h)
| where failed &gt; 0
| order by failPct desc
</code></pre>
<p><strong>Panel 3: Aggregation latency heatmap</strong></p>
<pre><code class="language-kusto">customEvents
| where name == "DashboardAggregationCompleted"
| extend durationMs = todouble(customMeasurements["DurationMs"])
| summarize count() by
    latencyBucket = case(
        durationMs &lt; 200,   "&lt; 200ms",
        durationMs &lt; 500,   "200–500ms",
        durationMs &lt; 1000,  "500ms–1s",
        durationMs &lt; 2000,  "1–2s",
        "&gt;= 2s"),
    bin(timestamp, 1h)
| render timechart
</code></pre>
<p><strong>Panel 4: Top exceptions in the last hour</strong></p>
<pre><code class="language-kusto">exceptions
| where cloud_RoleName == "bff"
| where timestamp &gt; ago(1h)
| summarize count() by type, outerMessage
| order by count_ desc
| take 10
</code></pre>
<hr />
<h2>Alerts</h2>
<p>Three alerts were active in the production system. Each was configured in Azure Monitor with an action group that sent an email and a Teams webhook notification.</p>
<h3>Alert 1: Error rate threshold</h3>
<pre><code class="language-kusto">// Fires when error rate exceeds 5% over a 5-minute window
requests
| where cloud_RoleName == "bff"
| where name !contains "health"
| where timestamp &gt; ago(5m)
| summarize
    total  = count(),
    errors = countif(success == false)
| extend errorRate = 100.0 * errors / total
| where errorRate &gt; 5
</code></pre>
<p>Threshold: 5% error rate. Evaluation frequency: every 5 minutes. Severity: 2 (Warning).</p>
<h3>Alert 2: Upstream service degradation</h3>
<pre><code class="language-kusto">// Fires when any upstream service fails more than 10 times in 10 minutes
customEvents
| where name == "UpstreamServiceFailure"
| where timestamp &gt; ago(10m)
| summarize failureCount = count() by Service = tostring(customDimensions["Service"])
| where failureCount &gt; 10
</code></pre>
<p>Threshold: 10 failures. Evaluation frequency: every 5 minutes. Severity: 1 (Error).</p>
<p>This alert fires before the error rate alert in most upstream outage scenarios. When the course service goes down, the first few dozen requests produce partial failure responses (200 with <code>partialFailures: ["courses"]</code>) rather than 5xx errors. The error rate alert would not fire; this alert catches the upstream degradation regardless of whether the BFF successfully served a degraded response.</p>
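<p>The scenario can be made concrete with a small simulation that evaluates both alert conditions over the same hypothetical window, in which every dashboard request returns HTTP 200 but some carry <code>partialFailures: ["courses"]</code>. The thresholds (a 5% error rate, more than 10 failures for one service) are the ones configured above; the window is simplified to a single list of requests, and <code>DashboardResult</code> is an illustrative type.</p>
<pre><code class="language-typescript">// Illustrative record of one dashboard request inside the alert window
interface DashboardResult {
  statusCode: number
  partialFailures: string[] // upstream services that failed for this request
}

// Alert 1: error rate above 5%. Treating any status &gt;= 400 as unsuccessful
// approximates how the requests table's `success` flag is usually set.
function errorRateAlert(results: DashboardResult[]): boolean {
  if (results.length === 0) return false
  const errors = results.filter((r) =&gt; r.statusCode &gt;= 400).length
  return (100 * errors) / results.length &gt; 5
}

// Alert 2: any upstream service reported as failed more than 10 times.
// Each degraded request emits one UpstreamServiceFailure event per failed service.
function upstreamDegradationAlert(results: DashboardResult[]): string[] {
  const perService = new Map&lt;string, number&gt;()
  for (const r of results) {
    for (const service of r.partialFailures) {
      perService.set(service, (perService.get(service) ?? 0) + 1)
    }
  }
  return Array.from(perService)
    .filter(([, count]) =&gt; count &gt; 10)
    .map(([service]) =&gt; service)
}

// Hypothetical window: 60 requests, all HTTP 200, 24 degraded by a course-service outage
const sample: DashboardResult[] = Array.from({ length: 60 }, (_, i) =&gt; ({
  statusCode: 200,
  partialFailures: i &lt; 24 ? ['courses'] : [],
}))

console.log(errorRateAlert(sample))                      // false
console.log(upstreamDegradationAlert(sample).join(', ')) // courses
</code></pre>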
<h3>Alert 3: Aggregation latency budget</h3>
<pre><code class="language-kusto">// Fires when p95 aggregation duration exceeds 1500ms over 15 minutes
customEvents
| where name == "DashboardAggregationCompleted"
| where timestamp &gt; ago(15m)
| summarize p95 = percentile(todouble(customMeasurements["DurationMs"]), 95)
| where p95 &gt; 1500
</code></pre>
<p>Threshold: 1500ms p95. Evaluation frequency: every 5 minutes. Severity: 2 (Warning).</p>
<p>The 1500ms threshold was derived from the production latency budget: 300ms for Phase 1, 300ms for Phase 2, 300ms for Phase 3, 600ms buffer for BFF processing and network. When p95 exceeds 1500ms, one of the upstream services is slow — the dependency telemetry identifies which one.</p>
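<p>Keeping the budget as data next to the alert means the 1500ms threshold is recomputed rather than copied when a phase allowance changes, and the same data can flag which phase overran. A minimal sketch; the <code>PhaseBudget</code> shape and phase labels are hypothetical, and the allowances are the ones quoted above.</p>
<pre><code class="language-typescript">interface PhaseBudget {
  name: string
  allowanceMs: number
}

// Allowances from the latency budget described above
const budget: PhaseBudget[] = [
  { name: 'Phase 1 (User + Notification)', allowanceMs: 300 },
  { name: 'Phase 2 (Courses)', allowanceMs: 300 },
  { name: 'Phase 3 (Sessions)', allowanceMs: 300 },
  { name: 'BFF processing and network', allowanceMs: 600 },
]

// The phases run one after another, so the end-to-end budget is the sum of the allowances
function totalBudgetMs(phases: PhaseBudget[]): number {
  return phases.reduce((sum, p) =&gt; sum + p.allowanceMs, 0)
}

// Given observed per-phase durations (e.g. read from the dependency telemetry),
// return the phases that exceeded their allowance
function overBudget(phases: PhaseBudget[], observedMs: Record&lt;string, number&gt;): string[] {
  return phases
    .filter((p) =&gt; (observedMs[p.name] ?? 0) &gt; p.allowanceMs)
    .map((p) =&gt; p.name)
}

console.log(totalBudgetMs(budget))                                       // 1500
console.log(overBudget(budget, { 'Phase 3 (Sessions)': 2400 }).join(', ')) // Phase 3 (Sessions)
</code></pre>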
<hr />
<h2>What the production system learned about observability</h2>
<p><strong>Health probe log suppression was added after the first week.</strong> The initial configuration logged every health probe at <code>Information</code> level. After one week of ACI deployment with 30-second probe intervals, the Application Insights log volume was 40% health probe entries. The <code>GetLevel</code> override was added in week two and immediately reduced ingestion cost and improved signal clarity.</p>
<p><strong>The</strong> <code>AppVersion</code> <strong>property on every telemetry item paid off in the third deployment.</strong> A latency regression appeared in the p95 chart after the third deployment to production. Filtering <code>customEvents</code> by <code>Version == "sha-abc123"</code> versus <code>Version == "sha-def456"</code> isolated the regression to the new deployment within two minutes. Without the version property, the investigation would have started with checking the deployment log to determine when the regression began.</p>
<p><strong>Partial failure alerts preceded user reports by an average of 12 minutes.</strong> In the three upstream service degradations that occurred during the production period, the <code>UpstreamServiceFailure</code> alert fired an average of 12 minutes before any user submitted a support ticket. In two of the three cases, the upstream team was already aware before the first user contacted support. The BFF's partial failure model — returning degraded responses rather than errors — meant users experienced degraded functionality rather than outages, reducing the severity of each incident.</p>
<p><code>setAuthenticatedUserContext</code> <strong>was not added until month two.</strong> The browser-side Application Insights setup initially did not set the authenticated user context, so browser telemetry could not be correlated with a specific user's session when investigating a reported issue. Adding <code>setAuthenticatedUserContext</code> in month two connected the browser page view telemetry to the same user ID used in server-side logs, making the end-to-end trace genuinely end-to-end.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/d39ac0de-ee6a-423c-b493-fb4544ab974e.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>The complete observability picture</h2>
<p>A production incident on this system — a user reports the dashboard is slow — resolves as follows:</p>
<ol>
<li><p>The user reads the <code>traceId</code> from the error display or the support team extracts it from the correlation ID in the request log.</p>
</li>
<li><p>The end-to-end trace query returns every item for that request in 3 seconds.</p>
</li>
<li><p>The dependency telemetry shows Phase 3 (Sessions) took 2,400ms — the session service was the culprit.</p>
</li>
<li><p>The upstream service failure alert fired 8 minutes before the user reported the issue.</p>
</li>
<li><p>The session service team's runbook is triggered.</p>
</li>
<li><p>Resolution.</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/250bdd90-8a6f-46d8-926c-ac4f50e56fa4.png" alt="" style="display:block;margin:0 auto" />

<p>This is what observability looks like when it is designed in rather than added later. Every decision in this series — structured logs, correlation IDs, custom aggregation events, the <code>traceId</code> in the Vue error component, the <code>AppVersion</code> on every telemetry item — was made with this incident resolution flow in mind.</p>
<hr />
<h2>Series conclusion</h2>
<p>This is the ninth and final article in the core series. The BFF is designed, built, secured, deployed, tested, and observable. What was built:</p>
<ul>
<li><p>A <strong>Vue 3 frontend</strong> with a typed API layer generated from the BFF's OpenAPI spec, composables with consistent error handling, and browser telemetry connected to the server-side trace.</p>
</li>
<li><p>A <strong>.NET 8 BFF</strong> with Minimal API endpoints, a typed client / aggregator / contract architecture, Feide OIDC integration using the Token Handler pattern, and a layered test suite from unit through contract.</p>
</li>
<li><p>An <strong>Azure IaaS deployment</strong> with Azure Container Instances, Azure Front Door, and a full CI/CD pipeline from commit to production with a manual approval gate.</p>
</li>
<li><p>An <strong>observability layer</strong> with Serilog structured logging, distributed tracing via Activity and correlation IDs, custom Application Insights telemetry, and production-validated KQL queries and alerts.</p>
</li>
</ul>
<p>The three supplementary articles — caching strategies, brownfield migration with the Strangler Fig pattern, and resilience patterns with Polly — extend the architecture into the production scenarios most likely to arise once the core system is running. They are written to be read independently as those needs arise.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="#">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="#">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="#">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="#">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="#">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="#">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="#">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="#">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li>→ <a href="#">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="#">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="#">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="#">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Testing the BFF: Unit, Integration & Contract Tests]]></title><description><![CDATA[A note on the code in this article. The testing strategy, test fixtures, and code examples shown here are derived from a production BFF built for a Norwegian enterprise education platform. Service nam]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/testing-the-bff-unit-integration-contract-tests</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/testing-the-bff-unit-integration-contract-tests</guid><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[Testing]]></category><category><![CDATA[unit testing]]></category><category><![CDATA[Integration Testing]]></category><category><![CDATA[contract-testing]]></category><category><![CDATA[pact]]></category><category><![CDATA[WebApplicationFactory]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[Aspnetcore]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Wed, 15 Apr 2026 13:06:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/3496b06a-3e57-400f-9386-30b4c75217d4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<blockquote>
<p><strong>A note on the code in this article.</strong> The testing strategy, test fixtures, and code examples shown here are derived from a production BFF built for a Norwegian enterprise education platform. Service names, domain models, and certain structural details have been generalised to meet NDA obligations. The test architecture, WebApplicationFactory configuration, Pact contract setup, and the specific failure modes each test layer is designed to catch are drawn directly from what was written and maintained in production.</p>
</blockquote>
<hr />
<p>A BFF sits at a boundary. On one side, the Vue application depends on it for every piece of data it renders. On the other side, it depends on upstream services it does not own. Both sides of that boundary can change. Both sides have broken production in real systems that lacked the test coverage to catch the breakage before deployment.</p>
<p>The testing strategy for a BFF is therefore not the same as for an isolated service. It has to answer three distinct questions that a single testing layer cannot answer alone: does the aggregation logic produce the right output given specific inputs? does the running service behave correctly end to end with real HTTP machinery? and does the contract the BFF exposes to the Vue application stay stable as both sides evolve independently?</p>
<p>This article builds a layered answer to those three questions — unit tests for the aggregation and shaping logic, integration tests using <code>WebApplicationFactory</code> for the full HTTP pipeline, and consumer-driven contract tests using Pact to enforce the Vue-to-BFF contract at the CI level.</p>
<hr />
<h2>The testing pyramid, applied to a BFF</h2>
<p>The standard testing pyramid — many unit tests, fewer integration tests, fewest end-to-end tests — applies to BFF testing with one modification: contract tests sit alongside integration tests, not above them. They are not end-to-end tests. They are fast, isolated, and run in CI. They occupy a distinct position in the pyramid because they address a concern that neither unit nor integration tests cover: whether the two independent codebases (the BFF and the Vue application) agree on the shape of their shared interface.</p>
<pre><code class="language-plaintext">           ┌──────────────┐
           │   E2E tests  │  ← Minimal — Playwright smoke tests only
           └──────────────┘
      ┌──────────────────────┐
      │  Integration tests   │  ← WebApplicationFactory, full HTTP pipeline
      │  Contract tests      │  ← Pact, Vue↔BFF interface verification
      └──────────────────────┘
  ┌──────────────────────────────┐
  │        Unit tests            │  ← Aggregators, shaping logic, error handling
  └──────────────────────────────┘
</code></pre>
<p>Each layer catches a different class of failure. Unit tests catch logic errors in isolation. Integration tests catch wiring errors — misconfigured middleware, incorrect route mapping, broken serialisation. Contract tests catch interface drift — changes to either side that invalidate the shared contract. None of these layers is redundant with the others.</p>
<hr />
<h2>Project setup</h2>
<p>The test projects mirror the source project structure. Each concern gets its own project with its own dependencies:</p>
<pre><code class="language-shell">dotnet new xunit -n EducationPlatform.Bff.UnitTests
dotnet new xunit -n EducationPlatform.Bff.IntegrationTests
dotnet new xunit -n EducationPlatform.Bff.ContractTests

# Unit test dependencies
cd EducationPlatform.Bff.UnitTests
dotnet add reference ../EducationPlatform.Bff
dotnet add package NSubstitute
dotnet add package FluentAssertions

# Integration test dependencies
cd ../EducationPlatform.Bff.IntegrationTests
dotnet add reference ../EducationPlatform.Bff
dotnet add package Microsoft.AspNetCore.Mvc.Testing
dotnet add package NSubstitute
dotnet add package FluentAssertions

# Contract test dependencies (provider side)
cd ../EducationPlatform.Bff.ContractTests
dotnet add reference ../EducationPlatform.Bff
dotnet add package PactNet
dotnet add package Microsoft.AspNetCore.Mvc.Testing
</code></pre>
<hr />
<h2>Layer 1: Unit tests</h2>
<p>Unit tests cover the aggregation logic and response shaping in isolation. The subjects under test are the aggregators — the classes that orchestrate upstream calls, handle partial failures, and shape responses into the Vue contract. The upstream clients are substituted with <code>NSubstitute</code> mocks.</p>
<h3>Testing the happy path</h3>
<pre><code class="language-csharp">// EducationPlatform.Bff.UnitTests/Aggregators/DashboardAggregatorTests.cs
public class DashboardAggregatorTests
{
    private readonly UserServiceClient _userClient;
    private readonly CourseServiceClient _courseClient;
    private readonly SessionServiceClient _sessionClient;
    private readonly NotificationServiceClient _notificationClient;
    private readonly DashboardAggregator _aggregator;

    public DashboardAggregatorTests()
    {
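        // NSubstitute can intercept only interface members and virtual class members,
        // so these setups assume the typed clients' methods are virtual (or that the
        // clients are extracted behind interfaces).
        _userClient         = Substitute.For&lt;UserServiceClient&gt;();
        _courseClient       = Substitute.For&lt;CourseServiceClient&gt;();
        _sessionClient      = Substitute.For&lt;SessionServiceClient&gt;();
        _notificationClient = Substitute.For&lt;NotificationServiceClient&gt;();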

        _aggregator = new DashboardAggregator(
            _userClient,
            _courseClient,
            _sessionClient,
            _notificationClient,
            Substitute.For&lt;ILogger&lt;DashboardAggregator&gt;&gt;());
    }

    [Fact]
    public async Task AggregateAsync_AllServicesAvailable_ReturnsMappedResponse()
    {
        // Arrange
        const string userId = "ingrid.solberg@skole.no";

        _userClient.GetProfileAsync(userId, Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto(
                "Ingrid", "Solberg",
                OrgId: "uninett",
                RoleCode: "TEACHER",
                AvatarPath: "i-solberg.jpg"));

        _notificationClient.GetUnreadCountAsync(userId, Arg.Any&lt;CancellationToken&gt;())
            .Returns(3);

        _courseClient.GetCoursesByOrgAsync("uninett", Arg.Any&lt;CancellationToken&gt;())
            .Returns([
                new CourseDto("c-1",
                    new CourseMetadataDto("Mathematics — Year 9", "MATH-9", null!),
                    new EnrollmentDto(30, 24, 0),
                    new CourseStatusDto("ACTIVE", DateTimeOffset.UtcNow))
            ]);

        _sessionClient.GetUpcomingAsync(
                Arg.Is&lt;string[]&gt;(ids =&gt; ids.Contains("c-1")),
                3,
                Arg.Any&lt;CancellationToken&gt;())
            .Returns([
                new SessionDto("s-1", "Integration review",
                    DateTimeOffset.Parse("2025-04-08T09:00:00"),
                    "Mathematics — Year 9", Room: "204")
            ]);

        // Act
        var result = await _aggregator.AggregateAsync(userId);

        // Assert
        result.User.DisplayName.Should().Be("Ingrid Solberg");
        result.User.Role.Should().Be("Teacher");
        result.User.AvatarUrl.Should().Be("/media/avatars/i-solberg.jpg");

        result.Courses.Should().HaveCount(1);
        result.Courses[0].Title.Should().Be("Mathematics — Year 9");
        result.Courses[0].EnrollmentLabel.Should().Be("24 / 30");
        result.Courses[0].EnrollmentPercent.Should().Be(80);
        result.Courses[0].Status.Should().Be("Active");

        result.UpcomingSessions.Should().HaveCount(1);
        result.UpcomingSessions[0].LocationLabel.Should().Be("Room 204");

        result.Notifications.Count.Should().Be(3);
        result.PartialFailures.Should().BeEmpty();
    }
}
</code></pre>
<h3>Testing partial failure handling</h3>
<p>This covers what the happy-path test cannot: the behaviour when an upstream service returns null, either because it is unavailable or because a resilience handler has exhausted its retries.</p>
<pre><code class="language-csharp">[Fact]
public async Task AggregateAsync_CourseServiceUnavailable_ReturnsDegradedResponse()
{
    // Arrange
    const string userId = "ingrid.solberg@skole.no";

    _userClient.GetProfileAsync(userId, Arg.Any&lt;CancellationToken&gt;())
        .Returns(new UserProfileDto("Ingrid", "Solberg", "uninett", "TEACHER", null));

    _notificationClient.GetUnreadCountAsync(userId, Arg.Any&lt;CancellationToken&gt;())
        .Returns(0);

    // Course service unavailable — client returns null
    _courseClient.GetCoursesByOrgAsync("uninett", Arg.Any&lt;CancellationToken&gt;())
        .Returns((IReadOnlyList&lt;CourseDto&gt;?)null);

    // Act
    var result = await _aggregator.AggregateAsync(userId);

    // Assert — response is still structurally valid
    result.Should().NotBeNull();
    result.User.DisplayName.Should().Be("Ingrid Solberg");
    result.Courses.Should().BeEmpty();
    result.UpcomingSessions.Should().BeEmpty(); // No courses → no sessions fetched
    result.PartialFailures.Should().Contain("courses");

    // Session service should not have been called — no course IDs to query with
    await _sessionClient.DidNotReceive()
        .GetUpcomingAsync(Arg.Any&lt;string[]&gt;(), Arg.Any&lt;int&gt;(), Arg.Any&lt;CancellationToken&gt;());
}

[Fact]
public async Task AggregateAsync_ProfileServiceUnavailable_ThrowsBffAggregationException()
{
    // Arrange
    _userClient.GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
        .Returns((UserProfileDto?)null);

    _notificationClient.GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
        .Returns(0);

    // Act &amp; Assert — profile is required; its absence is a hard failure
    await _aggregator.Awaiting(a =&gt; a.AggregateAsync("any-user"))
        .Should().ThrowAsync&lt;BffAggregationException&gt;()
        .WithMessage("*User profile service unavailable*");
}
</code></pre>
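<p>The rules these two tests pin down are language-neutral. Below is a minimal TypeScript sketch of them; it is not the production C# aggregator. Every name in it is hypothetical, including <code>UpstreamClients</code>, <code>aggregateDashboard</code>, and the <code>"sessions"</code> and <code>"notifications"</code> failure keys. It assumes each client resolves to <code>null</code> when its upstream is unavailable, which is how the substitutes above behave.</p>
<pre><code class="language-typescript">// Hypothetical sketch: required vs optional upstream sections in one aggregation
interface Profile { firstName: string; lastName: string; orgId: string }
interface Course { id: string; title: string }
interface Session { id: string; title: string }

// Each client resolves to null when its upstream is unavailable (assumption)
interface UpstreamClients {
  getProfile(userId: string): Promise&lt;Profile | null&gt;
  getUnreadCount(userId: string): Promise&lt;number | null&gt;
  getCourses(orgId: string): Promise&lt;Course[] | null&gt;
  getUpcomingSessions(courseIds: string[], take: number): Promise&lt;Session[] | null&gt;
}

interface Dashboard {
  displayName: string
  courses: Course[]
  upcomingSessions: Session[]
  notificationCount: number
  partialFailures: string[]
}

async function aggregateDashboard(clients: UpstreamClients, userId: string): Promise&lt;Dashboard&gt; {
  // Profile and unread count do not depend on each other, so fetch them in parallel
  const [profile, unread] = await Promise.all([
    clients.getProfile(userId),
    clients.getUnreadCount(userId),
  ])

  // Required section: without a profile there is no meaningful dashboard
  if (profile === null) throw new Error('User profile service unavailable')

  const partialFailures: string[] = []
  if (unread === null) partialFailures.push('notifications')

  // Optional section: a missing course list degrades to an empty list
  const courses = await clients.getCourses(profile.orgId)
  if (courses === null) partialFailures.push('courses')

  // Sessions are queried by course ID; with no courses, skip the upstream call
  const courseIds = (courses ?? []).map(c =&gt; c.id)
  let upcomingSessions: Session[] = []
  if (courseIds.length &gt; 0) {
    const sessions = await clients.getUpcomingSessions(courseIds, 3)
    if (sessions === null) partialFailures.push('sessions')
    upcomingSessions = sessions ?? []
  }

  return {
    displayName: `${profile.firstName} ${profile.lastName}`,
    courses: courses ?? [],
    upcomingSessions,
    notificationCount: unread ?? 0,
    partialFailures,
  }
}
</code></pre>
<p>For the profile and course sections, this follows the same split the C# tests encode. A missing profile is fatal. A missing optional section is recorded in <code>partialFailures</code> and replaced with an empty value, so the response stays structurally valid.</p>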
<h3>Testing response shaping</h3>
<p>Shaping logic deserves its own tests, separate from the orchestration tests. The shaping methods are private in the production aggregator, so these tests still go through the public <code>AggregateAsync</code> method, but each one targets a single shaping rule. Focused tests like these are faster to write and easier to read when verifying edge cases:</p>
<pre><code class="language-csharp">[Theory]
[InlineData(30, 30, 100)]
[InlineData(0,  30, 0)]
[InlineData(24, 30, 80)]
[InlineData(1,  3,  33)]   // Rounds correctly
[InlineData(0,  0,  0)]    // Zero capacity — no division by zero
public async Task AggregateAsync_EnrollmentPercent_CalculatesCorrectly(
    int enrolled, int capacity, int expectedPercent)
{
    // Arrange
    SetupValidProfile("test-user");
    _notificationClient.GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
        .Returns(0);
    _courseClient.GetCoursesByOrgAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
        .Returns([new CourseDto("c-1",
            new CourseMetadataDto("Test Course", "TC-1", null!),
            new EnrollmentDto(capacity, enrolled, 0),
            new CourseStatusDto("ACTIVE", DateTimeOffset.UtcNow))]);
    _sessionClient.GetUpcomingAsync(Arg.Any&lt;string[]&gt;(), Arg.Any&lt;int&gt;(), Arg.Any&lt;CancellationToken&gt;())
        .Returns([]);

    // Act
    var result = await _aggregator.AggregateAsync("test-user");

    // Assert
    result.Courses[0].EnrollmentPercent.Should().Be(expectedPercent);
}

[Theory]
[InlineData("TEACHER",  "Teacher")]
[InlineData("STUDENT",  "Student")]
[InlineData("ADMIN",    "Administrator")]
[InlineData("UNKNOWN",  "Unknown")]
[InlineData("",         "Unknown")]
public async Task AggregateAsync_RoleTranslation_MapsCorrectly(
    string roleCode, string expectedRole)
{
    SetupValidProfile("test-user", roleCode: roleCode);
    _notificationClient.GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
        .Returns(0);
    _courseClient.GetCoursesByOrgAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
        .Returns([]);

    var result = await _aggregator.AggregateAsync("test-user");

    result.User.Role.Should().Be(expectedRole);
}

private void SetupValidProfile(string userId, string roleCode = "TEACHER") =&gt;
    _userClient.GetProfileAsync(userId, Arg.Any&lt;CancellationToken&gt;())
        .Returns(new UserProfileDto("Test", "User", "uninett", roleCode, null));
</code></pre>
<p>The <code>[Theory]</code> / <code>[InlineData]</code> pattern is the right tool for shaping edge cases: it runs the same logic against multiple inputs without duplicating the test structure. The zero-capacity case (<code>0 / 0</code>) is the one that exposed a missing division-by-zero guard in the implementation; it is now a first-class test case.</p>
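<p>Stripped of the aggregator, the two rules these theories encode are small. Here is a minimal TypeScript sketch with hypothetical function names. It assumes rounding to the nearest whole percent, which the <code>1 / 3</code> case is consistent with but does not prove. It also assumes unrecognised role codes fall back to a neutral label, as the <code>"UNKNOWN"</code> and empty-string cases require.</p>
<pre><code class="language-typescript">// Hypothetical sketch of the two shaping rules covered by the theories above

function enrollmentPercent(enrolled: number, capacity: number): number {
  // Guard first: zero capacity would otherwise divide by zero
  if (capacity &lt;= 0) return 0
  // Multiply before dividing so exact cases such as 24 / 30 stay integers
  return Math.round((enrolled * 100) / capacity)
}

function enrollmentLabel(enrolled: number, capacity: number): string {
  return `${enrolled} / ${capacity}`
}

const roleLabels = new Map&lt;string, string&gt;([
  ['TEACHER', 'Teacher'],
  ['STUDENT', 'Student'],
  ['ADMIN', 'Administrator'],
])

function translateRole(roleCode: string): string {
  // Unknown or empty codes get a neutral label instead of leaking the raw code
  return roleLabels.get(roleCode) ?? 'Unknown'
}
</code></pre>
<p>In TypeScript the unguarded <code>0 / 0</code> case does not throw. It silently produces <code>NaN</code>, which would surface as a broken label rather than an error. The guard matters just as much; the failure is simply quieter.</p>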
<hr />
<h2>Layer 2: Integration tests with WebApplicationFactory</h2>
<p>Integration tests verify the full HTTP pipeline: middleware execution order, route mapping, authentication enforcement, serialisation, and error handling. They use <code>WebApplicationFactory&lt;Program&gt;</code> from <code>Microsoft.AspNetCore.Mvc.Testing</code>, which hosts the real application in memory with substituted dependencies.</p>
<h3>The test factory</h3>
<pre><code class="language-csharp">// EducationPlatform.Bff.IntegrationTests/BffWebApplicationFactory.cs
public sealed class BffWebApplicationFactory : WebApplicationFactory&lt;Program&gt;
{
    public UserServiceClient UserClient { get; } =
        Substitute.For&lt;UserServiceClient&gt;();
    public CourseServiceClient CourseClient { get; } =
        Substitute.For&lt;CourseServiceClient&gt;();
    public SessionServiceClient SessionClient { get; } =
        Substitute.For&lt;SessionServiceClient&gt;();
    public NotificationServiceClient NotificationClient { get; } =
        Substitute.For&lt;NotificationServiceClient&gt;();

    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        builder.ConfigureTestServices(services =&gt;
        {
            // Replace real typed clients with substitutes
            services.RemoveAll&lt;UserServiceClient&gt;();
            services.RemoveAll&lt;CourseServiceClient&gt;();
            services.RemoveAll&lt;SessionServiceClient&gt;();
            services.RemoveAll&lt;NotificationServiceClient&gt;();

            services.AddSingleton(UserClient);
            services.AddSingleton(CourseClient);
            services.AddSingleton(SessionClient);
            services.AddSingleton(NotificationClient);

            // Replace real Data Protection with ephemeral keys for tests
            services.AddDataProtection()
                .UseEphemeralDataProtectionProvider();

            // Use test authentication — bypass real Feide OIDC
            services.AddAuthentication("Test")
                .AddScheme&lt;AuthenticationSchemeOptions, TestAuthHandler&gt;(
                    "Test", _ =&gt; { });
        });

        builder.UseEnvironment("Testing");
    }
}
</code></pre>
<p>The <code>TestAuthHandler</code> simulates an authenticated user without going through the Feide OIDC flow:</p>
<pre><code class="language-csharp">// EducationPlatform.Bff.IntegrationTests/TestAuthHandler.cs
public sealed class TestAuthHandler(
    IOptionsMonitor&lt;AuthenticationSchemeOptions&gt; options,
    ILoggerFactory logger,
    UrlEncoder encoder)
    : AuthenticationHandler&lt;AuthenticationSchemeOptions&gt;(options, logger, encoder)
{
    public const string UserId = "ingrid.solberg@skole.no";
    public const string OrgId  = "uninett";

    protected override Task&lt;AuthenticateResult&gt; HandleAuthenticateAsync()
    {
        // Check for the test auth header — allows tests to simulate unauthenticated requests
        if (!Request.Headers.ContainsKey("X-Test-Auth"))
            return Task.FromResult(AuthenticateResult.Fail("No test auth header."));

        var claims = new[]
        {
            new Claim(ClaimTypes.NameIdentifier, UserId),
            new Claim(ClaimTypes.Name, "Ingrid Solberg"),
            new Claim(ClaimTypes.Email, "ingrid.solberg@skole.no"),
            new Claim("feidePersonPrincipalName", UserId),
            new Claim("eduPersonPrimaryAffiliation", "staff"),
            new Claim("eduPersonOrgDN", $"dc={OrgId},dc=no")
        };

        var identity  = new ClaimsIdentity(claims, "Test");
        var principal = new ClaimsPrincipal(identity);
        var ticket    = new AuthenticationTicket(principal, "Test");

        return Task.FromResult(AuthenticateResult.Success(ticket));
    }
}
</code></pre>
<p>Using a header (<code>X-Test-Auth</code>) to trigger test authentication allows the same factory to test both authenticated and unauthenticated scenarios without changing the factory configuration between tests.</p>
<h3>Testing the dashboard endpoint</h3>
<pre><code class="language-csharp">// EducationPlatform.Bff.IntegrationTests/Endpoints/DashboardEndpointTests.cs
public class DashboardEndpointTests(BffWebApplicationFactory factory)
    : IClassFixture&lt;BffWebApplicationFactory&gt;
{
    private HttpClient CreateAuthenticatedClient()
    {
        var client = factory.CreateClient(new WebApplicationFactoryClientOptions
        {
            AllowAutoRedirect = false
        });

        // Opt in to the identity simulated by TestAuthHandler
        client.DefaultRequestHeaders.Add("X-Test-Auth", "true");
        return client;
    }

    [Fact]
    public async Task GET_Dashboard_AuthenticatedUser_Returns200WithCorrectShape()
    {
        // Arrange
        factory.UserClient
            .GetProfileAsync(TestAuthHandler.UserId, Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto("Ingrid", "Solberg",
                TestAuthHandler.OrgId, "TEACHER", null));

        factory.NotificationClient
            .GetUnreadCountAsync(TestAuthHandler.UserId, Arg.Any&lt;CancellationToken&gt;())
            .Returns(2);

        factory.CourseClient
            .GetCoursesByOrgAsync(TestAuthHandler.OrgId, Arg.Any&lt;CancellationToken&gt;())
            .Returns([]);

        var client = CreateAuthenticatedClient();

        // Act
        var response = await client.GetAsync("/api/dashboard");

        // Assert
        response.StatusCode.Should().Be(HttpStatusCode.OK);

        var body = await response.Content.ReadFromJsonAsync&lt;DashboardResponse&gt;();
        body.Should().NotBeNull();
        body!.User.DisplayName.Should().Be("Ingrid Solberg");
        body.Notifications.Count.Should().Be(2);
        body.Courses.Should().BeEmpty();
        body.PartialFailures.Should().BeEmpty();
    }

    [Fact]
    public async Task GET_Dashboard_UnauthenticatedRequest_Returns401()
    {
        // Arrange — client without X-Test-Auth header
        var client = factory.CreateClient(new WebApplicationFactoryClientOptions
        {
            AllowAutoRedirect = false
        });

        // Act
        var response = await client.GetAsync("/api/dashboard");

        // Assert
        response.StatusCode.Should().Be(HttpStatusCode.Unauthorized);
    }

    [Fact]
    public async Task GET_Dashboard_ProfileServiceDown_Returns503WithProblemDetails()
    {
        // Arrange — profile service returns null (upstream unavailable)
        factory.UserClient
            .GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns((UserProfileDto?)null);

        factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(0);

        var client = CreateAuthenticatedClient();

        // Act
        var response = await client.GetAsync("/api/dashboard");

        // Assert
        response.StatusCode.Should().Be(HttpStatusCode.ServiceUnavailable);

        var problem = await response.Content
            .ReadFromJsonAsync&lt;ProblemDetails&gt;();
        problem!.Title.Should().Be("Upstream service unavailable");
        problem.Status.Should().Be(503);
    }

    [Fact]
    public async Task GET_Dashboard_CourseServiceDown_Returns200WithPartialFailure()
    {
        // Arrange — course service returns null, but profile is available
        factory.UserClient
            .GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto("Ingrid", "Solberg",
                TestAuthHandler.OrgId, "TEACHER", null));

        factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(0);

        factory.CourseClient
            .GetCoursesByOrgAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns((IReadOnlyList&lt;CourseDto&gt;?)null);

        var client = CreateAuthenticatedClient();

        // Act
        var response = await client.GetAsync("/api/dashboard");

        // Assert — still a 200, not a 503
        response.StatusCode.Should().Be(HttpStatusCode.OK);

        var body = await response.Content.ReadFromJsonAsync&lt;DashboardResponse&gt;();
        body!.Courses.Should().BeEmpty();
        body.PartialFailures.Should().Contain("courses");
    }

    [Fact]
    public async Task GET_Dashboard_ResponseContainsCorrelationIdHeader()
    {
        factory.UserClient
            .GetProfileAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto("Ingrid", "Solberg",
                TestAuthHandler.OrgId, "TEACHER", null));
        factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(0);
        factory.CourseClient
            .GetCoursesByOrgAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns([]);

        var client = CreateAuthenticatedClient();
        var requestCorrelationId = Guid.NewGuid().ToString();
        client.DefaultRequestHeaders.Add("X-Correlation-Id", requestCorrelationId);

        var response = await client.GetAsync("/api/dashboard");

        // The same correlation ID must be echoed back in the response
        response.Headers.TryGetValues("X-Correlation-Id", out var values)
            .Should().BeTrue("the middleware should echo the header");
        values!.Single().Should().Be(requestCorrelationId);
    }
}
</code></pre>
<p>The correlation ID test is worth calling out. It is an integration test concern, not a unit test concern — it verifies that the middleware is registered in the pipeline and executes correctly. A unit test of <code>CorrelationIdMiddleware</code> in isolation would verify the logic; this test verifies the middleware is actually wired into the application.</p>
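<p>For contrast, the logic that a unit test of the middleware would target is small enough to sketch as a pure function. This TypeScript version is hypothetical: the function name is invented, and it assumes the middleware generates a new ID when the request carries none. The integration test above does not exercise that path.</p>
<pre><code class="language-typescript">import { randomUUID } from 'node:crypto'

// Header names are compared in lower case here (assumption for this sketch)
const CORRELATION_HEADER = 'x-correlation-id'

// Picks the correlation ID for a request and echoes it on the response headers
function applyCorrelationId(
  requestHeaders: Map&lt;string, string&gt;,
  responseHeaders: Map&lt;string, string&gt;,
  generate: () =&gt; string = randomUUID,
): string {
  const incoming = requestHeaders.get(CORRELATION_HEADER)?.trim()
  // Reuse the caller's ID so one request can be followed across every hop
  const id = incoming ? incoming : generate()
  responseHeaders.set(CORRELATION_HEADER, id)
  return id
}
</code></pre>
<p>A unit test can drive a function like this with plain maps. Only the integration test can prove that the real middleware sits in the pipeline and runs for <code>/api/dashboard</code>.</p>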
<hr />
<h2>Layer 3: Consumer-driven contract tests with Pact</h2>
<p>Contract tests address a problem that neither unit nor integration tests can: the two independent codebases that share an interface — the Vue application and the BFF — can each pass their own tests while simultaneously breaking the other's expectations.</p>
<p>Consumer-driven contract testing (CDCT) with Pact inverts the usual testing direction. The consumer (Vue application) defines what it expects from the provider (BFF) in a contract file — a Pact. The provider runs that contract against its actual implementation and verifies it is satisfied. Neither side needs to run the other's tests. The contract is the shared artefact.</p>
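<p>At its core, provider verification is a structural comparison: does the real response still contain every field the consumer relies on, with the right type? The TypeScript sketch below is a deliberately simplified, hypothetical illustration of that idea in the spirit of Pact's type-based matchers; it is not Pact's matching algorithm. Its rules are as follows:</p>
<ul>
<li>Objects must contain every expected key; extra provider fields are allowed.</li>
<li>Leaf values are compared by type only.</li>
<li>Each array element must match the first example element.</li>
<li>An empty expected array means the actual array must be empty.</li>
</ul>
<pre><code class="language-typescript">// Hypothetical, simplified type-based matcher; not Pact's implementation
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Returns mismatch descriptions; an empty list means the consumer's expectations hold
function matchShape(expected: Json, actual: Json, path = '$'): string[] {
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual)) return [`${path}: expected an array`]
    if (expected.length === 0) {
      return actual.length === 0 ? [] : [`${path}: expected an empty array`]
    }
    // Every actual element must match the first example element
    const example = expected[0]
    return actual.flatMap((item, i) =&gt; matchShape(example, item, `${path}[${i}]`))
  }

  if (expected !== null &amp;&amp; typeof expected === 'object') {
    if (actual === null || typeof actual !== 'object' || Array.isArray(actual)) {
      return [`${path}: expected an object`]
    }
    const exp = expected
    const act = actual
    // Every key the consumer relies on must be present; extra keys are ignored
    return Object.keys(exp).flatMap(key =&gt;
      key in act
        ? matchShape(exp[key], act[key], `${path}.${key}`)
        : [`${path}.${key}: missing`])
  }

  // Leaf values are compared by type, not by value
  return typeof expected === typeof actual ? [] : [`${path}: expected ${typeof expected}`]
}
</code></pre>
<p>Suppose the response renames <code>avatarUrl</code> while keeping the same value. This check reports <code>$.user.avatarUrl: missing</code> even though every value is still present. That is the same class of failure a real provider verification surfaces for a renamed field.</p>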
<h3>Consumer side: generating the Pact in Vue</h3>
<p>The consumer test runs in the Vue application's test suite using <code>@pact-foundation/pact</code>. It verifies that the Vue API layer handles the BFF's response correctly and, in doing so, generates a Pact file describing exactly what the BFF must return.</p>
<pre><code class="language-typescript">// frontend/tests/contracts/dashboard.pact.spec.ts
import { PactV3, MatchersV3 } from '@pact-foundation/pact'
import path from 'path'
import { apiClient } from '@/api/client'
import type { DashboardResponse } from '@/api/types'

const { like, eachLike, string, integer } = MatchersV3

const provider = new PactV3({
  consumer: 'education-platform-vue',
  provider: 'education-platform-bff',
  dir: path.resolve(__dirname, '../../pacts'),
  port: 4321
})

describe('Dashboard contract', () =&gt; {
  it('returns a valid dashboard response for an authenticated user', async () =&gt; {
    await provider
      .addInteraction({
        states: [{ description: 'user ingrid.solberg@skole.no exists with courses' }],
        uponReceiving: 'a GET request for the dashboard',
        withRequest: {
          method: 'GET',
          path: '/api/dashboard',
          headers: { Accept: 'application/json' }
        },
        willRespondWith: {
          status: 200,
          headers: { 'Content-Type': 'application/json' },
          body: {
            user: like({
              displayName: string('Ingrid Solberg'),
              role:        string('Teacher'),
              avatarUrl:   string('/media/avatars/i-solberg.jpg')
            }),
            courses: eachLike({
              id:                string('c-1'),
              title:             string('Mathematics — Year 9'),
              code:              string('MATH-9'),
              enrollmentLabel:   string('24 / 30'),
              enrollmentPercent: integer(80),
              status:            string('Active')
            }),
            upcomingSessions: eachLike({
              id:            string('s-1'),
              title:         string('Integration review'),
              startsAt:      string('2025-04-08T09:00:00'),
              courseTitle:   string('Mathematics — Year 9'),
              locationLabel: string('Room 204')
            }),
            notifications: like({
              count: integer(3)
            }),
            partialFailures: []
          }
        }
      })
      .executeTest(async mockServer =&gt; {
        // Send the API client's relative '/api/...' requests to the Pact mock server,
        // keeping the '/api' prefix so the path matches the interaction above
        const original = globalThis.fetch
        globalThis.fetch = (input, init) =&gt;
          original(typeof input === 'string' &amp;&amp; input.startsWith('/')
            ? `${mockServer.url}${input}`
            : input, init)

        try {
          const data = await apiClient.get&lt;DashboardResponse&gt;('/dashboard')

          // Verify the API layer can handle the response shape
          expect(data.user.displayName).toBeTruthy()
          expect(data.courses).toBeInstanceOf(Array)
          expect(data.notifications.count).toBeGreaterThanOrEqual(0)
          expect(data.partialFailures).toBeInstanceOf(Array)
        } finally {
          // Restore the real fetch so later tests are not affected
          globalThis.fetch = original
        }
      })
  })

  it('returns 401 for unauthenticated requests', async () =&gt; {
    await provider
      .addInteraction({
        states: [{ description: 'no authenticated session' }],
        uponReceiving: 'an unauthenticated GET request for the dashboard',
        withRequest: {
          method: 'GET',
          path: '/api/dashboard'
        },
        willRespondWith: {
          status: 401,
          body: like({
            title:  string('Unauthorized'),
            status: integer(401)
          })
        }
      })
      .executeTest(async mockServer =&gt; {
        // Same redirection as above: keep the '/api' prefix, restore fetch afterwards
        const original = globalThis.fetch
        globalThis.fetch = (input, init) =&gt;
          original(typeof input === 'string' &amp;&amp; input.startsWith('/')
            ? `${mockServer.url}${input}`
            : input, init)

        try {
          await expect(
            apiClient.get&lt;DashboardResponse&gt;('/dashboard')
          ).rejects.toMatchObject({ status: 401 })
        } finally {
          globalThis.fetch = original
        }
      })
  })
})
</code></pre>
<p>Running this test suite generates a Pact file at <code>pacts/education-platform-vue-education-platform-bff.json</code>. This file is the contract — the formal record of what the Vue application expects.</p>
<h3>Provider side: verifying the Pact in the BFF</h3>
<p>The BFF verifies the generated Pact against its actual implementation using <code>PactNet</code> and <code>WebApplicationFactory</code>. The verification spins up the real BFF (with substituted upstream clients), executes each interaction defined in the Pact, and asserts the response matches the contract.</p>
<pre><code class="language-csharp">// EducationPlatform.Bff.ContractTests/DashboardPactProviderTests.cs
public class DashboardPactProviderTests : IClassFixture&lt;BffWebApplicationFactory&gt;
{
    private readonly BffWebApplicationFactory _factory;
    private readonly ITestOutputHelper _output;

    public DashboardPactProviderTests(
        BffWebApplicationFactory factory,
        ITestOutputHelper output)
    {
        _factory = factory;
        _output  = output;
    }

    [Fact]
    public void BFF_SatisfiesVueConsumerContract()
    {
        // Arrange: configure the substitutes for every Pact provider state up front
        SetupProviderStates();

        // Accessing Server builds and starts the factory's host
        _ = _factory.Server;

        // PactNet's verifier sends real HTTP requests over a socket, but TestServer
        // is in-memory and never binds a port. This assumes the factory also exposes
        // the app through Kestrel at this address (hypothetical; adjust to your setup).
        var providerUri = new Uri("http://localhost:9222");

        var config = new PactVerifierConfig
        {
            Outputters = [new XunitOutput(_output)],
            LogLevel   = PactLogLevel.Information
        };

        var pactPath = Path.Combine(
            Directory.GetCurrentDirectory(),
            "..", "..", "..", "..", "..", // Navigate to repo root
            "frontend", "pacts",
            "education-platform-vue-education-platform-bff.json");

        // Act &amp; Assert: Verify() throws if any interaction is not satisfied
        new PactVerifier("education-platform-bff", config)
            .WithHttpEndpoint(providerUri)
            .WithFileSource(new FileInfo(pactPath))
            .WithProviderStateUrl(new Uri(providerUri, "/pact/provider-states"))
            .Verify();
    }

    private void SetupProviderStates()
    {
        // "user ingrid.solberg@skole.no exists with courses"
        _factory.UserClient
            .GetProfileAsync("ingrid.solberg@skole.no", Arg.Any&lt;CancellationToken&gt;())
            .Returns(new UserProfileDto("Ingrid", "Solberg", "uninett", "TEACHER",
                "i-solberg.jpg"));

        _factory.CourseClient
            .GetCoursesByOrgAsync("uninett", Arg.Any&lt;CancellationToken&gt;())
            .Returns([
                new CourseDto("c-1",
                    new CourseMetadataDto("Mathematics — Year 9", "MATH-9", null!),
                    new EnrollmentDto(30, 24, 0),
                    new CourseStatusDto("ACTIVE", DateTimeOffset.UtcNow))
            ]);

        _factory.SessionClient
            .GetUpcomingAsync(Arg.Any&lt;string[]&gt;(), 3, Arg.Any&lt;CancellationToken&gt;())
            .Returns([
                new SessionDto("s-1", "Integration review",
                    DateTimeOffset.Parse("2025-04-08T09:00:00"),
                    "Mathematics — Year 9", Room: "204")
            ]);

        _factory.NotificationClient
            .GetUnreadCountAsync(Arg.Any&lt;string&gt;(), Arg.Any&lt;CancellationToken&gt;())
            .Returns(3);

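        // "no authenticated session": no substitute setup needed; TestAuthHandler
        // returns 401 for any request without the X-Test-Auth header. The Pact
        // requests do not carry that header, so the authenticated interaction
        // needs it added on the provider side before verification.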
    }
}
</code></pre>
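<p>The provider state endpoint is a small piece of test-only middleware in the factory. The Pact verifier calls it before each interaction. Because <code>SetupProviderStates</code> already configures the substitutes for every state, the endpoint only needs to acknowledge the request:</p>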
<pre><code class="language-csharp">// BffWebApplicationFactory.cs — add provider state endpoint
protected override void ConfigureWebHost(IWebHostBuilder builder)
{
    builder.ConfigureTestServices(services =&gt;
    {
        /* ... existing setup ... */

        // Prepend the provider states endpoint to the application's real pipeline.
        // A startup filter keeps the Program.cs middleware and endpoints intact,
        // whereas builder.Configure(...) would replace the whole pipeline.
        services.AddSingleton&lt;IStartupFilter, ProviderStateStartupFilter&gt;();
    });
}

private sealed class ProviderStateStartupFilter : IStartupFilter
{
    public Action&lt;IApplicationBuilder&gt; Configure(Action&lt;IApplicationBuilder&gt; next) =&gt;
        app =&gt;
        {
            // Provider states endpoint: only exists in the test host
            app.Map("/pact/provider-states", stateApp =&gt;
            {
                stateApp.Run(async ctx =&gt;
                {
                    var body = await ctx.Request.ReadFromJsonAsync&lt;ProviderStateRequest&gt;();
                    // State setup is handled by the substitute configuration in the
                    // provider test; this endpoint just acknowledges the state transition
                    ctx.Response.StatusCode = 200;
                    await ctx.Response.WriteAsJsonAsync(new { acknowledged = true });
                });
            });

            next(app); // Continue with the application's own middleware and endpoints
        };
}

private sealed record ProviderStateRequest(string State, string Consumer);
</code></pre>
<h3>The CI workflow for contract tests</h3>
<p>The Pact file flows through CI as an artefact. The Vue consumer tests generate it; the BFF provider tests verify it:</p>
<pre><code class="language-yaml"># .github/workflows/contract-tests.yml
name: Contract Tests

on: [push, pull_request]

jobs:
  consumer-generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - name: Install dependencies
        working-directory: ./frontend
        run: npm ci
      - name: Run consumer contract tests
        working-directory: ./frontend
        run: npm run test:contracts
      - name: Upload Pact file
        uses: actions/upload-artifact@v4
        with:
          name: pact-files
          path: frontend/pacts/*.json

  provider-verify:
    runs-on: ubuntu-latest
    needs: consumer-generate
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with: { dotnet-version: '8.0.x' }
      - name: Download Pact file
        uses: actions/download-artifact@v4
        with:
          name: pact-files
          path: frontend/pacts
      - name: Run provider verification
        run: dotnet test EducationPlatform.Bff.ContractTests
</code></pre>
<p>The <code>provider-verify</code> job depends on <code>consumer-generate</code>. If the Vue consumer tests fail to generate a valid Pact file, the provider verification never runs. If the BFF's implementation no longer satisfies the Pact, the provider verification fails and the build is blocked. The contract mismatch surfaces in CI before either codebase reaches staging.</p>
<hr />
<h2>What the production system learned about testing</h2>
<p><strong>Integration tests caught the</strong> <code>OnRedirectToLogin</code> <strong>bug before it reached staging.</strong> The test <code>GET_Dashboard_UnauthenticatedRequest_Returns401</code> was written after the bug surfaced in local development. Because it lives in the integration suite, any regression fails CI on every subsequent pull request. Tests written in response to bugs are the highest-return tests in a suite.</p>
<p><strong>Pact contract tests caught a field rename.</strong> In the second month of the project, the BFF renamed <code>avatarUrl</code> to <code>profileImageUrl</code> in a refactor. The TypeScript type check in the Vue application caught it for components that referenced the field directly. The Pact consumer test caught it for the contract — the generated Pact still specified <code>avatarUrl</code>, and the BFF provider verification failed because the response now contained <code>profileImageUrl</code>. The fix was a conscious decision: keep <code>avatarUrl</code> for backward compatibility, add <code>profileImageUrl</code> as an alias, then migrate in a subsequent release.</p>
<p><strong>The</strong> <code>[Theory]</code> <strong>/</strong> <code>[InlineData]</code> <strong>pattern found the division-by-zero bug.</strong> Initially, the enrollment percentage calculation was tested only for the typical case. Adding <code>[InlineData(0, 0, 0)]</code> to the theory failed immediately: the implementation did not guard against zero capacity. The guard was added in the same commit. Without the theory, this would have become a production error on the first course with no capacity configured.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/17f8a218-f515-4c5f-a88e-ddda2aef23df.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>What comes next</h2>
<p>The final article in the core series brings the operational picture together: structured logging, distributed tracing, and Azure Application Insights — how to make the running BFF observable, how to connect the traces from Vue through the BFF to upstream services, and what the dashboard configuration looks like when you need to diagnose an incident at 2am.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="#">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="#">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="#">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="#">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="#">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="#">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="#">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li>→ <a href="#">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="#">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="#">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="#">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="#">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Shipping BFF to Azure: Docker Images, Artifact Publishing & Azure Container Instances]]></title><description><![CDATA[A note on the code in this article. The pipeline configuration, Dockerfile, and infrastructure definitions shown here are derived from a production deployment built for a Norwegian enterprise educatio]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances</guid><category><![CDATA[dotnet]]></category><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Dockerfile]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Azure container registry]]></category><category><![CDATA[Azure Container Instances]]></category><category><![CDATA[azure_frontdoor]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[containers]]></category><category><![CDATA[Devops]]></category><category><![CDATA[deployment]]></category><category><![CDATA[IaaS]]></category><category><![CDATA[APIM]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Tue, 14 Apr 2026 13:23:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/55b3e8a7-eede-4a9c-958f-d317b61d3aaf.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<blockquote>
<p><strong>A note on the code in this article.</strong> The pipeline configuration, Dockerfile, and infrastructure definitions shown here are derived from a production deployment built for a Norwegian enterprise education platform. Registry names, resource group identifiers, subscription IDs, and certain environment-specific configuration values have been generalised to meet NDA obligations. The deployment strategy, container configuration, secret management approach, and the specific operational decisions each choice addresses are drawn directly from what was deployed and operated in production.</p>
</blockquote>
<hr />
<p>The BFF is built. Authentication works. The Vue application consumes the API layer cleanly. What remains is getting all of it into production reliably, repeatedly, and without the kind of manual steps that turn deployments into incidents.</p>
<p>This article covers the full deployment pipeline: writing a production-grade Dockerfile for the .NET Core BFF, building and tagging images in CI, pushing to Azure Container Registry, and running the service on Azure Container Instances. It then covers the routing layer — Azure Front Door in front of the BFF — and addresses the APIM question directly: when it adds genuine value and when it adds cost without benefit.</p>
<p>The deployment approach is IaaS rather than PaaS. Azure Container Instances was chosen over App Service because the production system needed predictable container isolation, direct control over the runtime environment, and a deployment model where the exact image that passed CI is the exact image running in production. ACI provides all three without the operational overhead of a full Kubernetes cluster.</p>
<hr />
<h2>The Dockerfile</h2>
<p>The BFF Dockerfile uses a multi-stage build. The first stage compiles and publishes the application. The second stage runs it. The published output from the first stage is the only thing copied into the final image — build tools, SDK, and intermediate files stay out of the production image entirely.</p>
<pre><code class="language-dockerfile"># Dockerfile

# ── Stage 1: Build ────────────────────────────────────────────────────────────
FROM mcr.microsoft.com/dotnet/sdk:8.0-alpine AS build
WORKDIR /src

# Copy project file and restore dependencies separately from source
# This layer is cached as long as the .csproj does not change
COPY ["EducationPlatform.Bff/EducationPlatform.Bff.csproj", "EducationPlatform.Bff/"]
RUN dotnet restore "EducationPlatform.Bff/EducationPlatform.Bff.csproj" \
    --runtime linux-musl-x64

# Copy source and publish
COPY . .
WORKDIR "/src/EducationPlatform.Bff"
RUN dotnet publish "EducationPlatform.Bff.csproj" \
    --configuration Release \
    --runtime linux-musl-x64 \
    --self-contained true \
    --output /app/publish \
    -p:PublishSingleFile=true \
    -p:PublishTrimmed=true

# ── Stage 2: Runtime ──────────────────────────────────────────────────────────
FROM mcr.microsoft.com/dotnet/runtime-deps:8.0-alpine AS runtime
WORKDIR /app

# Create non-root user — never run production containers as root
RUN addgroup -S bff &amp;&amp; adduser -S bff -G bff
USER bff

# Copy only the published output from the build stage
COPY --from=build --chown=bff:bff /app/publish .

# Health check for Docker and local runs; ACI ignores this and uses the container group's probes
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health/live || exit 1

EXPOSE 8080
ENV ASPNETCORE_URLS=http://+:8080

ENTRYPOINT ["./EducationPlatform.Bff"]
</code></pre>
<p>Several decisions here warrant explanation.</p>
<p><strong>Alpine base with</strong> <code>linux-musl-x64</code> <strong>runtime and</strong> <code>--self-contained true</code><strong>.</strong> The Alpine image is significantly smaller than the default Debian-based image — the final runtime image sits around 90MB rather than 300MB. Self-contained publishing includes the .NET runtime in the output, which means the runtime image does not need a .NET runtime layer at all. The <code>runtime-deps</code> base image provides only the native dependencies that a self-contained .NET binary requires.</p>
<p><code>PublishSingleFile=true</code> <strong>and</strong> <code>PublishTrimmed=true</code><strong>.</strong> Single-file publishing packages the application and its dependencies into one executable. Trimming removes unused framework code from the output. Together they reduce the published output to roughly a third of an untrimmed multi-file publish. In a deployment model where the image is rebuilt and repushed on every merge to main, smaller images mean faster pushes and faster container starts.</p>
<p><strong>Non-root user.</strong> Running as root inside a container is a security risk that is trivially avoidable. The <code>adduser</code> step creates a dedicated system user, and <code>COPY --chown</code> ensures the published files are owned by that user. Nothing the BFF does at runtime requires root.</p>
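<p><code>HEALTHCHECK</code> <strong>directive.</strong> The health check command uses <code>wget</code>, which Alpine ships as part of BusyBox, rather than <code>curl</code>, which is not included by default. The <code>/health/live</code> endpoint was defined in Article 4; it returns 200 if the process is responsive, without checking upstream dependencies. Docker uses this directive to report container health when the image runs locally or under Docker Compose. ACI does not act on a Dockerfile <code>HEALTHCHECK</code>. It relies on the liveness and readiness probes declared on the container group, which is why the same endpoint appears again in the infrastructure definition later in this article.</p>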
<hr />
<h2>The CI pipeline</h2>
<p>The production system used GitHub Actions. The pipeline has three jobs: build-and-test, docker-build-and-push, and deploy-to-aci. The jobs run sequentially — deployment only proceeds if the tests pass and the image is published successfully.</p>
<pre><code class="language-yaml"># .github/workflows/deploy.yml
name: Build and Deploy BFF

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: educationplatformbff.azurecr.io
  IMAGE_NAME: bff
  RESOURCE_GROUP: rg-education-platform-prod
  CONTAINER_GROUP: cg-bff-prod
  CONTAINER_NAME: bff

jobs:
  # ── Job 1: Build and test ───────────────────────────────────────────────────
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET 8
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'

      - name: Restore dependencies
        run: dotnet restore

      - name: Build
        run: dotnet build --no-restore --configuration Release

      - name: Run tests
        run: dotnet test --no-build --configuration Release --verbosity normal

      # Generate OpenAPI spec and validate Vue types in CI
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Vue dependencies
        working-directory: ./frontend
        run: npm ci

      - name: Start BFF for type generation
        run: |
          dotnet run --project EducationPlatform.Bff \
            --configuration Release &amp;
          sleep 8  # Wait for startup

      - name: Generate API types
        working-directory: ./frontend
        run: npm run generate:api:ci
        env:
          BFF_SWAGGER_URL: http://localhost:8080/swagger/v1/swagger.json

      - name: TypeScript type check
        working-directory: ./frontend
        run: npx tsc --noEmit

  # ── Job 2: Build and push Docker image ─────────────────────────────────────
  docker-build-and-push:
    runs-on: ubuntu-latest
    needs: build-and-test
    if: github.ref == 'refs/heads/main'  # Push only on main, not PRs
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build-push.outputs.digest }}

    steps:
      - uses: actions/checkout@v4

      - name: Login to Azure Container Registry
        uses: azure/docker-login@v1
        with:
          login-server: ${{ env.REGISTRY }}
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD }}

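      # The type=gha build cache needs a Buildx builder; the default docker driver cannot export it
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Extract metadata for Docker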
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=,format=long
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        id: build-push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # ── Job 3: Deploy to Azure Container Instances ──────────────────────────────
  deploy-to-aci:
    runs-on: ubuntu-latest
    needs: docker-build-and-push
    if: github.ref == 'refs/heads/main'
    environment: production

    steps:
      - name: Login to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Deploy to Azure Container Instances
        uses: azure/cli@v1
        with:
          azcliversion: latest
          inlineScript: |
            az container create \
              --resource-group ${{ env.RESOURCE_GROUP }} \
              --name ${{ env.CONTAINER_GROUP }} \
              --image ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
              --registry-login-server ${{ env.REGISTRY }} \
              --registry-username ${{ secrets.ACR_USERNAME }} \
              --registry-password ${{ secrets.ACR_PASSWORD }} \
              --cpu 1 \
              --memory 1.5 \
              --ports 8080 \
              --protocol TCP \
              --restart-policy Always \
              --environment-variables \
                ASPNETCORE_ENVIRONMENT=Production \
                Services__UserService__BaseUrl=${{ secrets.USER_SERVICE_URL }} \
                Services__CourseService__BaseUrl=${{ secrets.COURSE_SERVICE_URL }} \
                Services__SessionService__BaseUrl=${{ secrets.SESSION_SERVICE_URL }} \
                Services__NotificationService__BaseUrl=${{ secrets.NOTIFICATION_SERVICE_URL }} \
                ApplicationInsights__ConnectionString=${{ secrets.APPINSIGHTS_CONNECTION_STRING }} \
              --secure-environment-variables \
                Feide__ClientId=${{ secrets.FEIDE_CLIENT_ID }} \
                Feide__ClientSecret=${{ secrets.FEIDE_CLIENT_SECRET }} \
                DataProtection__StorageConnectionString=${{ secrets.DATA_PROTECTION_CONNECTION_STRING }}
            # Liveness/readiness probes cannot be set through az container create flags;
            # they are declared in the container group definition (YAML or Bicep).
</code></pre>
<p>Three points on the pipeline design:</p>
<p><strong>The image tag uses the Git commit SHA, not</strong> <code>latest</code><strong>.</strong> The <code>latest</code> tag is updated in the registry, but the ACI deployment command references the SHA-tagged image explicitly. This means the deployed image and the CI build that produced it are always traceable to a specific commit. A <code>latest</code> deployment is unauditable — you cannot tell from the running container which commit it came from.</p>
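<p>The tagging rule can be enforced in the deploy script as well as in the workflow definition. A minimal POSIX <code>sh</code> sketch, assuming the deploy step should only ever receive a full 40-character commit SHA (the value <code>${{ github.sha }}</code> resolves to); the function name <code>image_ref</code> is hypothetical. It refuses <code>latest</code> and short SHAs, so an untraceable reference never reaches <code>az container create</code>:</p>
<pre><code class="language-shell">#!/bin/sh
# Hypothetical helper: print a registry image reference pinned to a full commit SHA.
# Usage: image_ref REGISTRY IMAGE SHA
image_ref() {
  registry=$1
  image=$2
  sha=$3
  # Reject "latest", uppercase input, and anything that is not lowercase hex.
  case $sha in
    *[!0-9a-f]*) return 1 ;;
  esac
  # A full Git commit SHA-1 is exactly 40 hex characters; short SHAs are rejected.
  if [ "${#sha}" -ne 40 ]; then
    return 1
  fi
  echo "$registry/$image:$sha"
}
</code></pre>
<p>In the deploy step this would be used as <code>IMAGE=$(image_ref "$REGISTRY" "$IMAGE_NAME" "$GITHUB_SHA")</code>, where <code>GITHUB_SHA</code> is the default Actions variable carrying the same value as <code>${{ github.sha }}</code>. With errexit enabled, as it is for the default <code>bash</code> shell in <code>run</code> steps, a rejected tag stops the job at that line.</p>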
<p><code>--secure-environment-variables</code> <strong>for secrets.</strong> Azure CLI's <code>az container create</code> accepts two environment variable flags. <code>--environment-variables</code> sets variables that appear in the container's environment and are visible in the Azure portal. <code>--secure-environment-variables</code> sets variables that are injected securely and are not visible after deployment — they do not appear in portal logs or CLI output. All credentials use the secure flag. Service base URLs, which are not secrets, use the standard flag — they are useful to inspect from the portal when debugging connectivity issues.</p>
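<p>Keeping the two lists apart is a matter of discipline, and a small guard can catch the mistake of moving a secret into the plain flag. A minimal POSIX <code>sh</code> sketch; <code>SECURE_NAMES</code> and <code>assert_plain_env</code> are hypothetical names, and the list mirrors the secure values used in this article:</p>
<pre><code class="language-shell">#!/bin/sh
# Hypothetical guard: refuse NAME=VALUE pairs meant for --environment-variables
# when NAME must only ever be passed via --secure-environment-variables.
# Space-separated on purpose: the loop below relies on word splitting.
SECURE_NAMES="Feide__ClientId Feide__ClientSecret DataProtection__StorageConnectionString"

# Usage: assert_plain_env NAME=VALUE [NAME=VALUE ...]
assert_plain_env() {
  for pair in "$@"; do
    # Strip everything from the first "=" onwards to get the variable name.
    name=${pair%%=*}
    for secure_name in $SECURE_NAMES; do
      if [ "$name" = "$secure_name" ]; then
        # Print the name only, never the value.
        echo "refusing to pass $name via --environment-variables"
        return 1
      fi
    done
  done
  return 0
}
</code></pre>
<p>Called with exactly the pairs destined for <code>--environment-variables</code>, it exits non-zero before <code>az container create</code> runs. Because it prints only the offending name, the failure message is safe to leave in CI logs.</p>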
<p><code>environment: production</code> <strong>on the deploy job.</strong> When the <code>production</code> GitHub Environment is configured with required reviewers, the deploy job waits for one of them to approve before it runs. In the production system, merges to main triggered an automatic build and test, but the actual ACI deployment required a manual approval from a second engineer. This is a lightweight but effective change control mechanism.</p>
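<p>The <code>sleep 8</code> in the type-generation step assumes the BFF is listening within eight seconds. On a slow runner, type generation then fails intermittently against a process that has not finished starting. A bounded retry loop is sturdier. A minimal POSIX <code>sh</code> sketch, with the hypothetical name <code>wait_until</code>:</p>
<pre><code class="language-shell">#!/bin/sh
# Hypothetical helper: rerun a command until it succeeds or a deadline passes.
# Usage: wait_until MAX_WAIT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
  max_wait=$1
  interval=$2
  shift 2
  deadline=$(( $(date +%s) + max_wait ))
  while :; do
    # Stop as soon as the probed command exits with status 0.
    if "$@"; then
      return 0
    fi
    # Give up once the deadline has passed; the caller sees a non-zero status.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
</code></pre>
<p>Placed at the start of the Generate API types step, where <code>BFF_SWAGGER_URL</code> is set, it would replace the fixed sleep with something like <code>wait_until 60 2 curl --fail --silent --output /dev/null "$BFF_SWAGGER_URL"</code>. The step continues as soon as the Swagger document is served and fails after about 60 seconds otherwise.</p>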
<hr />
<h2>Data protection: encrypting the session cookie</h2>
<p>The BFF uses ASP.NET Core's Data Protection API to encrypt the session cookie that holds the Feide tokens. In a single-instance deployment this works out of the box: the key ring is generated on startup and kept inside the container, either in memory or on the container's ephemeral filesystem. In a deployment with container restarts or multiple instances, the key ring must be persisted externally, or users are signed out every time the container restarts.</p>
<p>The production system persisted the key ring to Azure Blob Storage:</p>
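<pre><code class="language-bash">dotnet add package Azure.Extensions.AspNetCore.DataProtection.Blobs
dotnet add package Azure.Storage.Blobs
</code></pre>
<pre><code class="language-csharp">// Program.cs: Data Protection configuration
// (using directives belong at the top of Program.cs)
using Azure.Storage.Blobs;                  // BlobServiceClient
using Microsoft.AspNetCore.DataProtection;  // SetApplicationName, PersistKeysToAzureBlobStorage

// Assumes `builder` was created earlier with WebApplication.CreateBuilder(args)
var blobServiceClient = new BlobServiceClient(
    builder.Configuration["DataProtection:StorageConnectionString"]);

var containerClient = blobServiceClient.GetBlobContainerClient("data-protection");
await containerClient.CreateIfNotExistsAsync();

builder.Services
    .AddDataProtection()
    // BlobClient overload from Azure.Extensions.AspNetCore.DataProtection.Blobs
    .PersistKeysToAzureBlobStorage(containerClient.GetBlobClient("bff-keys.xml"))
    .SetApplicationName("education-platform-bff")
    .SetDefaultKeyLifetime(TimeSpan.FromDays(90));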
</code></pre>
<p>The <code>SetApplicationName</code> call is important. Data Protection uses the application name as part of the key derivation. If you deploy two versions of the BFF simultaneously — during a rolling update — they must share the same application name to be able to decrypt each other's cookies. Omitting this caused sign-out loops during the first rolling update in the production system.</p>
<hr />
<h2>Azure Container Registry: image retention policy</h2>
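<p>The pipeline pushes a new image on every merge to main. Without a retention policy, the registry accumulates images indefinitely. The production system kept the registry in check with the two commands below: a retention policy that deletes untagged manifests after 30 days (a Premium-tier feature), and an <code>acr purge</code> run that deletes older <code>bff</code> images while retaining the 10 most recent:</p>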
<pre><code class="language-bash"># Retention policy: delete untagged manifests after 30 days (Premium tier only)
az acr config retention update \
  --registry educationplatformbff \
  --status enabled \
  --days 30 \
  --type UntaggedManifests

# Cleanup: delete bff tags and untagged manifests older than 30 days,
# while --keep 10 retains the 10 most recent of the matching tags
az acr run \
  --registry educationplatformbff \
  --cmd "acr purge --filter 'bff:.*' --untagged --ago 30d --keep 10" \
  /dev/null
</code></pre>
<p>This is housekeeping, but it shows up on the bill. Each Azure Container Registry tier includes a fixed amount of storage and charges per GB beyond it, so a registry that accumulates 200 untagged image layers over six months costs meaningfully more than one that retains 10.</p>
<hr />
<h2>Azure Container Instances: the infrastructure definition</h2>
<p>The <code>az container create</code> command in the pipeline creates or updates the container group. For infrastructure that changes infrequently — CPU allocation, memory, port mapping — an ARM template or Bicep definition is more auditable than a long CLI command. The production system used a Bicep definition for the baseline infrastructure, with the CI pipeline overriding only the image tag on each deployment:</p>
<pre><code class="language-plaintext">// infra/bff-container.bicep
param location string = resourceGroup().location
param imageTag string
param acrLoginServer string
param acrUsername string
@secure()
param acrPassword string
@secure()
param feideClientId string
@secure()
param feideClientSecret string
@secure()
param dataProtectionConnectionString string
param appInsightsConnectionString string
param userServiceUrl string
param courseServiceUrl string
param sessionServiceUrl string
param notificationServiceUrl string

resource containerGroup 'Microsoft.ContainerInstance/containerGroups@2023-05-01' = {
  name: 'cg-bff-prod'
  location: location
  properties: {
    osType: 'Linux'
    restartPolicy: 'Always'

    imageRegistryCredentials: [
      {
        server: acrLoginServer
        username: acrUsername
        password: acrPassword
      }
    ]

    containers: [
      {
        name: 'bff'
        properties: {
          image: '${acrLoginServer}/bff:${imageTag}'
          ports: [{ port: 8080, protocol: 'TCP' }]

          resources: {
            requests: { cpu: 1, memoryInGB: 1 }
            limits:   { cpu: 1, memoryInGB: 1 }  // Hard limits — predictable billing
          }

          environmentVariables: [
            { name: 'ASPNETCORE_ENVIRONMENT',              value: 'Production' }
            { name: 'ASPNETCORE_URLS',                     value: 'http://+:8080' }
            { name: 'Services__UserService__BaseUrl',       value: userServiceUrl }
            { name: 'Services__CourseService__BaseUrl',     value: courseServiceUrl }
            { name: 'Services__SessionService__BaseUrl',    value: sessionServiceUrl }
            { name: 'Services__NotificationService__BaseUrl', value: notificationServiceUrl }
            { name: 'ApplicationInsights__ConnectionString', value: appInsightsConnectionString }
            { name: 'Feide__ClientId',      secureValue: feideClientId }
            { name: 'Feide__ClientSecret',  secureValue: feideClientSecret }
            { name: 'DataProtection__StorageConnectionString',
                      secureValue: dataProtectionConnectionString }
          ]

          livenessProbe: {
            httpGet: { path: '/health/live', port: 8080, scheme: 'HTTP' }
            initialDelaySeconds: 10
            periodSeconds: 30
            failureThreshold: 3
          }

          readinessProbe: {
            httpGet: { path: '/health/ready', port: 8080, scheme: 'HTTP' }
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          }
        }
      }
    ]

    ipAddress: {
      type: 'Public'
      ports: [{ port: 8080, protocol: 'TCP' }]
      dnsNameLabel: 'education-platform-bff'
    }
  }
}

output containerFqdn string = containerGroup.properties.ipAddress.fqdn
</code></pre>
<p>The Bicep definition is committed to the repository. The sensitive parameters are passed at deploy time from GitHub Secrets. This gives you infrastructure-as-code for everything that does not change per deployment, and runtime injection for everything that does.</p>
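<p>One gap in this arrangement is worth closing: a GitHub secret that is missing or misnamed expands to an empty string instead of failing the workflow, and an empty parameter can then be deployed without complaint. A minimal POSIX <code>sh</code> pre-flight check, with the hypothetical name <code>require_vars</code>, that fails before anything is deployed:</p>
<pre><code class="language-shell">#!/bin/sh
# Hypothetical pre-flight check: fail if any named variable is unset or empty.
# Usage: require_vars NAME [NAME ...]   (names must be valid shell identifiers)
require_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    # Report names only; never echo the values themselves.
    echo "missing required values:$missing"
    return 1
  fi
  return 0
}
</code></pre>
<p>It would run first in the deploy script, for example <code>require_vars IMAGE_TAG ACR_PASSWORD FEIDE_CLIENT_ID FEIDE_CLIENT_SECRET DATA_PROTECTION_CONNECTION_STRING</code>, followed by <code>az deployment group create --resource-group rg-education-platform-prod --template-file infra/bff-container.bicep --parameters imageTag="$IMAGE_TAG"</code> with the remaining parameters. The variable names are placeholders for however the workflow maps GitHub Secrets into the step's environment.</p>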
<hr />
<h2>Azure Front Door: routing and TLS termination</h2>
<p>ACI can give a container group a public IP address and DNS name, but it does not terminate TLS or provide certificates. Azure Front Door sits in front, terminates TLS, and provides a single stable hostname for both the Vue application (static files from Azure Storage or a CDN origin) and the BFF (the ACI container).</p>
<p>The routing rules:</p>
<ul>
<li><p><code>https://platform.example.no/api/*</code> → BFF container (<code>education-platform-bff.{region}.azurecontainer.io:8080</code>)</p>
</li>
<li><p><code>https://platform.example.no/*</code> → Vue application static files (Azure Blob Storage static website)</p>
</li>
</ul>
<pre><code class="language-plaintext">// infra/front-door.bicep (abbreviated)
resource frontDoorProfile 'Microsoft.Cdn/profiles@2023-05-01' = {
  name: 'afd-education-platform'
  location: 'global'
  sku: { name: 'Standard_AzureFrontDoor' }
}

// BFF origin group
resource bffOriginGroup 'Microsoft.Cdn/profiles/originGroups@2023-05-01' = {
  parent: frontDoorProfile
  name: 'og-bff'
  properties: {
    loadBalancingSettings: {
      sampleSize: 4
      successfulSamplesRequired: 3
      additionalLatencyInMilliseconds: 50
    }
    healthProbeSettings: {
      probePath: '/health/ready'
      probeRequestType: 'GET'
      probeProtocol: 'Http'
      probeIntervalInSeconds: 30
    }
  }
}

resource bffOrigin 'Microsoft.Cdn/profiles/originGroups/origins@2023-05-01' = {
  parent: bffOriginGroup
  name: 'bff-aci'
  properties: {
    hostName: bffContainerFqdn  // Output from bff-container.bicep
    httpPort: 8080
    originHostHeader: bffContainerFqdn
    priority: 1
    weight: 1000
    enabledState: 'Enabled'
  }
}

// Route: /api/* → BFF
resource bffRoute 'Microsoft.Cdn/profiles/afdEndpoints/routes@2023-05-01' = {
  name: 'route-bff'
  properties: {
    originGroup: { id: bffOriginGroup.id }
    patternsToMatch: ['/api/*']
    forwardingProtocol: 'HttpOnly'  // TLS terminates at Front Door; the origin hop is plain HTTP
    httpsRedirect: 'Enabled'
    linkToDefaultDomain: 'Enabled'
  }
}
</code></pre>
<p>Front Door adds two things the ACI container cannot provide on its own: a trusted TLS certificate on a custom domain, and global edge caching for the Vue application's static assets. The BFF responses are not cached at the Front Door layer — they are authenticated and user-specific — but the Vue JS/CSS bundles benefit significantly from edge caching across Azure's PoPs.</p>
<p><strong>One Front Door note from production:</strong> the <code>forwardingProtocol: 'HttpOnly'</code> between Front Door and ACI is intentional. The BFF container listens on HTTP on port 8080, and the TLS boundary is at Front Door. Traffic from Front Door to an Azure-hosted origin stays on Microsoft's network, so a second TLS hop to ACI adds latency and complexity without adding meaningful protection in transit. The caveat is that the ACI endpoint itself remains publicly reachable over plain HTTP. Requests that bypass Front Door can be rejected by having the BFF check the <code>X-Azure-FDID</code> header that Front Door adds to forwarded requests. This is a deliberate, documented trade-off, not a configuration oversight.</p>
<hr />
<h2>Azure API Management: when it adds value and when it does not</h2>
<p>API Management sits between Front Door and the BFF in the architecture described in Article 3. Whether to include it in the production deployment is a question the architecture series has deferred until here — because the answer depends on what you are actually deploying, and the honest answer is nuanced.</p>
<h3>When APIM is worth the overhead</h3>
<p><strong>Centralised JWT validation before the BFF.</strong> APIM can validate the Feide JWT at the network perimeter — before the request reaches the BFF — using an inbound policy. This offloads cryptographic validation from the BFF and means an invalid token never consumes BFF resources:</p>
<pre><code class="language-xml">&lt;!-- APIM inbound policy --&gt;
&lt;inbound&gt;
  &lt;validate-jwt header-name="Authorization" failed-validation-httpcode="401"&gt;
    &lt;openid-config url="https://auth.dataporten.no/.well-known/openid-configuration" /&gt;
    &lt;audiences&gt;
      &lt;audience&gt;your-client-id&lt;/audience&gt;
    &lt;/audiences&gt;
  &lt;/validate-jwt&gt;
  &lt;base /&gt;
&lt;/inbound&gt;
</code></pre>
<p><strong>Rate limiting per client or per user.</strong> APIM's rate limit policies can throttle by subscription key, IP, or JWT claim. For a platform with institutional clients, rate limiting per organisation prevents one institution's usage patterns from affecting another's:</p>
<pre><code class="language-xml">&lt;rate-limit-by-key calls="1000" renewal-period="60"
  counter-key="@(context.Request.Headers.GetValueOrDefault("X-Org-Id", "anonymous"))" /&gt;
</code></pre>
<p><strong>Request logging with correlation across tenants.</strong> APIM emits structured request logs to Application Insights that include the calling subscription, caller IP, response code, and latency, with no code changes to the BFF. For a multi-tenant education platform, this per-organisation visibility is valuable for capacity planning and SLA reporting.</p>
<h3>When APIM adds cost without benefit</h3>
<p><strong>Single-tenant, single-client deployments.</strong> If the BFF serves one Vue application for one organisation, APIM's multi-tenancy features are unused overhead. Even the Basic tier costs roughly €130/month, and the Standard tier several times that. For a deployment that would not use routing policies, subscription management, or the developer portal, that is at least €130/month for request proxying that Front Door already provides.</p>
<p><strong>When the BFF handles auth itself.</strong> The production system this series describes authenticates via Feide's OIDC flow — a server-side redirect flow, not a bearer token in the request header. APIM's JWT validation policy is not applicable. The auth boundary is the cookie session managed by the BFF, not a token at the network perimeter. In this specific configuration, APIM's primary value proposition does not apply.</p>
<p><strong>The honest answer for this production system:</strong> APIM was not included in the production deployment. The authentication model (cookie-based session, not JWT in header), the single-tenant deployment, and the cost overhead put it outside the value threshold. Front Door provided TLS termination, routing, and basic DDoS protection. The BFF provided everything else. APIM would be the first thing added if the platform expanded to serve multiple institutions as independent tenants with per-tenant rate limiting requirements.</p>
<p>This is the decision the architecture series promised in Article 3: here is the specific context, here is the reasoning, here is the call.</p>
<hr />
<h2>Environment promotion: staging before production</h2>
<p>The production pipeline had two environments: staging and production. The staging environment used the same ACI / Front Door topology with different resource names and configuration values. Every push to <code>main</code> deployed to staging automatically. Promotion to production required a manual approval gate in GitHub Environments.</p>
<p>The staging ACI container pointed to staging instances of the upstream services. The Feide integration used Feide's test environment (<code>https://auth.dataporten-test.no</code>), which allows test institution credentials without affecting production identity records.</p>
<p>The environment variable difference between staging and production was entirely in GitHub Secrets — the Bicep definition was identical. This is the correct model: infrastructure code is environment-agnostic; environment-specific values are injected at deployment time.</p>
<hr />
<h2>Rollback</h2>
<p>ACI's deployment model creates or replaces a container group. There is no built-in rollback command. The production rollback procedure was:</p>
<pre><code class="language-shell"># Redeploy the last known-good image tag (recorded after each successful deployment).
# The remaining flags (registry credentials, CPU, memory, ports, environment variables,
# secure environment variables) must match the original deployment; omitted for brevity.
az container create \
  --resource-group rg-education-platform-prod \
  --name cg-bff-prod \
  --image "educationplatformbff.azurecr.io/bff:${LAST_GOOD_SHA}"
</code></pre>
<p>The last-good SHA was recorded after each successful deployment as a variable on the production GitHub Environment; workflow-level <code>env</code> values do not survive between runs, so the value has to be stored somewhere that persists. This is manual, but it is fast: a rollback to the previous image completes in under two minutes, which is the ACI container start time plus the registry pull time.</p>
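<p>If the recorded value is ever unavailable, the registry can serve as a fallback, because every image carries its commit SHA as a tag. A minimal POSIX <code>sh</code> sketch, with the hypothetical name <code>previous_sha_tag</code>, that reads tags newest-first and prints the most recent SHA tag other than the one currently running:</p>
<pre><code class="language-shell">#!/bin/sh
# Hypothetical rollback helper: read image tags newest-first on stdin and print the
# most recent full commit-SHA tag that differs from the currently deployed SHA.
# Usage: TAGS_NEWEST_FIRST | previous_sha_tag CURRENT_SHA
previous_sha_tag() {
  current=$1
  while IFS= read -r tag; do
    # Skip moving tags such as "latest" and anything that is not lowercase hex.
    case $tag in
      *[!0-9a-f]*) continue ;;
    esac
    # Skip short or empty values, and the image that is currently running.
    if [ "${#tag}" -ne 40 ] || [ "$tag" = "$current" ]; then
      continue
    fi
    echo "$tag"
    return 0
  done
  # No earlier SHA-tagged image was found.
  return 1
}
</code></pre>
<p>Fed by <code>az acr repository show-tags --name educationplatformbff --repository bff --orderby time_desc --output tsv</code>, it returns the image pushed before the current one. That assumes the earlier image was healthy and that no newer image is waiting at the approval gate, neither of which the recorded last-good SHA needs to assume, so it is a fallback rather than a replacement.</p>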
<p>For teams that need zero-downtime rollbacks, the correct tool is Azure Container Apps or AKS rather than ACI. ACI's container group replacement causes a brief interruption — typically 30 to 60 seconds — while the new container starts and the health probe validates it. For the production education platform, deployments were scheduled during low-traffic windows (evenings, weekends) and the brief interruption was acceptable.</p>
<hr />
<h2>Observability wiring: Application Insights in the container</h2>
<p>The Serilog Application Insights sink configured in Article 4 requires one environment variable to function: the Application Insights connection string. This was injected as a plain (non-secure) environment variable in the ACI deployment. A connection string is not a credential in the traditional sense: it contains the instrumentation key, which is enough to send telemetry into your Application Insights resource but not to read data from it. The production team treated it as non-secret but non-public.</p>
<p>Verify the telemetry pipeline is working after the first deployment:</p>
<pre><code class="language-shell"># Query Application Insights for BFF requests in the last 5 minutes
# (requires the application-insights Azure CLI extension; an app name needs --resource-group)
az monitor app-insights query \
  --app ai-education-platform \
  --resource-group rg-education-platform-prod \
  --analytics-query "requests | where timestamp &gt; ago(5m) | project timestamp, name, resultCode, duration | order by timestamp desc | limit 20"
</code></pre>
<p>If the BFF is running and the connection string is correct, this query returns the last 20 requests with their status codes and durations. Allow for Application Insights ingestion latency, which typically ranges from tens of seconds to a few minutes: an empty result immediately after startup may only mean the data has not arrived yet. If requests still do not appear, either the container is not running, the connection string is wrong, or the Serilog sink is not configured. Article 9 covers the full observability setup in depth.</p>
<hr />
<h2>The complete deployment, end to end</h2>
<p>A merge to <code>main</code> triggers this sequence:</p>
<ol>
<li><p>GitHub Actions: run .NET tests, generate OpenAPI types, TypeScript type check.</p>
</li>
<li><p>If all pass: build Docker image tagged with the commit SHA, push to ACR.</p>
</li>
<li><p>Manual approval gate (GitHub Environments) — second engineer reviews.</p>
</li>
<li><p>Deploy: <code>az container create</code> replaces the existing ACI container group with the new image.</p>
</li>
<li><p>ACI pulls the image from ACR, starts the container, waits for <code>/health/live</code> to return 200.</p>
</li>
<li><p>Azure Front Door health probe (<code>/health/ready</code>) validates upstream connectivity.</p>
</li>
<li><p>Once both probes pass, the container receives traffic.</p>
</li>
<li><p>Application Insights begins receiving telemetry within 30 seconds of startup.</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/80c592bc-b015-4585-8a7e-aa687eec4ee6.png" alt="" style="display:block;margin:0 auto" />

<p>Total time from merge to production traffic: approximately 8 to 12 minutes, including the approval gate. The approval gate accounts for roughly 2 of those minutes on average — the remainder is build, push, and container start time.</p>
<hr />
<h2>What comes next</h2>
<p>The BFF is deployed, authenticated, and observable. The final two articles in the core series address the engineering discipline that keeps it that way: Article 8 covers testing strategy — unit, integration, and consumer-driven contract tests with Pact — and Article 9 covers the full observability setup with structured logging, distributed tracing, and Application Insights dashboards.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="/what-is-bff-and-when-is-it-actually-worth-it">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="/designing-the-bff-contract-request-aggregation-client-specific-shaping">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="/building-the-bff-in-net-core-minimal-apis-routing-aggregation">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="/auth-at-the-boundary-integrating-feide-identity-via-the-bff">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li>→ <a href="/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="/testing-the-bff-unit-integration-contract-tests">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="/caching-in-the-bff-in-memory-redis-response-caching">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="/brownfield-migration-the-strangler-fig-approach-to-bff-adoption">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Auth at the Boundary: Integrating Feide Identity via the BFF]]></title><description><![CDATA[A note on the code in this article. The implementation shown here is derived from a production authentication integration built for a Norwegian enterprise education platform using Feide as the identit]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/auth-at-the-boundary-integrating-feide-identity-via-the-bff</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/auth-at-the-boundary-integrating-feide-identity-via-the-bff</guid><category><![CDATA[OIDC]]></category><category><![CDATA[OAuth2]]></category><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[Token Handler]]></category><category><![CDATA[session management]]></category><category><![CDATA[authentication]]></category><category><![CDATA[identity-provider]]></category><category><![CDATA[SSO]]></category><category><![CDATA[Security]]></category><category><![CDATA[Web Architectures, ]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sun, 12 Apr 2026 06:34:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/1bac752e-9a9e-45ce-8917-fb73521c21b4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>A note on the code in this article.</strong> The implementation shown here is derived from a production authentication integration built for a Norwegian enterprise education platform using Feide as the identity provider. Tenant identifiers, internal endpoint paths, and certain configuration details have been generalised to meet NDA obligations. The OAuth 2.0 / OIDC flow, Token Handler pattern implementation, session management strategy, and the specific failure modes each decision addresses are drawn directly from what was deployed and operated in production.</p>
</blockquote>
<hr />
<p>Authentication is the decision in a BFF architecture where getting it wrong is most expensive to undo. A poorly designed aggregation layer can be refactored incrementally. A poorly designed authentication boundary — tokens in the browser, session management scattered across client and server, a leaky security perimeter — creates vulnerabilities that propagate into every part of the system and are painful to retrofit.</p>
<p>This article covers the authentication architecture for the education platform at the centre of this series. The identity provider is Feide, the national federated identity system for the Norwegian education sector, used by institutions from primary schools to universities. Feide is OIDC-compliant and follows standard OAuth 2.0 flows, but its institutional context introduces specific requirements around organisation-scoped claims and role hierarchies that shape implementation decisions.</p>
<p>The core argument of this article: the BFF is the right place to own authentication, tokens should never reach the browser, and cookie-based sessions managed server-side are more secure and simpler to reason about than browser-held tokens — despite what the proliferation of JWT-in-localStorage tutorials might suggest.</p>
<hr />
<h2>Why tokens in the browser are the wrong model</h2>
<p>Before the implementation, the security argument deserves to be made explicitly, because the alternative — storing access tokens in <code>localStorage</code> or <code>sessionStorage</code> and attaching them to requests from the Vue application — is common, documented in many identity provider tutorials, and genuinely wrong for this class of application.</p>
<p>The problem is the browser's threat model. <code>localStorage</code> is accessible to any JavaScript running on the page. In a complex web application with third-party dependencies — analytics scripts, support widgets, UI libraries — the attack surface for cross-site scripting is real. A single XSS vulnerability in any dependency gives an attacker access to every token in storage. An access token for Feide, which carries institutional identity and role claims, is a meaningful target in an education context.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/e3969979-5c68-4f3d-bdb9-3cd77d10484b.png" alt="" style="display:block;margin:0 auto" />

<p><code>HttpOnly</code> cookies are not accessible to JavaScript at all. An XSS attack that compromises a page's JavaScript cannot read an <code>HttpOnly</code> cookie. The cookie is attached to requests by the browser's network stack, not by application code. This does not eliminate all attack vectors — CSRF remains a concern — but it removes the entire class of token theft via script injection, which is the higher-probability attack.</p>
<p>The Token Handler pattern implements this correctly: the BFF holds the access token server-side, issues a session cookie to the browser, and exchanges the cookie for the token on every upstream API call. The browser never sees the token. The Vue application never handles authentication directly. This is not additional complexity — it is relocating complexity from the browser (where it cannot be properly secured) to the server (where it can).</p>
<hr />
<h2>Feide: what it is and what it provides</h2>
<p>Feide (Felles Elektronisk IDentitet — common electronic identity) is the identity federation service operated by Sikt for Norwegian educational institutions. It provides federated single sign-on across universities, university colleges, and primary and secondary schools. An institution's staff and students authenticate with their institutional credentials; Feide issues identity tokens that carry organisation membership, role information, and a persistent identifier that is stable across sessions.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/9ab3640f-85ad-421d-bf34-3df0aacad90e.png" alt="" style="display:block;margin:0 auto" />

<p>For an education platform, this means:</p>
<ul>
<li><p>Users authenticate once with their institution's credentials — no separate account registration</p>
</li>
<li><p>The platform receives verified organisational membership and role claims (<code>eduPersonAffiliation</code>, <code>eduPersonPrimaryAffiliation</code>, <code>orgMembership</code>)</p>
</li>
<li><p>A stable, human-readable identifier (<code>feidePersonPrincipalName</code>, typically <code>username@institution.no</code>) is available for user records</p>
</li>
<li><p>The platform can scope data access by institution without maintaining its own organisation identity store</p>
</li>
</ul>
<p>Feide exposes a standard OIDC provider at <a href="https://auth.dataporten.no"><code>https://auth.dataporten.no</code></a>. The integration uses the Authorization Code flow with PKCE, which is the correct flow for server-side applications that can keep a client secret. The production system used Feide's Dataporten platform, which wraps Feide identity with additional APIs for group membership and course data — though those APIs are not covered here as they are specific to the platform's data architecture.</p>
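<p>With <code>UsePkce = true</code>, the .NET OIDC handler generates the PKCE values itself. The following TypeScript sketch shows what those values are, using the S256 method from RFC 7636. A random <code>code_verifier</code> stays on the server, and only its SHA-256 hash, base64url-encoded, travels to Feide as <code>code_challenge</code>. The function names are illustrative, and the code assumes Node.js 16 or later for the <code>base64url</code> encoding.</p>
<pre><code class="language-typescript">import { createHash, randomBytes } from 'node:crypto'

// Illustrative only: the ASP.NET Core OIDC handler does this when UsePkce = true.
// code_verifier: 32 random bytes, base64url-encoded to 43 characters
// (RFC 7636 allows 43 to 128 characters).
export function createCodeVerifier(): string {
  return randomBytes(32).toString('base64url')
}

// S256 method: code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))
export function codeChallengeS256(codeVerifier: string): string {
  return createHash('sha256').update(codeVerifier, 'ascii').digest('base64url')
}
</code></pre>
<p>The verifier is sent only in the back-channel token request after the callback. An attacker who intercepts the authorization code on the redirect therefore cannot redeem it. As a check, the RFC 7636 Appendix B verifier <code>dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFNFhUWjk</code> should produce the challenge <code>E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM</code>.</p>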
<hr />
<h2>The authentication flow, end to end</h2>
<p>The full flow involves four parties: the Vue application (browser), the BFF (.NET Core), Feide (the identity provider), and the upstream services.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/011549c5-d876-45da-a314-942eb1756528.png" alt="" style="display:block;margin:0 auto" />

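<p>The browser participates in the OIDC flow through redirects only and is never given a readable token. The BFF holds all three tokens (access, ID, refresh) in its encrypted authentication session. With the configuration below, ASP.NET Core serialises that session ticket into the data-protected cookie itself. The browser therefore stores only ciphertext that it cannot decrypt. Configuring <code>CookieAuthenticationOptions.SessionStore</code> with an <code>ITicketStore</code> implementation keeps the tokens entirely on the server and reduces the cookie to a session key. The Vue application interacts with the BFF exclusively through its session cookie.</p>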
<hr />
<h2>Setting up OIDC in .NET Core</h2>
<p>Install the required packages:</p>
<pre><code class="language-shell">dotnet add package Microsoft.AspNetCore.Authentication.OpenIdConnect
dotnet add package Microsoft.AspNetCore.Authentication.Cookies
</code></pre>
<p>The authentication configuration in <code>Program.cs</code>:</p>
<pre><code class="language-csharp">// Program.cs

var feideConfig = builder.Configuration.GetSection("Feide");

builder.Services
    .AddAuthentication(options =&gt;
    {
        options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
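        // Challenges go to the cookie handler so OnRedirectToLogin below can answer
        // API calls with 401. Browser navigations are sent to /auth/login, which
        // triggers the Feide (OIDC) challenge explicitly. With OIDC as the default
        // challenge scheme, OnRedirectToLogin would never run.
        options.DefaultChallengeScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    })
    .AddCookie(CookieAuthenticationDefaults.AuthenticationScheme, options =&gt;
    {
        options.Cookie.Name = "__bff_session";
        options.Cookie.HttpOnly = true;
        options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
        options.Cookie.SameSite = SameSiteMode.Strict;
        options.Cookie.MaxAge = TimeSpan.FromHours(8); // Align with Feide session lifetime
        options.SlidingExpiration = true;
        options.ExpireTimeSpan = TimeSpan.FromHours(8);
        options.LoginPath = "/auth/login"; // Cookie challenges for browser requests land here

        // Answer API requests with 401 instead of redirecting them to the login page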
        options.Events.OnRedirectToLogin = ctx =&gt;
        {
            if (ctx.Request.Path.StartsWithSegments("/api"))
            {
                ctx.Response.StatusCode = StatusCodes.Status401Unauthorized;
                return Task.CompletedTask;
            }
            ctx.Response.Redirect(ctx.RedirectUri);
            return Task.CompletedTask;
        };
    })
    .AddOpenIdConnect(OpenIdConnectDefaults.AuthenticationScheme, options =&gt;
    {
        options.Authority = feideConfig["Authority"]; // https://auth.dataporten.no
        options.ClientId = feideConfig["ClientId"];
        options.ClientSecret = feideConfig["ClientSecret"];

        options.ResponseType = OpenIdConnectResponseType.Code; // Auth Code flow
        options.UsePkce = true;

        options.Scope.Clear();
        options.Scope.Add("openid");
        options.Scope.Add("profile");
        options.Scope.Add("email");
        options.Scope.Add("groups");        // Feide group membership
        options.Scope.Add("userinfo-feide"); // Feide-specific claims

        options.CallbackPath = "/auth/callback";
        options.SignedOutCallbackPath = "/auth/signout-callback";

        options.SaveTokens = true; // Stores tokens in the session — critical for Token Handler

        options.GetClaimsFromUserInfoEndpoint = true;

        // Map Feide-specific userinfo keys to the claim types the BFF reads.
        // ClaimActions run when the userinfo response is processed, after OnTokenValidated.
        options.ClaimActions.MapJsonKey(ClaimTypes.Email, "email");
        options.ClaimActions.MapJsonKey("feide:orgid", "eduPersonOrgDN");
        options.ClaimActions.MapJsonKey("feide:role", "eduPersonPrimaryAffiliation");
        options.ClaimActions.MapJsonKey("feide:affiliation", "eduPersonAffiliation"); // Multi-valued
        options.ClaimActions.MapJsonKey("feide:principal", "feidePersonPrincipalName");

        options.TokenValidationParameters = new TokenValidationParameters
        {
            NameClaimType = "feide:principal",
            RoleClaimType = "feide:affiliation"
        };

        options.Events = new OpenIdConnectEvents
        {
            // OnTicketReceived fires after ID token validation and after the userinfo
            // claims have been mapped, so the feide:* claims are present here
            OnTicketReceived = ctx =&gt;
            {
                var identity = (ClaimsIdentity)ctx.Principal!.Identity!;
                var feidePrincipal = identity.FindFirst("feide:principal")?.Value;
                if (feidePrincipal is not null)
                {
                    // By default the handler maps the OIDC "sub" claim to NameIdentifier.
                    // Replace it so ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)
                    // returns the Feide principal name consistently throughout the BFF
                    foreach (var existing in identity.FindAll(ClaimTypes.NameIdentifier).ToList())
                        identity.RemoveClaim(existing);
                    identity.AddClaim(new Claim(ClaimTypes.NameIdentifier, feidePrincipal));
                }
                return Task.CompletedTask;
            },

            OnAuthenticationFailed = ctx =&gt;
            {
                var logger = ctx.HttpContext.RequestServices
                    .GetRequiredService&lt;ILogger&lt;Program&gt;&gt;();
                logger.LogError(ctx.Exception,
                    "Feide authentication failed. CorrelationId: {CorrelationId}",
                    ctx.HttpContext.TraceIdentifier);
                ctx.Response.Redirect("/auth/error");
                ctx.HandleResponse();
                return Task.CompletedTask;
            }
        };
    });

builder.Services.AddAuthorization();
</code></pre>
<p>Three decisions in this configuration deserve attention.</p>
<p><code>SaveTokens = true</code> <strong>is the Token Handler pivot.</strong> This instructs the OIDC middleware to persist the access token, ID token, and refresh token in the encrypted cookie session. The BFF retrieves the access token from the session on every upstream API call. Without this, the token exchange would need to be implemented manually.</p>
<p><strong>The</strong> <code>OnRedirectToLogin</code> <strong>event handler separates browser and API requests.</strong> Without it, a request to <code>/api/dashboard</code> with an expired session is answered with a <code>302</code> that leads to the Feide login page. The <code>fetch</code> call in the Vue composable follows that redirect silently. For a cross-origin login page, the request typically fails with an opaque network error. For a same-origin one, it resolves as a 200 with HTML content. Either way, the composable cannot tell that the session has expired. The handler returns a clean <code>401</code> for API paths, which the <code>useApi</code> composable handles explicitly.</p>
<p><strong>Claim mapping from Feide-specific types to the BFF's own claim types.</strong> Feide's userinfo endpoint returns claims with LDAP-style keys (<code>eduPersonPrimaryAffiliation</code>, <code>feidePersonPrincipalName</code>). The <code>ClaimActions.MapJsonKey</code> calls map these to <code>feide:*</code> claim types that the BFF's claim-reading code can use consistently. The <code>OnTicketReceived</code> event then sets <code>ClaimTypes.NameIdentifier</code> from the Feide principal name. As a result, the user ID extraction in every aggregator (<code>ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)</code>) works without Feide-specific knowledge. The code uses <code>OnTicketReceived</code> rather than <code>OnTokenValidated</code> because the userinfo claims are not yet on the principal when <code>OnTokenValidated</code> fires.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/e32734d0-6f71-4846-98f7-34cb4b68042e.png" alt="" style="display:block;margin:0 auto" />
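<p>On the Vue side, a clean <code>401</code> only helps if the client checks for it before parsing the body. The <code>useApi</code> composable and <code>handleResponse</code> helper from Article 5 are not reproduced here. The function below is a hypothetical, framework-free sketch of the classification such a helper can perform. It surfaces an HTML page that arrives where JSON was expected as an error rather than trying to parse it.</p>
<pre><code class="language-typescript">// Hypothetical helper: classify a BFF response before touching its body.
export type ResponseKind = 'ok' | 'unauthenticated' | 'unexpected-content' | 'error'

export function classifyResponse(status: number, contentType: string | null): ResponseKind {
  // The BFF answers expired or missing sessions on /api paths with 401
  if (status === 401) return 'unauthenticated'
  if (status &lt; 200 || status &gt;= 300) return 'error'
  // 204 No Content has no body, so no JSON content type is expected
  if (status === 204) return 'ok'
  // A 2xx HTML page usually means a redirect to a login page was followed
  return contentType?.toLowerCase().includes('application/json') ? 'ok' : 'unexpected-content'
}
</code></pre>
<p>A caller would map <code>'unauthenticated'</code> to <code>session.redirectToLogin()</code> and treat <code>'unexpected-content'</code> as a configuration error worth logging. A redirect to a cross-origin login page usually rejects the <code>fetch</code> promise with a <code>TypeError</code> instead, so that case belongs in the caller's <code>catch</code> block.</p>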

<hr />
<h2>The auth endpoints</h2>
<p>Three endpoints handle the authentication lifecycle:</p>
<pre><code class="language-csharp">// Endpoints/AuthEndpoints.cs
public static class AuthEndpoints
{
    public static IEndpointRouteBuilder MapAuthEndpoints(
        this IEndpointRouteBuilder app)
    {
        app.MapGet("/auth/login", LoginAsync);
        app.MapGet("/auth/logout", LogoutAsync).RequireAuthorization();
        app.MapGet("/auth/me", GetCurrentUserAsync).RequireAuthorization();

        return app;
    }

    // Triggers the OIDC challenge — redirects to Feide
    private static IResult LoginAsync(HttpContext ctx)
    {
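        var returnUrl = ctx.Request.Query["returnUrl"].FirstOrDefault() ?? "/";

        // Validate returnUrl to prevent open redirects. A relative-URI check alone is not
        // enough: "//evil.example" and "/\evil.example" parse as relative, but browsers
        // treat them as links to another host
        if (!returnUrl.StartsWith('/') || returnUrl.StartsWith("//") || returnUrl.StartsWith("/\\"))
            returnUrl = "/";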

        return Results.Challenge(
            new AuthenticationProperties { RedirectUri = returnUrl },
            [OpenIdConnectDefaults.AuthenticationScheme]);
    }

    // Signs out locally and triggers Feide end-session endpoint
    private static async Task&lt;IResult&gt; LogoutAsync(HttpContext ctx)
    {
        await ctx.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme);
        return Results.SignOut(
            new AuthenticationProperties { RedirectUri = "/" },
            [OpenIdConnectDefaults.AuthenticationScheme]);
    }

    // Returns the current user's profile — consumed by the Vue session store
    private static IResult GetCurrentUserAsync(HttpContext ctx)
    {
        var user = ctx.User;

        // Read the feide:* claim types produced by the ClaimActions mappings in Program.cs
        var profile = new AuthenticatedUserResponse(
            PrincipalName: user.FindFirstValue("feide:principal")!,
            DisplayName: user.FindFirstValue(ClaimTypes.Name)
                ?? user.FindFirstValue("feide:principal")!,
            Email: user.FindFirstValue(ClaimTypes.Email),
            Role: TranslateFeideRole(user.FindFirstValue("feide:role")),
            OrgId: ExtractOrgId(user.FindFirstValue("feide:orgid"))
        );

        return Results.Ok(profile);
    }

    private static string TranslateFeideRole(string? affiliationCode) =&gt;
        affiliationCode switch
        {
            "staff"    =&gt; "Teacher",
            "student"  =&gt; "Student",
            "faculty"  =&gt; "Teacher",
            "employee" =&gt; "Staff",
            _          =&gt; "Unknown"
        };

    // eduPersonOrgDN is an LDAP DN: dc=uninett,dc=no → extract org identifier
    private static string? ExtractOrgId(string? orgDn)
    {
        if (orgDn is null) return null;
        var parts = orgDn.Split(',');
        return parts.FirstOrDefault(p =&gt; p.StartsWith("dc=", StringComparison.OrdinalIgnoreCase))
            ?.Split('=').ElementAtOrDefault(1);
    }
}
</code></pre>
<p>The <code>/auth/me</code> endpoint is what the Vue session store calls on application load to hydrate the user profile. It reads from the claims already in the session — no upstream call required. This is fast and safe: the session cookie validates the user's identity; the claims in the cookie provide the profile data.</p>
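<p>The <code>AuthenticatedUserResponse</code> type that the session store imports from <code>@/api/types</code> is not shown in this article. Below is a minimal sketch. It assumes the C# record is serialised with ASP.NET Core's default web JSON settings (camelCase property names, <code>null</code> for missing values) and that the role strings are exactly those produced by <code>TranslateFeideRole</code>. The runtime guard is an optional addition, not part of the original client.</p>
<pre><code class="language-typescript">// Assumed wire shape of the BFF's AuthenticatedUserResponse record (camelCase JSON).
export type PlatformRole = 'Teacher' | 'Student' | 'Staff' | 'Unknown'

export interface AuthenticatedUserResponse {
  principalName: string
  displayName: string
  email?: string | null
  role: PlatformRole
  orgId?: string | null
}

const knownRoles: ReadonlySet&lt;unknown&gt; = new Set(['Teacher', 'Student', 'Staff', 'Unknown'])

const isOptionalString = (x: unknown): boolean =&gt;
  x === undefined || x === null || typeof x === 'string'

// Runtime guard: checks the contract at the boundary instead of trusting a type cast
export function isAuthenticatedUserResponse(value: unknown): value is AuthenticatedUserResponse {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record&lt;string, unknown&gt;
  return (
    typeof v.principalName === 'string' &amp;&amp;
    typeof v.displayName === 'string' &amp;&amp;
    knownRoles.has(v.role) &amp;&amp;
    isOptionalString(v.email) &amp;&amp;
    isOptionalString(v.orgId)
  )
}
</code></pre>
<p>A store could run this guard on the <code>/auth/me</code> result and treat a mismatch like any other initialisation failure.</p>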
<hr />
<h2>The Token Handler: forwarding tokens to upstream services</h2>
<p>The most important implementation detail in this architecture is how the access token — held server-side in the session — is forwarded to upstream services on behalf of the authenticated user.</p>
<p>The Token Handler is a <code>DelegatingHandler</code> that intercepts every <code>HttpClient</code> call and attaches the access token from the current user's session:</p>
<pre><code class="language-csharp">// Infrastructure/FeideTokenHandler.cs
public sealed class FeideTokenHandler(IHttpContextAccessor contextAccessor)
    : DelegatingHandler
{
    protected override async Task&lt;HttpResponseMessage&gt; SendAsync(
        HttpRequestMessage request,
        CancellationToken ct)
    {
        var ctx = contextAccessor.HttpContext;

        if (ctx?.User.Identity?.IsAuthenticated == true)
        {
            // Retrieve the access token stored by SaveTokens = true
            var accessToken = await ctx.GetTokenAsync(
                CookieAuthenticationDefaults.AuthenticationScheme,
                "access_token");

            if (accessToken is not null)
            {
                request.Headers.Authorization =
                    new AuthenticationHeaderValue("Bearer", accessToken);
            }
            else
            {
                // Token missing from session — likely expired without refresh
                // Log and let the upstream return 401, which the BFF maps to a 503
                var logger = ctx.RequestServices
                    .GetRequiredService&lt;ILogger&lt;FeideTokenHandler&gt;&gt;();
                logger.LogWarning(
                    "Access token not available in session for user {User}. " +
                    "Request to {RequestUri} will proceed without authorization header.",
                    ctx.User.FindFirstValue(ClaimTypes.NameIdentifier),
                    request.RequestUri);
            }
        }

        return await base.SendAsync(request, ct);
    }
}
</code></pre>
<p>Register it as a transient service and attach it to every typed client:</p>
<pre><code class="language-csharp">// Program.cs
builder.Services.AddTransient&lt;FeideTokenHandler&gt;();

builder.Services.AddHttpClient&lt;CourseServiceClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["Services:CourseService:BaseUrl"]!))
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;()
    .AddStandardResilienceHandler();

builder.Services.AddHttpClient&lt;SessionServiceClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["Services:SessionService:BaseUrl"]!))
    .AddHttpMessageHandler&lt;FeideTokenHandler&gt;()
    .AddStandardResilienceHandler();

// Repeat for every upstream client
</code></pre>
<p>The <code>FeideTokenHandler</code> is registered once and applied to every HTTP client that calls authenticated upstream services. The upstream services receive a standard <code>Bearer</code> token in the <code>Authorization</code> header — they never know or care that the token came from a server-side session rather than a browser-held JWT.</p>
<hr />
<h2>Token refresh: handling expiry transparently</h2>
<p>Feide access tokens have a limited lifetime. In the production system, tokens expired after one hour. A user with an active eight-hour session must have their token refreshed transparently without being redirected to Feide to re-authenticate.</p>
<p>The refresh itself is a standard OAuth 2.0 <code>refresh_token</code> grant against Feide's token endpoint, wrapped in a small service:</p>
<pre><code class="language-csharp">// Infrastructure/TokenRefreshService.cs
public sealed class TokenRefreshService(
    IHttpClientFactory httpClientFactory,
    IOptionsMonitor&lt;OpenIdConnectOptions&gt; oidcOptionsMonitor,
    ILogger&lt;TokenRefreshService&gt; logger)
{
    // Returns the new access token, or null if the refresh failed.
    // Callers must use the returned value: the cookie ticket for the current request has
    // already been read and cached, so GetTokenAsync would still return the old token.
    public async Task&lt;string?&gt; TryRefreshAsync(HttpContext ctx, CancellationToken ct = default)
    {
        var authResult = await ctx.AuthenticateAsync(
            CookieAuthenticationDefaults.AuthenticationScheme);

        var refreshToken = authResult.Properties?.GetTokenValue("refresh_token");
        if (authResult.Principal is null || authResult.Properties is null || refreshToken is null)
            return null;

        // Reuse the OIDC handler's settings: the token endpoint comes from Feide's
        // discovery document rather than a hard-coded path
        var oidc = oidcOptionsMonitor.Get(OpenIdConnectDefaults.AuthenticationScheme);
        var discovery = await oidc.ConfigurationManager!.GetConfigurationAsync(ct);

        var client = httpClientFactory.CreateClient();
        using var tokenResponse = await client.PostAsync(
            discovery.TokenEndpoint,
            new FormUrlEncodedContent(new Dictionary&lt;string, string&gt;
            {
                ["grant_type"]    = "refresh_token",
                ["refresh_token"] = refreshToken,
                ["client_id"]     = oidc.ClientId!,
                ["client_secret"] = oidc.ClientSecret!
            }),
            ct);

        if (!tokenResponse.IsSuccessStatusCode)
        {
            logger.LogWarning(
                "Token refresh failed for user {User}. Status: {Status}. " +
                "User will need to re-authenticate.",
                ctx.User.FindFirstValue(ClaimTypes.NameIdentifier),
                tokenResponse.StatusCode);
            return null;
        }

        var tokens = await tokenResponse.Content
            .ReadFromJsonAsync&lt;TokenRefreshResponse&gt;(ct);
        if (tokens is null) return null;

        // Update the tokens stored in the session ticket
        authResult.Properties.UpdateTokenValue("access_token", tokens.AccessToken);
        authResult.Properties.UpdateTokenValue("expires_at",
            DateTimeOffset.UtcNow
                .AddSeconds(tokens.ExpiresIn)
                .ToString("o"));

        if (tokens.RefreshToken is not null)
            authResult.Properties.UpdateTokenValue("refresh_token", tokens.RefreshToken);

        // Re-issue the session cookie so subsequent requests see the refreshed tokens
        await ctx.SignInAsync(
            CookieAuthenticationDefaults.AuthenticationScheme,
            authResult.Principal,
            authResult.Properties);

        return tokens.AccessToken;
    }

    private sealed record TokenRefreshResponse(
        [property: JsonPropertyName("access_token")] string AccessToken,
        [property: JsonPropertyName("expires_in")] int ExpiresIn,
        [property: JsonPropertyName("refresh_token")] string? RefreshToken
    );
}
</code></pre>
<p>Wire the refresh into the <code>FeideTokenHandler</code>, so it fires automatically when the token is close to expiry:</p>
<pre><code class="language-csharp">// Infrastructure/FeideTokenHandler.cs — updated
public sealed class FeideTokenHandler(
    IHttpContextAccessor contextAccessor,
    TokenRefreshService tokenRefresh)
    : DelegatingHandler
{
    protected override async Task&lt;HttpResponseMessage&gt; SendAsync(
        HttpRequestMessage request,
        CancellationToken ct)
    {
        var ctx = contextAccessor.HttpContext;

        if (ctx?.User.Identity?.IsAuthenticated == true)
        {
            string? accessToken = null;

            var expiresAt = await ctx.GetTokenAsync(
                CookieAuthenticationDefaults.AuthenticationScheme,
                "expires_at");

            // Refresh proactively if within 5 minutes of expiry, and use the token the
            // refresh returns: re-reading the session in this request yields the old one
            if (DateTimeOffset.TryParse(expiresAt, out var expiry)
                &amp;&amp; expiry &lt; DateTimeOffset.UtcNow.AddMinutes(5))
            {
                accessToken = await tokenRefresh.TryRefreshAsync(ctx, ct);
            }

            // Fall back to the stored token when no refresh was needed or the refresh
            // failed; an expired token then lets the upstream service return 401
            accessToken ??= await ctx.GetTokenAsync(
                CookieAuthenticationDefaults.AuthenticationScheme,
                "access_token");

            if (accessToken is not null)
                request.Headers.Authorization =
                    new AuthenticationHeaderValue("Bearer", accessToken);
        }

        return await base.SendAsync(request, ct);
    }
}
</code></pre>
<p>The five-minute proactive refresh window prevents the edge case where a token expires between the check and the upstream call completing. The refresh is awaited inline in the handler, which adds latency to the request that triggers it. In practice, this happens at most once per hour per user, and the upstream call still completes successfully.</p>
<p>Register <code>TokenRefreshService</code>:</p>
<pre><code class="language-csharp">builder.Services.AddTransient&lt;TokenRefreshService&gt;();
</code></pre>
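<p>Aggregators often call several upstream services in parallel. As a result, more than one <code>FeideTokenHandler</code> in the same request can reach the refresh branch at the same time. <code>HttpContext</code> is not safe for concurrent use, and if Feide rotates refresh tokens, only the first of those refreshes can succeed. It is therefore better to share a single refresh between callers. The TypeScript sketch below illustrates both halves of the logic outside .NET. <code>needsRefresh</code> mirrors the five-minute check. <code>singleFlight</code> is a generic pattern for sharing one in-flight operation; there is no indication that the production BFF used it. In a BFF, such a shared slot would have to be scoped to a single user's session.</p>
<pre><code class="language-typescript">const REFRESH_WINDOW_MS = 5 * 60 * 1000

// Mirrors FeideTokenHandler: refresh when expires_at falls within the window.
// An absent or unparseable value means "no refresh", as with the C# TryParse check.
export function needsRefresh(expiresAtIso: string | null, now: Date = new Date()): boolean {
  if (!expiresAtIso) return false
  const expiresAt = Date.parse(expiresAtIso)
  if (Number.isNaN(expiresAt)) return false
  return expiresAt &lt; now.getTime() + REFRESH_WINDOW_MS
}

// Single-flight: concurrent callers share one in-flight promise instead of each
// starting a refresh. The slot is cleared once the promise settles, so a later
// expiry triggers a new refresh.
export function singleFlight&lt;T&gt;(task: () =&gt; Promise&lt;T&gt;): () =&gt; Promise&lt;T&gt; {
  let inFlight: Promise&lt;T&gt; | null = null
  return () =&gt; {
    if (inFlight === null) {
      inFlight = task().finally(() =&gt; {
        inFlight = null
      })
    }
    return inFlight
  }
}
</code></pre>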
<hr />
<h2>CSRF protection</h2>
<p><code>SameSite=Strict</code> on the session cookie is the first line of CSRF defence — cross-site requests do not include the cookie at all. For the production system, where the Vue application and the BFF share an origin (same domain, served behind Azure Front Door), <code>SameSite=Strict</code> was sufficient.</p>
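<p>If the Vue application and the BFF are served from different subdomains, <code>SameSite</code> no longer gives the same guarantee. Subdomains of the same registrable domain (for example <code>app.example.com</code> and <code>bff.example.com</code>) count as the same site. The browser therefore still attaches the session cookie to requests initiated from any sibling subdomain, including ones the platform does not control. In that topology, explicit CSRF token validation is necessary. The BFF generates a CSRF token, sets it in a non-HttpOnly cookie (so the Vue application's JavaScript can read it), and the Vue <code>apiClient</code> attaches it as a request header:</p>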
<pre><code class="language-csharp">// Middleware/CsrfMiddleware.cs
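public sealed class CsrfMiddleware(RequestDelegate next, IAntiforgery antiforgery)
{
    public async Task InvokeAsync(HttpContext ctx)
    {
        if (ctx.Request.Path.StartsWithSegments("/api")
            &amp;&amp; !HttpMethods.IsGet(ctx.Request.Method)
            &amp;&amp; !HttpMethods.IsHead(ctx.Request.Method))
        {
            // IsRequestValidAsync returns false instead of throwing, so a missing or
            // invalid token becomes a 400 rather than an unhandled exception
            if (!await antiforgery.IsRequestValidAsync(ctx))
            {
                ctx.Response.StatusCode = StatusCodes.Status400BadRequest;
                return;
            }
        }

        // Set the CSRF token cookie on every response so Vue can read it
        var tokens = antiforgery.GetAndStoreTokens(ctx);
        ctx.Response.Cookies.Append("XSRF-TOKEN", tokens.RequestToken!, new CookieOptions
        {
            HttpOnly = false, // Must be readable by JavaScript
            Secure = true,
            SameSite = SameSiteMode.Lax
        });

        await next(ctx);
    }
}

// Program.cs
// The default antiforgery header is "RequestVerificationToken";
// point it at the header the Vue client sends
builder.Services.AddAntiforgery(options =&gt; options.HeaderName = "X-XSRF-TOKEN");
// ...
app.UseAuthentication();
app.UseMiddleware&lt;CsrfMiddleware&gt;(); // After authentication: tokens are bound to the user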
</code></pre>
<p>In the Vue <code>apiClient</code>, the CSRF token is read from the cookie and attached to mutation requests:</p>
<pre><code class="language-typescript">// src/api/client.ts — CSRF-aware post
function getCsrfToken(): string | null {
  // Anchor on the cookie boundary so e.g. "MY-XSRF-TOKEN" does not match
  const match = document.cookie.match(/(?:^|;\s*)XSRF-TOKEN=([^;]+)/)
  return match ? decodeURIComponent(match[1]) : null
}

export const apiClient = {
  async post&lt;T&gt;(path: string, body: unknown, init?: RequestInit): Promise&lt;T&gt; {
    const csrfToken = getCsrfToken()
    const response = await fetch(`/api${path}`, {
      ...init,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
        ...(csrfToken ? { 'X-XSRF-TOKEN': csrfToken } : {}),
        ...init?.headers
      },
      body: JSON.stringify(body),
      credentials: 'include'
    })
    return handleResponse&lt;T&gt;(response)
  }
}
</code></pre>
<p>In the production system, the shared-origin deployment made this unnecessary. It is included here because the pattern is needed the moment the deployment topology changes.</p>
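<p>The <code>apiClient</code> above only shows <code>post</code>, but the middleware validates every non-GET, non-HEAD request under <code>/api</code>. <code>PUT</code>, <code>PATCH</code> and <code>DELETE</code> therefore need the header too. The following is a hypothetical helper that applies the same rule to any method and matches the cookie name exactly. In the browser, pass <code>document.cookie</code> as <code>cookieHeader</code>.</p>
<pre><code class="language-typescript">// Methods the CsrfMiddleware does not validate
const SAFE_METHODS: ReadonlySet&lt;string&gt; = new Set(['GET', 'HEAD'])

// Exact-name cookie lookup: "MY-XSRF-TOKEN" must not match "XSRF-TOKEN"
export function readCookie(cookieHeader: string, name: string): string | null {
  for (const part of cookieHeader.split(';')) {
    const [key, ...valueParts] = part.trim().split('=')
    if (key === name) return decodeURIComponent(valueParts.join('='))
  }
  return null
}

// Returns a copy of `headers` with X-XSRF-TOKEN added for state-changing methods
export function withCsrfHeader(
  method: string,
  headers: Record&lt;string, string&gt;,
  cookieHeader: string
): Record&lt;string, string&gt; {
  if (SAFE_METHODS.has(method.toUpperCase())) return { ...headers }
  const token = readCookie(cookieHeader, 'XSRF-TOKEN')
  return token ? { ...headers, 'X-XSRF-TOKEN': token } : { ...headers }
}
</code></pre>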
<hr />
<h2>The Vue session store: connecting auth to the UI</h2>
<p>The Vue application needs to know whether the user is authenticated, and if so, who they are. The session store from Article 5 calls the <code>/auth/me</code> endpoint on application startup:</p>
<pre><code class="language-typescript">// src/stores/session.ts
import { defineStore } from 'pinia'
import { ref, computed } from 'vue'
import { apiClient, ApiResponseError } from '@/api/client'
import type { AuthenticatedUserResponse } from '@/api/types'

export const useSessionStore = defineStore('session', () =&gt; {
  const profile = ref&lt;AuthenticatedUserResponse | null&gt;(null)
  const isAuthenticated = computed(() =&gt; profile.value !== null)
  const isLoading = ref(false)

  async function initialise() {
    isLoading.value = true
    try {
      profile.value = await apiClient.get&lt;AuthenticatedUserResponse&gt;('/auth/me')
    } catch (e) {
      if (e instanceof ApiResponseError &amp;&amp; e.status === 401) {
        // Not authenticated — expected on first visit
        profile.value = null
      } else {
        // Unexpected error — log but do not block the application
        console.error('Session initialisation failed:', e)
        profile.value = null
      }
    } finally {
      isLoading.value = false
    }
  }

  function redirectToLogin(returnUrl = window.location.pathname) {
    window.location.href = `/auth/login?returnUrl=${encodeURIComponent(returnUrl)}`
  }

  function clearSession() {
    profile.value = null
    window.location.href = '/auth/logout'
  }

  return {
    profile,
    isAuthenticated,
    isLoading,
    initialise,
    redirectToLogin,
    clearSession
  }
})
</code></pre>
<p>Initialise the store in <code>App.vue</code> before rendering protected routes:</p>
<pre><code class="language-typescript">&lt;!-- src/App.vue --&gt;
&lt;script setup lang="ts"&gt;
import { onMounted } from 'vue'
import { useSessionStore } from '@/stores/session'

const session = useSessionStore()
onMounted(() =&gt; session.initialise())
&lt;/script&gt;
</code></pre>
<p>And protect routes with a navigation guard:</p>
<pre><code class="language-typescript">// src/router/index.ts
import { watch } from 'vue'
import { useSessionStore } from '@/stores/session'

// Assumes `router` was created with createRouter() earlier in this file
router.beforeEach(async (to) =&gt; {
  if (!to.meta.requiresAuth) return true

  const session = useSessionStore()

  // Wait for session to initialise on first navigation
  if (session.isLoading) {
    await new Promise&lt;void&gt;(resolve =&gt; {
      // Pinia unwraps refs on the store, so watch a getter, not the boolean value
      const stop = watch(() =&gt; session.isLoading, loading =&gt; {
        if (!loading) { stop(); resolve() }
      })
    })
  }

  if (!session.isAuthenticated) {
    session.redirectToLogin(to.fullPath)
    return false
  }

  return true
})
</code></pre>
<p>The navigation guard waits for the session initialisation to complete before evaluating authentication status. Without this wait, a page refresh on a protected route redirects to login before the <code>/auth/me</code> response has returned — even for authenticated users.</p>
<hr />
<h2>The appsettings configuration</h2>
<pre><code class="language-json">// appsettings.json
{
  "Feide": {
    "Authority": "https://auth.dataporten.no",
    "ClientId": "",
    "ClientSecret": ""
  }
}
</code></pre>
<p><code>ClientId</code> and <code>ClientSecret</code> are empty in <code>appsettings.json</code> and are never committed to source control. In Azure Container Instances, they are injected as environment variables:</p>
<pre><code class="language-plaintext">Feide__ClientId       → injected from Azure Key Vault reference
Feide__ClientSecret   → injected from Azure Key Vault reference
</code></pre>
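<p>.NET's environment variable configuration provider replaces the double underscore with the <code>:</code> section separator, so <code>Feide__ClientId</code> becomes the configuration key <code>Feide:ClientId</code> (the <code>ClientId</code> value inside the <code>Feide</code> section of <code>appsettings.json</code>, which the environment variable overrides). Article 7 covers the Key Vault reference configuration in the ACI deployment pipeline.</p>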
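<p>The mapping is easier to see with a small example. The following TypeScript illustrates only the naming convention. It is not how .NET implements configuration binding, and the helper names are hypothetical. .NET configuration keys are case-insensitive, but this sketch compares them case-sensitively.</p>
<pre><code class="language-typescript">// Illustration of the double-underscore convention (hypothetical helpers).
type ConfigTree = { [key: string]: string | ConfigTree }

// "Feide__ClientId" -> "Feide:ClientId"
export function envNameToConfigKey(name: string): string {
  return name.split('__').join(':')
}

// Builds the nested shape appsettings.json would have for the same values.
export function toConfigTree(env: { [name: string]: string }): ConfigTree {
  const root: ConfigTree = {}
  for (const name of Object.keys(env)) {
    const value = env[name]
    const segments = envNameToConfigKey(name).split(':')
    const leaf = segments.pop() as string
    let node = root
    for (const segment of segments) {
      const existing = node[segment]
      // Reuse an existing section or create a new one.
      const child: ConfigTree = typeof existing === 'object' ? existing : {}
      node[segment] = child
      node = child
    }
    node[leaf] = value
  }
  return root
}
</code></pre>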
<hr />
<h2>What the production system learned about Feide integration</h2>
<p>Several decisions were revised during the production deployment:</p>
<p><strong>The session cookie lifetime must align with Feide's session.</strong> An early implementation used a 24-hour cookie with a one-hour Feide access token and no refresh logic. The result: after one hour, the session cookie was still valid but the access token had expired, so every upstream call returned 401. The fix, proactive token refresh in the <code>FeideTokenHandler</code>, was added in the second sprint after the first deployment. The current eight-hour cookie lifetime and five-minute refresh window are the values that matched observed usage patterns.</p>
<p><code>GetClaimsFromUserInfoEndpoint = true</code> <strong>is required for Feide-specific claims.</strong> The standard ID token from Feide does not include organisation membership or affiliation claims — these come from the Feide userinfo endpoint. Without <code>GetClaimsFromUserInfoEndpoint = true</code>, the BFF receives only the standard OIDC claims and the role translation returns "Unknown" for every user. This was a non-obvious configuration gap that took half a day to diagnose in the staging environment.</p>
<p><strong>The</strong> <code>OnRedirectToLogin</code> <strong>event handler is not optional.</strong> Before it was added, the Vue application's composables received HTML login-page redirects as successful 200 responses. The <code>useDashboard</code> composable silently failed to parse HTML as JSON, <code>data.value</code> remained null, and the dashboard rendered in a permanently loading state. The fix was the API-path 401 handler in the cookie options — immediate and unambiguous failure is more debuggable than silent null data.</p>
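<p>The server-side handler is the real fix. The client can add a second layer of defence by refusing to treat a non-JSON success response as data. Below is a minimal sketch built around a hypothetical <code>isJsonContentType</code> helper. <code>handleResponse</code> could call it on <code>response.headers.get('Content-Type')</code> before parsing, and throw when it returns false.</p>
<pre><code class="language-typescript">// Matches application/json and structured +json types such as
// application/problem+json, with or without parameters like charset.
const JSON_MEDIA_TYPE = /^application\/([\w.-]+\+)?json\s*(;|$)/i

// Hypothetical guard: an HTML login page (text/html) returns false,
// so the caller can fail loudly instead of handing HTML to response.json().
export function isJsonContentType(contentType: string | null): boolean {
  if (contentType === null) return false
  return JSON_MEDIA_TYPE.test(contentType.trim())
}
</code></pre>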
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/1a0e44bf-7579-4447-98da-ce22b84668e5.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>What comes next</h2>
<p>With authentication established at the BFF boundary, the next article addresses deployment: building the Docker image, publishing artifacts through the pipeline, and running the BFF on Azure Container Instances — including environment variable injection for secrets and the health probe configuration that keeps the container in rotation when it is healthy and out of it when it is not.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="#">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="#">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="#">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="#">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="#">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li>→ <a href="#">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="#">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="#">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="#">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="#">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="#">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="#">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[The Vue 3 API Layer of BFF: Composables, Error Boundaries & Type Safety]]></title><description><![CDATA[A note on the code in this article. The implementation shown here is derived from a production Vue 3 application built for a Norwegian enterprise education platform. Service names, domain models, and ]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety</guid><category><![CDATA[Vue3]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[composable]]></category><category><![CDATA[OpenApi]]></category><category><![CDATA[api-layer]]></category><category><![CDATA[bff]]></category><category><![CDATA[error handling]]></category><category><![CDATA[Error boundaries]]></category><category><![CDATA[FrontendArchitecture]]></category><category><![CDATA[best practices]]></category><category><![CDATA[production]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sun, 12 Apr 2026 04:15:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/5001ee62-fe9b-40d2-9d6f-cd4d8dfe092f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>A note on the code in this article.</strong> The implementation shown here is derived from a production Vue 3 application built for a Norwegian enterprise education platform. Service names, domain models, and certain structural details have been generalised to meet NDA obligations. The composable patterns, type generation pipeline, error handling strategies, and the specific production problems each decision addresses are drawn directly from what was shipped and maintained in production.</p>
</blockquote>
<hr />
<p>Article 4 built the BFF service. This article builds the other side of the contract — the Vue 3 layer that consumes it. The BFF gives your frontend a stable, shaped API. What you do with that API on the client side determines whether your components stay clean or accumulate the same adapter logic the BFF was supposed to eliminate.</p>
<p>The goal is a client API layer where every component receives typed, ready-to-render data from a composable, error states are handled consistently and not improvised per-component, and a change to the BFF's response schema surfaces as a compile-time error in TypeScript before it reaches the browser. This is achievable — the production system this series is based on ran this setup in CI — but it requires deliberate structure from the start.</p>
<p>This article covers the full stack: OpenAPI type generation, the base <code>useApi</code> composable, screen-level composables built on top of it, error boundary strategy, and the patterns that did not survive contact with production alongside the ones that did.</p>
<hr />
<h2>The type generation pipeline</h2>
<p>The contract between the Vue application and the BFF should be enforced by the type system, not by convention or documentation. The mechanism is straightforward: the BFF publishes an OpenAPI spec, a generation script turns that spec into TypeScript interfaces, and those interfaces are imported wherever the API response is consumed.</p>
<p>Install the generator:</p>
<pre><code class="language-shell">npm install -D openapi-typescript
</code></pre>
<p>Add the generation script to <code>package.json</code>:</p>
<pre><code class="language-json">{
  "scripts": {
    "generate:api": "openapi-typescript http://localhost:5000/swagger/v1/swagger.json --output src/api/types.gen.ts",
    "generate:api:ci": "openapi-typescript $BFF_SWAGGER_URL --output src/api/types.gen.ts"
  }
}
</code></pre>
<p>The <code>generate:api</code> script targets the local BFF running in development. The <code>generate:api:ci</code> variant uses an environment variable pointing to the staging BFF — the same version that will be deployed alongside the frontend build. This matters: generating types from a different BFF version than the one you are deploying against defeats the purpose of the pipeline.</p>
<p>Running this against the BFF built in Article 4 produces a file like this (abbreviated):</p>
<pre><code class="language-typescript">// src/api/types.gen.ts — generated, do not edit manually
export interface components {
  schemas: {
    DashboardResponse: {
      user: components['schemas']['UserProfileResponse'];
      courses: components['schemas']['CourseResponse'][];
      upcomingSessions: components['schemas']['SessionResponse'][];
      notifications: components['schemas']['NotificationSummary'];
      partialFailures: string[];
    };
    UserProfileResponse: {
      displayName: string;
      role: string;
      avatarUrl: string | null;
    };
    CourseResponse: {
      id: string;
      title: string;
      code: string;
      enrollmentLabel: string;
      enrollmentPercent: number;
      status: string;
    };
    SessionResponse: {
      id: string;
      title: string;
      startsAt: string;
      courseTitle: string;
      locationLabel: string;
    };
    NotificationSummary: {
      count: number;
    };
  };
}
</code></pre>
<p>Create a single re-export file that the rest of the application imports from. Never import from <code>types.gen.ts</code> directly. Doing so couples every consumer to the generated file's internal structure, which can change whenever the spec changes or the generator is upgraded:</p>
<pre><code class="language-typescript">// src/api/types.ts
import type { components } from './types.gen'

export type DashboardResponse = components['schemas']['DashboardResponse']
export type UserProfileResponse = components['schemas']['UserProfileResponse']
export type CourseResponse = components['schemas']['CourseResponse']
export type SessionResponse = components['schemas']['SessionResponse']
export type NotificationSummary = components['schemas']['NotificationSummary']
// ...one alias per schema the app consumes (CourseDetailResponse, EnrollmentRequest, etc.)
</code></pre>
<p>This indirection layer is the difference between a type generation pipeline and a type generation obligation. The generated file changes freely; the re-export file changes only when the public contract deliberately changes.</p>
<p><strong>Make type generation a CI gate.</strong> In the production system, the CI pipeline ran <code>generate:api:ci</code> and then <code>tsc --noEmit</code>. A BFF response shape change that broke the Vue application's types failed the build before deployment. This caught contract mismatches three times in the project's lifetime — each time before a broken build reached staging.</p>
<hr />
<h2>The base composable: <code>useApi</code></h2>
<p>Every API call in the application goes through a single base composable. This is the most important structural decision in the client API layer, and the one most frequently skipped in Vue projects that start simple and grow into a tangle of inconsistent error handling.</p>
<p>The base composable is responsible for three things and only three things: executing a fetch function, managing the loading/error/data state lifecycle, and normalising errors into a consistent shape.</p>
<pre><code class="language-typescript">// src/composables/useApi.ts
import { ref, readonly, type Ref } from 'vue'
import { ApiResponseError } from '@/api/client'

export interface ApiError {
  status: number
  title: string
  detail: string
  traceId: string | null
}

export interface UseApiReturn&lt;T&gt; {
  data: Ref&lt;T | null&gt;
  error: Ref&lt;ApiError | null&gt;
  isLoading: Ref&lt;boolean&gt;
  execute: () =&gt; Promise&lt;void&gt;
}

export function useApi&lt;T&gt;(
  fetchFn: () =&gt; Promise&lt;T&gt;,
  options: { immediate?: boolean } = {}
): UseApiReturn&lt;T&gt; {
  const data = ref&lt;T | null&gt;(null) as Ref&lt;T | null&gt;
  const error = ref&lt;ApiError | null&gt;(null)
  const isLoading = ref(false)

  const execute = async () =&gt; {
    isLoading.value = true
    error.value = null

    try {
      data.value = await fetchFn()
    } catch (e) {
      error.value = normaliseError(e)
      data.value = null
    } finally {
      isLoading.value = false
    }
  }

  if (options.immediate) {
    execute()
  }

  return {
    data: readonly(data) as Ref&lt;T | null&gt;,
    error: readonly(error) as Ref&lt;ApiError | null&gt;,
    isLoading: readonly(isLoading) as Ref&lt;boolean&gt;,
    execute
  }
}

function normaliseError(e: unknown): ApiError {
  // BFF returns Problem Details (RFC 7807) — parse the structured body
  if (e instanceof ApiResponseError) {
    return {
      status: e.status,
      title: e.title,
      detail: e.detail,
      traceId: e.traceId
    }
  }

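  // Network failure: no response body. fetch() rejects with a TypeError whose
  // message varies by browser ("Failed to fetch", "NetworkError when attempting
  // to fetch resource.", "Load failed")
  if (e instanceof TypeError &amp;&amp; /fetch|load failed/i.test(e.message)) {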
    return {
      status: 0,
      title: 'Network error',
      detail: 'Could not reach the server. Check your connection.',
      traceId: null
    }
  }

  // Unexpected — still normalise
  return {
    status: -1,
    title: 'Unexpected error',
    detail: e instanceof Error ? e.message : 'An unknown error occurred.',
    traceId: null
  }
}
</code></pre>
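<p>The <code>readonly</code> wrappers on <code>data</code>, <code>error</code>, and <code>isLoading</code> are deliberate. Components receive the state refs but cannot mutate them. Vue rejects writes to a <code>readonly</code> proxy (and logs a warning in development), so state changes only through <code>execute</code>. This prevents a class of bugs where a component sets <code>data.value = null</code> as a local workaround and breaks another component consuming the same state. The <code>as Ref</code> casts hide the read-only status from TypeScript, so this protection is enforced at runtime rather than at compile time.</p>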
<hr />
<h2>The HTTP client and error class</h2>
<p>The base composable uses an <code>ApiResponseError</code> that wraps the BFF's Problem Details response. This is the class that bridges the HTTP layer with the composable's error normalisation:</p>
<pre><code class="language-typescript">// src/api/client.ts
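export class ApiResponseError extends Error {
  constructor(
    public readonly status: number,
    public readonly title: string,
    public readonly detail: string,
    public readonly traceId: string | null,
    // Remaining Problem Details members (e.g. validation "errors"); RFC 7807
    // extension members sit at the top level of the JSON body
    public readonly extensions: Record&lt;string, unknown&gt; = {}
  ) {
    super(`${status}: ${title}`)
    this.name = 'ApiResponseError'
  }
}

async function handleResponse&lt;T&gt;(response: Response): Promise&lt;T&gt; {
  if (response.ok) {
    // 204 No Content has no body; response.json() would throw
    if (response.status === 204) return undefined as T
    return response.json() as Promise&lt;T&gt;
  }

  // Attempt to parse Problem Details body
  let problem: {
    title: string
    detail: string
    traceId: string | null
    extensions: Record&lt;string, unknown&gt;
  } = { title: 'Error', detail: 'An error occurred.', traceId: null, extensions: {} }
  try {
    const { title, detail, traceId, ...extensions } = await response.json()
    problem = {
      title: title ?? problem.title,
      detail: detail ?? problem.detail,
      traceId: traceId ?? null,
      extensions
    }
  } catch {
    // Non-JSON error body: use defaults
  }

  throw new ApiResponseError(
    response.status,
    problem.title,
    problem.detail,
    problem.traceId,
    problem.extensions
  )
}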

export const apiClient = {
  async get&lt;T&gt;(path: string, init?: RequestInit): Promise&lt;T&gt; {
    const response = await fetch(`/api${path}`, {
      ...init,
      headers: {
        'Accept': 'application/json',
        ...init?.headers
      },
      credentials: 'include' // Send session cookie
    })
    return handleResponse&lt;T&gt;(response)
  },

  async post&lt;T&gt;(path: string, body: unknown, init?: RequestInit): Promise&lt;T&gt; {
    const response = await fetch(`/api${path}`, {
      ...init,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
        ...init?.headers
      },
      body: JSON.stringify(body),
      credentials: 'include'
    })
    return handleResponse&lt;T&gt;(response)
  }
}
</code></pre>
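<p><code>credentials: 'include'</code> ensures the session cookie is sent with every request, including cross-origin ones (the fetch default, <code>'same-origin'</code>, sends cookies only to the page's own origin). The BFF's authentication is cookie-based, as established in the architecture. Setting it here makes it a single configuration point; forgetting it on individual requests was a failure mode caught in development.</p>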
<p>The path prefix <code>/api</code> is injected once here. Components and composables never construct full URLs — they pass paths (<code>/dashboard</code>, <code>/courses/c-1</code>) and the client handles the rest.</p>
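<p>Paths built from route parameters, such as <code>/courses/${courseId}</code>, place the ID into the URL verbatim. If IDs can contain reserved characters, one option is a small path builder that encodes each segment before the client adds the <code>/api</code> prefix. The sketch below assumes a hypothetical <code>apiPath</code> helper:</p>
<pre><code class="language-typescript">// Hypothetical helper: joins path segments and percent-encodes each one,
// so an ID such as "a/b" stays a single segment instead of changing the route.
export function apiPath(...segments: string[]): string {
  return '/' + segments.map(segment => encodeURIComponent(segment)).join('/')
}
</code></pre>
<p>A call such as <code>apiClient.get(apiPath('courses', courseId.value))</code> still goes through the client, which remains the only place that applies the <code>/api</code> prefix.</p>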
<hr />
<h2>Screen-level composables</h2>
<p>The base composable handles mechanics. Screen-level composables handle intent — they know which BFF endpoint to call, what type the response is, and what derived state the component tree needs.</p>
<pre><code class="language-typescript">// src/composables/useDashboard.ts
import { computed } from 'vue'
import { useApi } from './useApi'
import { apiClient } from '@/api/client'
import type { DashboardResponse } from '@/api/types'

export function useDashboard() {
  const { data, error, isLoading, execute } = useApi&lt;DashboardResponse&gt;(
    () =&gt; apiClient.get&lt;DashboardResponse&gt;('/dashboard'),
    { immediate: true }
  )

  // Derived state — components consume these, not raw data
  const user = computed(() =&gt; data.value?.user ?? null)
  const courses = computed(() =&gt; data.value?.courses ?? [])
  const upcomingSessions = computed(() =&gt; data.value?.upcomingSessions ?? [])
  const notificationCount = computed(() =&gt; data.value?.notifications.count ?? 0)

  // Surface partial failure state for component-level degradation handling
  const hasPartialFailure = computed(
    () =&gt; (data.value?.partialFailures?.length ?? 0) &gt; 0
  )
  const partialFailures = computed(() =&gt; data.value?.partialFailures ?? [])

  return {
    // State
    user,
    courses,
    upcomingSessions,
    notificationCount,
    hasPartialFailure,
    partialFailures,
    // Loading / error
    isLoading,
    error,
    // Manual refresh
    refresh: execute
  }
}
</code></pre>
<p>Three things to notice here.</p>
<p><strong>Derived</strong> <code>computed</code> <strong>refs, not raw</strong> <code>data</code><strong>.</strong> The component does not receive <code>data.value?.courses</code> — it receives <code>courses</code>, a computed ref with a <code>[]</code> fallback. This removes null-guard boilerplate from every component that consumes this composable. The fallback values are defined once, in the composable, not scattered across template expressions.</p>
<p><code>partialFailures</code> <strong>is surfaced explicitly.</strong> The BFF returns this field (built in Article 4) when some upstream services failed but the response was still structurally valid. The composable exposes <code>hasPartialFailure</code> and <code>partialFailures</code> so components can render degraded states with a specific message rather than a generic error.</p>
<p><code>immediate: true</code> <strong>fires the fetch on composable creation.</strong> The dashboard fetches its data as soon as the composable is created, which happens during the component's <code>setup</code>, just before it mounts. For detail screens that require a parameter (a course ID, a session ID), <code>immediate</code> is <code>false</code> and <code>execute</code> is called explicitly once the parameter is available.</p>
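<p>Every fallback lives in <code>useDashboard</code> rather than in templates. That means the same derivations can be written as a plain function and unit-tested without mounting a component; the composable could then wrap the result in <code>computed</code>. The sketch below uses a hand-trimmed copy of the generated schema, and the interface and function names are hypothetical:</p>
<pre><code class="language-typescript">// Trimmed, hand-written stand-ins for the generated schema types.
interface CourseSummary {
  id: string
  title: string
}

interface DashboardData {
  courses: CourseSummary[]
  notifications: { count: number }
  partialFailures: string[]
}

// The same fallbacks as useDashboard, as a framework-free function.
export function deriveDashboardView(data: DashboardData | null) {
  const partialFailures = data?.partialFailures ?? []
  return {
    courses: data?.courses ?? [],
    notificationCount: data?.notifications.count ?? 0,
    partialFailures,
    hasPartialFailure: partialFailures.length > 0
  }
}
</code></pre>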
<hr />
<h2>Using the composable in a component</h2>
<pre><code class="language-typescript">&lt;!-- src/views/DashboardView.vue --&gt;
&lt;script setup lang="ts"&gt;
import { useDashboard } from '@/composables/useDashboard'
import UserProfileCard from '@/components/UserProfileCard.vue'
import CourseListPanel from '@/components/CourseListPanel.vue'
import UpcomingSessionsList from '@/components/UpcomingSessionsList.vue'
import NotificationBadge from '@/components/NotificationBadge.vue'
import PartialFailureBanner from '@/components/PartialFailureBanner.vue'
import LoadingSpinner from '@/components/LoadingSpinner.vue'
import ErrorDisplay from '@/components/ErrorDisplay.vue'

const {
  user,
  courses,
  upcomingSessions,
  notificationCount,
  hasPartialFailure,
  partialFailures,
  isLoading,
  error,
  refresh
} = useDashboard()
&lt;/script&gt;

&lt;template&gt;
  &lt;div class="dashboard"&gt;
    &lt;LoadingSpinner v-if="isLoading" /&gt;

    &lt;ErrorDisplay
      v-else-if="error"
      :error="error"
      @retry="refresh"
    /&gt;

    &lt;template v-else&gt;
      &lt;PartialFailureBanner
        v-if="hasPartialFailure"
        :failures="partialFailures"
      /&gt;

      &lt;UserProfileCard v-if="user" :profile="user" /&gt;

      &lt;div class="dashboard-body"&gt;
        &lt;CourseListPanel :courses="courses" /&gt;
        &lt;UpcomingSessionsList :sessions="upcomingSessions" /&gt;
      &lt;/div&gt;

      &lt;NotificationBadge :count="notificationCount" /&gt;
    &lt;/template&gt;
  &lt;/div&gt;
&lt;/template&gt;
</code></pre>
<p>The component contains no <code>fetch</code> calls, no <code>try/catch</code>, no null guards beyond <code>v-if="user"</code>, and no type assertions. It destructures the composable and binds to named refs. If the BFF's <code>UserProfileResponse</code> type changes (say, <code>displayName</code> is renamed to <code>fullName</code>), type-checking flags every component binding that still uses the old name before the build ships. Bindings inside <code>.vue</code> templates are checked by <code>vue-tsc</code>; plain <code>tsc</code> does not type-check templates.</p>
<hr />
<h2>Parameterised composables: detail screens</h2>
<p>Not all composables fire immediately. Detail screens receive an ID from the route and fetch on mount with that ID.</p>
<pre><code class="language-typescript">// src/composables/useCourseDetail.ts
import { computed, watch, type Ref } from 'vue'
import { useApi } from './useApi'
import { apiClient } from '@/api/client'
import type { CourseDetailResponse } from '@/api/types'

export function useCourseDetail(courseId: Ref&lt;string&gt;) {
  const { data, error, isLoading, execute } = useApi&lt;CourseDetailResponse&gt;(
    () =&gt; apiClient.get&lt;CourseDetailResponse&gt;(`/courses/${courseId.value}`),
    { immediate: false } // Do not fire until courseId is known
  )

  // Fire whenever courseId changes — handles navigation between detail pages
  watch(courseId, () =&gt; execute(), { immediate: true })

  const course = computed(() =&gt; data.value ?? null)
  const sessions = computed(() =&gt; data.value?.sessions ?? [])
  const enrollmentOpen = computed(() =&gt; data.value?.enrollment.isOpen ?? false)

  return { course, sessions, enrollmentOpen, isLoading, error, refresh: execute }
}
</code></pre>
<p>The <code>watch</code> with <code>{ immediate: true }</code> fires on mount with the initial <code>courseId</code> value, and re-fires whenever the ID changes — which happens when the user navigates from one course detail to another without unmounting the view. Forgetting this watch was a bug caught in dev: navigating from course A to course B showed course A's data until the component was destroyed and recreated. The watch fixes it structurally.</p>
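<p>The watch re-fetches on every ID change, but it does not order responses. If the request for course A is slow and resolves after the request for course B, A's data can overwrite B's. One way to guard against this is to tag each request and let only the latest one write state. Below is a minimal sketch with a hypothetical <code>createRequestSequencer</code> helper. Cancelling the earlier request with an <code>AbortController</code> is an alternative.</p>
<pre><code class="language-typescript">// Hypothetical helper: gives each request an increasing id so that only
// the most recently started request is allowed to write state.
export function createRequestSequencer() {
  let latest = 0
  return {
    // Call before starting a request; returns that request's id.
    next(): number {
      latest += 1
      return latest
    },
    // Call when the response arrives; false means a newer request has started.
    isLatest(id: number): boolean {
      return id === latest
    }
  }
}
</code></pre>
<p>Inside <code>execute</code>, the pattern is to take <code>const id = sequencer.next()</code> before awaiting <code>fetchFn()</code>, then assign <code>data.value</code> only if <code>sequencer.isLatest(id)</code> still returns true.</p>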
<p>Using the composable in the view:</p>
<pre><code class="language-typescript">&lt;!-- src/views/CourseDetailView.vue --&gt;
&lt;script setup lang="ts"&gt;
import { toRef } from 'vue'
import { useRoute } from 'vue-router'
import { useCourseDetail } from '@/composables/useCourseDetail'

const route = useRoute()
const courseId = toRef(() =&gt; route.params.courseId as string)

const { course, sessions, enrollmentOpen, isLoading, error, refresh } =
  useCourseDetail(courseId)
&lt;/script&gt;
</code></pre>
<p><code>toRef(() =&gt; ...)</code> (available since Vue 3.3) creates a read-only ref backed by the getter. Its value follows the route param, so navigating to another course changes <code>courseId</code> and triggers the <code>watch</code> in the composable.</p>
<hr />
<h2>Error boundaries: the component-level strategy</h2>
<p>The base composable normalises errors into <code>ApiError</code>. But where errors are rendered is a component-level concern, and three distinct strategies apply depending on the screen.</p>
<h3>Strategy 1: Inline error with retry</h3>
<p>For primary data on a screen where failure should be surfaced and recoverable:</p>
<pre><code class="language-typescript">&lt;!-- src/components/ErrorDisplay.vue --&gt;
&lt;script setup lang="ts"&gt;
import type { ApiError } from '@/composables/useApi'

defineProps&lt;{
  error: ApiError
}&gt;()

const emit = defineEmits&lt;{
  retry: []
}&gt;()
&lt;/script&gt;

&lt;template&gt;
  &lt;div class="error-display" role="alert"&gt;
    &lt;p class="error-title"&gt;{{ error.title }}&lt;/p&gt;
    &lt;p class="error-detail"&gt;{{ error.detail }}&lt;/p&gt;
    &lt;p v-if="error.traceId" class="error-trace"&gt;
      Reference: {{ error.traceId }}
    &lt;/p&gt;
    &lt;button @click="emit('retry')"&gt;Try again&lt;/button&gt;
  &lt;/div&gt;
&lt;/template&gt;
</code></pre>
<p>The <code>traceId</code> renders in the UI. Support engineers ask users for it when investigating incidents. This is a small detail with outsized operational value — when a user reports a 503 and can read out a trace ID, an engineer can find the exact request in Application Insights in seconds.</p>
<h3>Strategy 2: Partial failure banner</h3>
<p>For degraded responses where some data loaded and some did not — surfaced via <code>partialFailures</code> from the BFF:</p>
<pre><code class="language-typescript">&lt;!-- src/components/PartialFailureBanner.vue --&gt;
&lt;script setup lang="ts"&gt;
const props = defineProps&lt;{
  failures: string[]
}&gt;()

const failureLabels: Record&lt;string, string&gt; = {
  courses: 'course list',
  sessions: 'upcoming sessions',
  notifications: 'notifications'
}

const failureDescription = computed(() =&gt;
  props.failures
    .map(f =&gt; failureLabels[f] ?? f)
    .join(' and ')
)
&lt;/script&gt;

&lt;template&gt;
  &lt;div class="partial-failure-banner" role="status"&gt;
    Some content could not be loaded ({{ failureDescription }}).
    The page may be incomplete.
  &lt;/div&gt;
&lt;/template&gt;
</code></pre>
<p>This component renders alongside the data that did load, rather than replacing it. The user sees a partially populated dashboard with an honest explanation — not a full-screen error for a non-critical failure.</p>
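<p>With three failures, <code>.join(' and ')</code> produces "course list and upcoming sessions and notifications". If that reads awkwardly, the description can be built with a conventional list join. The sketch below expresses the same label mapping as a plain function; the names are hypothetical. <code>Intl.ListFormat</code> does the same with locale awareness, provided the TypeScript <code>lib</code> setting includes ES2021.</p>
<pre><code class="language-typescript">// Same labels as the banner component above.
const FAILURE_LABELS: { [key: string]: string } = {
  courses: 'course list',
  sessions: 'upcoming sessions',
  notifications: 'notifications'
}

// Produces "a", "a and b", or "a, b, and c".
function joinWithAnd(items: string[]): string {
  if (items.length === 0) return ''
  if (items.length === 1) return items[0]
  if (items.length === 2) return items[0] + ' and ' + items[1]
  return items.slice(0, -1).join(', ') + ', and ' + items[items.length - 1]
}

// Unknown keys fall back to the raw key, as in the component.
export function describeFailures(failures: string[]): string {
  return joinWithAnd(failures.map(f => FAILURE_LABELS[f] ?? f))
}
</code></pre>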
<h3>Strategy 3: Silent fallback for non-critical widgets</h3>
<p>For widgets where failure is genuinely inconsequential — a notification count badge, a "last login" display:</p>
<pre><code class="language-typescript">&lt;!-- src/components/NotificationBadge.vue --&gt;
&lt;script setup lang="ts"&gt;
defineProps&lt;{ count: number }&gt;()
// count arrives as 0 from the composable fallback if the upstream failed.
// No error state needed — 0 is a valid, renderable value.
&lt;/script&gt;

&lt;template&gt;
  &lt;span v-if="count &gt; 0" class="badge"&gt;{{ count }}&lt;/span&gt;
&lt;/template&gt;
</code></pre>
<p>The composable's <code>notificationCount</code> defaults to <code>0</code> when the notification service failed. The badge component renders nothing if the count is 0 — which is indistinguishable from "no notifications" to the user. This is the correct trade-off for a non-critical UI element.</p>
<p>The principle: choose the error strategy based on what the user can do in response to the failure, not based on technical severity. A 503 on the profile service warrants a full error display with retry. A 503 on the notification service warrants silence.</p>
<hr />
<h2>Write operations: POST, PATCH, and mutation composables</h2>
<p>Read composables are straightforward — fire on mount, expose derived state. Write composables are different: they are triggered by user interaction, need to track submission state separately from loading state, and must handle validation errors from the BFF distinctly from network errors.</p>
<pre><code class="language-typescript">// src/composables/useCourseEnrollment.ts
import { ref } from 'vue'
import { apiClient, ApiResponseError } from '@/api/client'
import type { EnrollmentRequest, EnrollmentResponse } from '@/api/types'

export function useCourseEnrollment() {
  const isSubmitting = ref(false)
  const validationErrors = ref&lt;Record&lt;string, string&gt;&gt;({})
  const submitError = ref&lt;string | null&gt;(null)
  const isSuccess = ref(false)

  const enroll = async (courseId: string, payload: EnrollmentRequest) =&gt; {
    isSubmitting.value = true
    validationErrors.value = {}
    submitError.value = null
    isSuccess.value = false

    try {
      await apiClient.post&lt;EnrollmentResponse&gt;(
        `/courses/${courseId}/enrollment`,
        payload
      )
      isSuccess.value = true
    } catch (e) {
      if (e instanceof ApiResponseError) {
        if (e.status === 422) {
          // BFF returns validation errors as an extensions object
          validationErrors.value = (e as any).extensions?.errors ?? {}
        } else {
          submitError.value = e.detail
        }
      } else {
        submitError.value = 'Could not complete enrollment. Please try again.'
      }
    } finally {
      isSubmitting.value = false
    }
  }

  return {
    enroll,
    isSubmitting,
    validationErrors,
    submitError,
    isSuccess
  }
}
</code></pre>
<p>The distinction between <code>validationErrors</code> (field-level, from a 422) and <code>submitError</code> (message-level, from any other error) mirrors how the BFF handles these two failure modes. The BFF returns structured validation errors on 422; the component receives them as a record and maps them to field labels. Any other failure becomes a banner message, not a field annotation.</p>
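<p>The <code>(e as any)</code> cast in the 422 branch can be replaced by a small type guard. The sketch below assumes the field errors sit under an <code>errors</code> key and that each value is either a single message or a list of messages (ASP.NET Core's <code>ValidationProblemDetails</code> uses lists). <code>toFieldErrors</code> is a hypothetical name.</p>
<pre><code class="language-typescript">// Hypothetical: turn an untyped 422 payload into one message per field.
type FieldErrors = Record&lt;string, string&gt;

function toFieldErrors(source: unknown): FieldErrors {
  if (typeof source !== 'object' || source === null) return {}
  const raw: unknown = (source as { errors?: unknown }).errors
  if (typeof raw !== 'object' || raw === null) return {}

  const result: FieldErrors = {}
  for (const [field, value] of Object.entries(raw)) {
    if (typeof value === 'string') {
      result[field] = value
    } else if (Array.isArray(value) &amp;&amp; typeof value[0] === 'string') {
      result[field] = value[0] // show the first message for each field
    }
  }
  return result
}
</code></pre>
<p>In <code>useCourseEnrollment</code>, the 422 branch could then read <code>validationErrors.value = toFieldErrors(e.extensions)</code>, assuming <code>ApiResponseError</code> declares the <code>extensions</code> property that the original code reaches for through the cast.</p>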
<hr />
<h2>Shared state: when a composable is not enough</h2>
<p>Most composables are created per-component and their state is local to that component's lifetime. Occasionally, state needs to be shared across components that are not in a parent-child relationship — the authenticated user's profile is the clearest example.</p>
<p>The solution in the production system was a Pinia store for session state only, with everything else in per-screen composables:</p>
<pre><code class="language-typescript">// src/stores/session.ts
import { defineStore } from 'pinia'
import { ref, computed } from 'vue'
import { apiClient } from '@/api/client'
import type { UserProfileResponse } from '@/api/types'

export const useSessionStore = defineStore('session', () =&gt; {
  const profile = ref&lt;UserProfileResponse | null&gt;(null)
  const isAuthenticated = computed(() =&gt; profile.value !== null)

  async function fetchProfile() {
    try {
      profile.value = await apiClient.get&lt;UserProfileResponse&gt;('/auth/me')
    } catch {
      profile.value = null
    }
  }

  function clearSession() {
    profile.value = null
  }

  return { profile, isAuthenticated, fetchProfile, clearSession }
})
</code></pre>
<p>The session store is initialised once in <code>App.vue</code> on mount. Every component that needs the user's display name or role reads from the store rather than making another BFF call. Everything else — courses, sessions, notifications — stays in per-screen composables. This is the minimum viable use of global state: use it only when the data genuinely needs to outlive any single component's lifetime.</p>
<hr />
<h2>The complete data flow, end to end</h2>
<p>To make the layers concrete:</p>
<pre><code class="language-plaintext">BFF OpenAPI spec
  ↓  openapi-typescript
src/api/types.gen.ts
  ↓  re-exported via
src/api/types.ts
  ↓  imported by
src/composables/useDashboard.ts
  ↓  typed return value consumed by
src/views/DashboardView.vue
  ↓  typed props passed to
src/components/CourseListPanel.vue
</code></pre>
<p>Every link in this chain is type-checked. Once <code>types.gen.ts</code> is regenerated from the updated spec, a change at the BFF end propagates as a TypeScript error through every layer until every consumer is updated. Contract drift surfaces at build time, not as a runtime surprise or an "it worked in development" moment in staging.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/6a0d999a-a5b0-4ddb-af46-0ab25abfe7ab.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>What the production system learned</h2>
<p>A few early approaches that did not survive contact with production, and the decisions that replaced them:</p>
<p><code>axios</code> <strong>interceptors for error handling.</strong> The first implementation used an Axios response interceptor to normalise errors globally. This worked until two endpoints needed different error handling behaviour — one needed to suppress 404s silently, another needed to escalate 403s to a full-page redirect. Interceptor configuration became conditional logic that was harder to reason about than per-composable handling. The <code>apiClient</code> wrapper with <code>handleResponse</code> replaced it: explicit, co-located, composable-specific where needed.</p>
<p><strong>Loading state in Vuex before Pinia.</strong> The early implementation tracked <code>isLoading</code> for every API call in a Vuex module keyed by endpoint name. Components dispatched actions and read loading state from the store. This scaled poorly — the store grew large, loading state leaked between navigations, and testing required mocking the entire store for any component test. Per-composable <code>isLoading</code> refs fixed all three problems.</p>
<p><strong>Direct</strong> <code>fetch</code> <strong>calls in components.</strong> Several components in the early sprint made <code>fetch</code> calls directly in <code>onMounted</code>. This was fast to write and immediately problematic: no shared loading state, no consistent error handling, no type safety, and no way to test the fetch logic independently of the component. The <code>useApi</code> base composable was introduced at sprint 4 and all direct fetch calls were migrated over two weeks.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/9686e035-089f-4c65-9046-0728aff265a2.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>What comes next</h2>
<p>The API layer is complete on both sides — the BFF shapes and serves, the Vue composables consume and distribute. The next article addresses the most architecturally consequential decision in the implementation: authentication. Specifically, how Feide — Norway's government-issued identity provider for the education sector — is integrated via the BFF using the Token Handler pattern, why this requires server-side session management, and why tokens should never reach the browser in this architecture.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="/what-is-bff-and-when-is-it-actually-worth-it">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="/designing-the-bff-contract-request-aggregation-client-specific-shaping">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="/building-the-bff-in-net-core-minimal-apis-routing-aggregation">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li>→ <a href="/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="/auth-at-the-boundary-integrating-feide-identity-via-the-bff">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="/testing-the-bff-unit-integration-contract-tests">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="/caching-in-the-bff-in-memory-redis-response-caching">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="/brownfield-migration-the-strangler-fig-approach-to-bff-adoption">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Building the BFF in .NET Core: Minimal APIs, Routing & Aggregation]]></title><description><![CDATA[A note on the code in this article. The implementation shown here is derived from a production BFF built for a Norwegian enterprise education platform. Service names, domain models, and certain struct]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/building-the-bff-in-net-core-minimal-apis-routing-aggregation</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/building-the-bff-in-net-core-minimal-apis-routing-aggregation</guid><category><![CDATA[.net core]]></category><category><![CDATA[C#]]></category><category><![CDATA[minimal-apis]]></category><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[API Aggregation]]></category><category><![CDATA[routing]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Resilience]]></category><category><![CDATA[error handling]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sun, 12 Apr 2026 02:42:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/f8687dec-e8cb-4994-b932-be44689a34e7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<blockquote>
<p><strong>A note on the code in this article.</strong> The implementation shown here is derived from a production BFF built for a Norwegian enterprise education platform. Service names, domain models, and certain structural details have been generalised to meet NDA obligations. The architectural patterns, code structure, error handling strategies, and the specific problems each decision solves are drawn directly from what was deployed and operated in production. Nothing here is invented for illustration.</p>
</blockquote>
<hr />
<p>The previous two articles covered what a BFF should do and why it earns its place in your architecture. This article covers how to build one — in .NET Core, with Minimal APIs, from an empty project to a production-ready service that aggregates upstream calls, shapes responses for Vue, and handles failures gracefully.</p>
<p>There is a gap between "here is the pattern" and "here is the thing running in production," and most articles stop before bridging it. This one does not. The code examples are complete enough to be followed, the decisions behind them are explained, and the production failure modes they address are named.</p>
<hr />
<h2>Project setup and structure</h2>
<p>Start with an empty ASP.NET Core project on .NET 8 (the <code>web</code> template). The BFF does not need controllers; Minimal APIs are the right tool here. They are explicit, lightweight, and push route organisation into the project structure rather than into a controller inheritance hierarchy.</p>
<pre><code class="language-shell">dotnet new web -n EducationPlatform.Bff
cd EducationPlatform.Bff
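dotnet add package Microsoft.AspNetCore.Authentication.JwtBearer
dotnet add package Microsoft.Extensions.Http.Resilience
dotnet add package Microsoft.AspNetCore.OpenApi
dotnet add package Serilog.AspNetCore
dotnet add package Serilog.Sinks.ApplicationInsights
# Cookie authentication ships with the ASP.NET Core shared framework, so it needs no package.
# Microsoft.AspNetCore.OpenApi provides the WithOpenApi() extension used on the endpoints.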
</code></pre>
<p>The project structure that emerged from the production implementation:</p>
<pre><code class="language-plaintext">EducationPlatform.Bff/
├── Endpoints/
│   ├── DashboardEndpoints.cs
│   ├── CourseEndpoints.cs
│   └── SessionEndpoints.cs
├── Aggregators/
│   ├── DashboardAggregator.cs
│   └── CourseAggregator.cs
├── Clients/
│   ├── UserServiceClient.cs
│   ├── CourseServiceClient.cs
│   ├── SessionServiceClient.cs
│   └── NotificationServiceClient.cs
├── Contracts/
│   ├── Requests/
│   └── Responses/
├── Errors/
│   └── BffProblemDetails.cs
├── Middleware/
│   └── CorrelationIdMiddleware.cs
└── Program.cs
</code></pre>
<p>The separation between <code>Clients</code> and <code>Aggregators</code> is the key structural decision. Clients know how to talk to one upstream service. Aggregators know how to compose multiple client calls into a single, shaped response. Endpoints know which aggregator to call and how to map the result to an HTTP response. No layer bleeds into another's concern.</p>
<hr />
<h2>Program.cs: wiring the service</h2>
<p>The entry point registers services, configures HTTP clients, and maps endpoints. Keep it declarative — the configuration intent should be readable without tracing through implementation details.</p>
<pre><code class="language-csharp">using EducationPlatform.Bff.Clients;
using EducationPlatform.Bff.Endpoints;
using EducationPlatform.Bff.Middleware;
using Serilog;

var builder = WebApplication.CreateBuilder(args);

// Logging — Serilog with Application Insights sink
builder.Host.UseSerilog((ctx, cfg) =&gt; cfg
    .ReadFrom.Configuration(ctx.Configuration)
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Service", "bff")
    .WriteTo.Console()
    .WriteTo.ApplicationInsights(
        ctx.Configuration["ApplicationInsights:ConnectionString"],
        TelemetryConverter.Traces));

// HTTP clients with resilience
builder.Services.AddHttpClient&lt;UserServiceClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["Services:UserService:BaseUrl"]!))
    .AddStandardResilienceHandler();

builder.Services.AddHttpClient&lt;CourseServiceClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["Services:CourseService:BaseUrl"]!))
    .AddStandardResilienceHandler();

builder.Services.AddHttpClient&lt;SessionServiceClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["Services:SessionService:BaseUrl"]!))
    .AddStandardResilienceHandler();

builder.Services.AddHttpClient&lt;NotificationServiceClient&gt;(client =&gt;
    client.BaseAddress = new Uri(builder.Configuration["Services:NotificationService:BaseUrl"]!))
    .AddStandardResilienceHandler();

// Aggregators
builder.Services.AddScoped&lt;DashboardAggregator&gt;();
builder.Services.AddScoped&lt;CourseAggregator&gt;();

// Auth — covered in depth in Article 6
builder.Services.AddAuthentication("cookie")
    .AddCookie("cookie");
builder.Services.AddAuthorization();

// Problem details for structured error responses
builder.Services.AddProblemDetails();

var app = builder.Build();

// Middleware pipeline
app.UseMiddleware&lt;CorrelationIdMiddleware&gt;();
app.UseSerilogRequestLogging();
app.UseAuthentication();
app.UseAuthorization();

// Endpoint registration
app.MapDashboardEndpoints();
app.MapCourseEndpoints();
app.MapSessionEndpoints();

app.Run();
</code></pre>
<p><code>AddStandardResilienceHandler()</code> comes from <code>Microsoft.Extensions.Http.Resilience</code>. It stacks rate limiting, a total request timeout, retry, a circuit breaker, and a per-attempt timeout with sensible defaults, without requiring manual Polly configuration for the standard case. Hedging is not part of this pipeline; it is a separate opt-in via <code>AddStandardHedgingHandler()</code>. Article Extra E covers customising these policies for partial failure scenarios that the standard handler does not cover.</p>
<hr />
<h2>The typed HTTP clients</h2>
<p>Each upstream service gets a dedicated typed client. The client is responsible for one thing: making HTTP calls to that service and deserialising the response. It does not shape, transform, or make decisions about what the frontend needs.</p>
<pre><code class="language-csharp">// Clients/CourseServiceClient.cs
public sealed class CourseServiceClient(HttpClient http, ILogger&lt;CourseServiceClient&gt; logger)
{
    public async Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt; GetCoursesByOrgAsync(
        string orgId,
        CancellationToken ct = default)
    {
        try
        {
            var response = await http.GetAsync($"courses?orgId={orgId}", ct);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadFromJsonAsync&lt;IReadOnlyList&lt;CourseDto&gt;&gt;(ct);
        }
        catch (HttpRequestException ex)
        {
            logger.LogWarning(ex,
                "Course service unavailable for orgId {OrgId}. StatusCode: {Status}",
                orgId, ex.StatusCode);
            return null; // Caller decides how to handle absence
        }
    }

    public async Task&lt;CourseDetailDto?&gt; GetCourseDetailAsync(
        string courseId,
        CancellationToken ct = default)
    {
        try
        {
            var response = await http.GetAsync($"courses/{courseId}", ct);
            if (response.StatusCode == HttpStatusCode.NotFound) return null;
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadFromJsonAsync&lt;CourseDetailDto&gt;(ct);
        }
        catch (HttpRequestException ex)
        {
            logger.LogWarning(ex,
                "Failed to retrieve course {CourseId}. StatusCode: {Status}",
                courseId, ex.StatusCode);
            return null;
        }
    }
}
</code></pre>
<p>Returning <code>null</code> on upstream failure is a deliberate choice. The client signals absence; the aggregator decides whether absence means a degraded response or a hard failure. This keeps failure policy out of the typed client and in the aggregator, the layer that understands what each screen needs.</p>
<p>The DTOs the client deserialises into mirror the upstream service's response shape exactly. They are internal to the BFF in the architectural sense: no endpoint returns them, so they never reach Vue.</p>
<pre><code class="language-csharp">// Contracts: upstream DTOs. No endpoint returns them, so they are never exposed to Vue.
// They are declared public because the public client methods return them; marking them
// internal would fail to compile (CS0050, inconsistent accessibility).
public sealed record CourseDto(
    string Id,
    CourseMetadataDto Metadata,
    EnrollmentDto Enrollment,
    CourseStatusDto Status
);

public sealed record CourseMetadataDto(
    string Title,
    string Code,
    CurriculumDto Curriculum
);

public sealed record EnrollmentDto(int Capacity, int Enrolled, int Waitlist);
public sealed record CourseStatusDto(string Code, DateTimeOffset Since);
</code></pre>
<hr />
<h2>The aggregator: orchestrating upstream calls</h2>
<p>The aggregator is the most important class in the BFF. It owns the dependency graph of upstream calls, executes them efficiently, shapes the merged result into a response the Vue application can consume directly, and decides how to handle partial failures.</p>
<pre><code class="language-csharp">// Aggregators/DashboardAggregator.cs
public sealed class DashboardAggregator(
    UserServiceClient userClient,
    CourseServiceClient courseClient,
    SessionServiceClient sessionClient,
    NotificationServiceClient notificationClient,
    ILogger&lt;DashboardAggregator&gt; logger)
{
    public async Task&lt;DashboardResponse&gt; AggregateAsync(
        string userId,
        CancellationToken ct = default)
    {
        var partialFailures = new List&lt;string&gt;();

        // Phase 1: independent calls in parallel
        var profileTask = userClient.GetProfileAsync(userId, ct);
        var notificationTask = notificationClient.GetUnreadCountAsync(userId, ct);

        await Task.WhenAll(profileTask, notificationTask);

        var profile = profileTask.Result;

        // Profile is required — cannot render a meaningful dashboard without it
        if (profile is null)
        {
            logger.LogError("User profile unavailable for userId {UserId}. Aborting aggregation.", userId);
            throw new BffAggregationException("User profile service unavailable.");
        }

        // Phase 2: depends on orgId from profile
        var courses = await courseClient.GetCoursesByOrgAsync(profile.OrgId, ct);
        if (courses is null)
        {
            logger.LogWarning("Course service unavailable for org {OrgId}. Returning degraded response.", profile.OrgId);
            partialFailures.Add("courses");
        }

        // Phase 3: depends on courseIds from Phase 2
        IReadOnlyList&lt;SessionDto&gt;? sessions = null;
        if (courses is not null &amp;&amp; courses.Count &gt; 0)
        {
            var courseIds = courses.Select(c =&gt; c.Id).ToArray();
            sessions = await sessionClient.GetUpcomingAsync(courseIds, limit: 3, ct);
            if (sessions is null)
            {
                logger.LogWarning("Session service unavailable. Returning degraded response.");
                partialFailures.Add("sessions");
            }
        }

        return new DashboardResponse(
            User: ShapeUserProfile(profile),
            Courses: courses?.Select(ShapeCourse).ToList() ?? [],
            UpcomingSessions: sessions?.Select(ShapeSession).ToList() ?? [],
            Notifications: new NotificationSummary(notificationTask.Result ?? 0),
            PartialFailures: partialFailures
        );
    }

    // Shape methods: upstream DTO → Vue response contract
    private static UserProfileResponse ShapeUserProfile(UserProfileDto dto) =&gt;
        new(
            DisplayName: $"{dto.FirstName} {dto.LastName}",
            Role: TranslateRole(dto.RoleCode),
            AvatarUrl: dto.AvatarPath is not null
                ? $"/media/avatars/{dto.AvatarPath}"
                : null
        );

    private static CourseResponse ShapeCourse(CourseDto dto) =&gt;
        new(
            Id: dto.Id,
            Title: dto.Metadata.Title,
            Code: dto.Metadata.Code,
            EnrollmentLabel: $"{dto.Enrollment.Enrolled} / {dto.Enrollment.Capacity}",
            EnrollmentPercent: dto.Enrollment.Capacity &gt; 0
                ? (int)Math.Round((dto.Enrollment.Enrolled / (double)dto.Enrollment.Capacity) * 100)
                : 0,
            Status: TranslateStatus(dto.Status.Code)
        );

    private static SessionResponse ShapeSession(SessionDto dto) =&gt;
        new(
            Id: dto.Id,
            Title: dto.Title,
            StartsAt: dto.StartsAt.ToString("yyyy-MM-ddTHH:mm:ss"),
            CourseTitle: dto.CourseTitle,
            LocationLabel: dto.Room is not null ? $"Room {dto.Room}" : "Online"
        );

    private static string TranslateRole(string roleCode) =&gt; roleCode switch
    {
        "TEACHER" =&gt; "Teacher",
        "STUDENT" =&gt; "Student",
        "ADMIN"   =&gt; "Administrator",
        _         =&gt; "Unknown"
    };

    private static string TranslateStatus(string statusCode) =&gt; statusCode switch
    {
        "ACTIVE"   =&gt; "Active",
        "INACTIVE" =&gt; "Inactive",
        "DRAFT"    =&gt; "Draft",
        _          =&gt; statusCode
    };
}
</code></pre>
<p>Three decisions worth making explicit here:</p>
<p><strong>Profile failure is a hard error; course and session failure is degraded.</strong> The dashboard cannot render without a user profile — there is no meaningful fallback. Course and session data, by contrast, can fail independently. The Vue component handles <code>courses: []</code> and <code>partialFailures: ["courses"]</code> by showing an empty state with an appropriate message. This distinction — required vs. supplementary data — must be made per-aggregator based on what each screen actually needs.</p>
<p><strong>Shape methods are private and co-located with the aggregator.</strong> The shaping logic belongs next to the aggregation logic it serves. A separate <code>CourseShaper</code> class would be indirection without benefit at this scale. If shaping logic grows complex enough to warrant extraction, that is a signal to reconsider the response contract design.</p>
<p><strong>The</strong> <code>partialFailures</code> <strong>list is part of the response contract.</strong> The Vue application receives this and uses it to decide what to render. This is not error handling hidden from the client — it is explicit communication of what succeeded and what did not, at the response level.</p>
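<p>On the Vue side, the <code>partialFailures</code> list is what lets a panel tell "the upstream failed" apart from "there is genuinely nothing to show". A minimal sketch, assuming a hand-written mirror of the camelCase JSON (in the real application these types would come from the generated OpenAPI file). <code>coursePanelState</code> and <code>sessionPanelState</code> are hypothetical names.</p>
<pre><code class="language-typescript">// Hypothetical mirror of the parts of DashboardResponse that drive panel state.
interface DashboardSlice {
  courses: { id: string; title: string }[]
  upcomingSessions: { id: string; title: string }[]
  partialFailures: string[]
}

type PanelState = 'ready' | 'empty' | 'unavailable'

function coursePanelState(res: DashboardSlice): PanelState {
  if (res.partialFailures.includes('courses')) return 'unavailable'
  return res.courses.length &gt; 0 ? 'ready' : 'empty'
}

function sessionPanelState(res: DashboardSlice): PanelState {
  // Sessions depend on courses: a course failure also leaves sessions unknown.
  const failed = res.partialFailures
  if (failed.includes('sessions') || failed.includes('courses')) return 'unavailable'
  return res.upcomingSessions.length &gt; 0 ? 'ready' : 'empty'
}
</code></pre>
<p>The session rule matters because, when the course call fails, the aggregator never calls the session service and does not add <code>"sessions"</code> to the list. An empty <code>upcomingSessions</code> in that case means "unknown", not "none".</p>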
<hr />
<h2>The response contracts</h2>
<p>The types exposed to the Vue application are defined separately from the internal upstream DTOs. This is the BFF's external contract — it should be stable and versioned independently of internal implementation.</p>
<pre><code class="language-csharp">// Contracts/Responses/DashboardResponse.cs
public sealed record DashboardResponse(
    UserProfileResponse User,
    IReadOnlyList&lt;CourseResponse&gt; Courses,
    IReadOnlyList&lt;SessionResponse&gt; UpcomingSessions,
    NotificationSummary Notifications,
    IReadOnlyList&lt;string&gt; PartialFailures
);

public sealed record UserProfileResponse(
    string DisplayName,
    string Role,
    string? AvatarUrl
);

public sealed record CourseResponse(
    string Id,
    string Title,
    string Code,
    string EnrollmentLabel,
    int EnrollmentPercent,
    string Status
);

public sealed record SessionResponse(
    string Id,
    string Title,
    string StartsAt,
    string CourseTitle,
    string LocationLabel
);

public sealed record NotificationSummary(int Count);
</code></pre>
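<p>Using <code>record</code> types for response contracts gives value-based equality, immutability (positional records expose init-only properties), and clean JSON serialisation out of the box. The records carry no <code>[JsonPropertyName]</code> attributes: Minimal APIs serialise with <code>JsonSerializerDefaults.Web</code>, which already applies <code>JsonNamingPolicy.CamelCase</code>, so <code>EnrollmentPercent</code> reaches Vue as <code>enrollmentPercent</code>. If the Vue convention ever diverges from that default, the policy can be changed globally through <code>builder.Services.ConfigureHttpJsonOptions</code> rather than per property.</p>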
<hr />
<h2>The endpoint layer</h2>
<p>Endpoints are thin. They authenticate the request, extract route or query parameters, call the aggregator, and return the result. Nothing more.</p>
<pre><code class="language-csharp">// Endpoints/DashboardEndpoints.cs
public static class DashboardEndpoints
{
    public static IEndpointRouteBuilder MapDashboardEndpoints(
        this IEndpointRouteBuilder app)
    {
        var group = app.MapGroup("/api/dashboard")
            .RequireAuthorization();

        group.MapGet("/", GetDashboardAsync)
            .WithName("GetDashboard")
            .WithOpenApi();

        return app;
    }

    private static async Task&lt;IResult&gt; GetDashboardAsync(
        HttpContext ctx,
        DashboardAggregator aggregator,
        CancellationToken ct)
    {
        var userId = ctx.User.FindFirstValue(ClaimTypes.NameIdentifier);

        if (userId is null)
            return Results.Problem(
                detail: "Authenticated user identity could not be resolved.",
                statusCode: StatusCodes.Status401Unauthorized);

        try
        {
            var response = await aggregator.AggregateAsync(userId, ct);
            return Results.Ok(response);
        }
        catch (BffAggregationException ex)
        {
            return Results.Problem(
                detail: ex.Message,
                statusCode: StatusCodes.Status503ServiceUnavailable,
                title: "Upstream service unavailable");
        }
    }
}
</code></pre>
<p>The endpoint does not know what an aggregator does internally. It knows how to interpret the result and how to translate a <code>BffAggregationException</code> into a 503 Problem Details response. This is the correct division.</p>
<h3>Route grouping and versioning</h3>
<p>Route groups keep related endpoints together and make versioning explicit when it becomes necessary:</p>
<pre><code class="language-csharp">// For stable endpoints — no version in path
var group = app.MapGroup("/api/dashboard").RequireAuthorization();

// When a breaking contract change is required
var v2Group = app.MapGroup("/api/v2/dashboard").RequireAuthorization();
</code></pre>
<p>In the production system, versioned groups were introduced only twice in the service's lifetime — both during major screen redesigns. Day-to-day contract evolution was additive only, keeping the URL stable. This is the versioning discipline described in Article 2 in practice.</p>
<hr />
<h2>Error handling: the BFF error contract</h2>
<p>The Vue application needs consistent, structured error information. Problem Details for HTTP APIs (RFC 9457, which obsoletes RFC 7807) provides this structure, ASP.NET Core supports it out of the box, and the BFF should use it exclusively.</p>
<pre><code class="language-csharp">// Errors/BffProblemDetails.cs
public sealed class BffAggregationException(string message) : Exception(message);
public sealed class BffNotFoundException(string resource) 
    : Exception($"Resource not found: {resource}");

// Global exception handler registered in Program.cs
app.UseExceptionHandler(exceptionApp =&gt;
{
    exceptionApp.Run(async ctx =&gt;
    {
        ctx.Response.ContentType = "application/problem+json";

        var ex = ctx.Features.Get&lt;IExceptionHandlerFeature&gt;()?.Error;
        var logger = ctx.RequestServices.GetRequiredService&lt;ILogger&lt;Program&gt;&gt;();

        var (status, title, detail) = ex switch
        {
            BffAggregationException e =&gt;
                (503, "Upstream service unavailable", e.Message),
            BffNotFoundException e =&gt;
                (404, "Resource not found", e.Message),
            OperationCanceledException =&gt;
                (499, "Request cancelled", "The request was cancelled by the client."),
            _ =&gt;
                (500, "Unexpected error", "An unexpected error occurred. Correlation ID: " +
                    ctx.TraceIdentifier)
        };

        logger.LogError(ex, "Unhandled exception. CorrelationId: {CorrelationId}", ctx.TraceIdentifier);

        ctx.Response.StatusCode = status;
        await ctx.Response.WriteAsJsonAsync(new
        {
            type = $"https://bff.educationplatform.no/errors/{title.ToLower().Replace(' ', '-')}",
            title,
            status,
            detail,
            traceId = ctx.TraceIdentifier
        });
    });
});
</code></pre>
<p>The <code>traceId</code> in every error response is the correlation ID that threads through Application Insights. When a 503 appears in the Vue application and an engineer opens Application Insights, searching by <code>traceId</code> surfaces every log entry for that request — BFF logs, upstream client logs, the exception itself. This is not gold-plating; it is the minimum viable observability for a service that aggregates multiple upstreams.</p>
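<p>The <code>traceId</code> only helps if it reaches whoever reports the problem. A hedged sketch of the Vue side, assuming the problem+json field names written by the handler above. <code>isBffProblem</code> and <code>errorBannerText</code> are hypothetical helpers.</p>
<pre><code class="language-typescript">// Shape of the BFF's problem+json body, following the exception handler's fields.
interface BffProblem {
  type?: string
  title: string
  status: number
  detail?: string
  traceId?: string
}

function isBffProblem(body: unknown): body is BffProblem {
  if (typeof body !== 'object' || body === null) return false
  const b = body as Record&lt;string, unknown&gt;
  return typeof b.title === 'string' &amp;&amp; typeof b.status === 'number'
}

// Banner text with the traceId appended, so a support report can be matched
// to the request in Application Insights.
function errorBannerText(body: unknown, fallback: string): string {
  if (!isBffProblem(body)) return fallback // e.g. an HTML error page from a proxy
  const message = body.detail ?? body.title
  // The 500 case already embeds the ID in `detail`; avoid repeating it.
  if (!body.traceId || message.includes(body.traceId)) return message
  return `${message} (reference: ${body.traceId})`
}
</code></pre>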
<hr />
<h2>Correlation ID middleware</h2>
<p>Every request entering the BFF should carry a correlation ID that propagates to every upstream call. This is the mechanism that makes request tracing work across service boundaries.</p>
<pre><code class="language-csharp">// Middleware/CorrelationIdMiddleware.cs
using System.Diagnostics;   // Activity
using Serilog.Context;      // LogContext

public sealed class CorrelationIdMiddleware(RequestDelegate next)
{
    private const string CorrelationIdHeader = "X-Correlation-Id";

    public async Task InvokeAsync(HttpContext ctx)
    {
        var correlationId = ctx.Request.Headers[CorrelationIdHeader].FirstOrDefault()
            ?? Activity.Current?.Id
            ?? ctx.TraceIdentifier;

        ctx.Response.Headers[CorrelationIdHeader] = correlationId;

        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            await next(ctx);
        }
    }
}
</code></pre>
<p>And in each typed client, the correlation ID is forwarded on every outgoing request:</p>
<pre><code class="language-csharp">// Clients/CourseServiceClient.cs — updated constructor
public sealed class CourseServiceClient(
    HttpClient http,
    IHttpContextAccessor contextAccessor,
    ILogger&lt;CourseServiceClient&gt; logger)
{
    private void AttachCorrelationId(HttpRequestMessage request)
    {
        var correlationId = contextAccessor.HttpContext?
            .Response.Headers["X-Correlation-Id"].FirstOrDefault();
        if (correlationId is not null)
            request.Headers.TryAddWithoutValidation("X-Correlation-Id", correlationId);
    }

    public async Task&lt;IReadOnlyList&lt;CourseDto&gt;?&gt; GetCoursesByOrgAsync(
        string orgId, CancellationToken ct = default)
    {
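        // Escape the value so characters such as '&amp;' or '#' cannot alter the query string.
        using var request = new HttpRequestMessage(
            HttpMethod.Get, $"courses?orgId={Uri.EscapeDataString(orgId)}");
        AttachCorrelationId(request);
        try
        {
            using var response = await http.SendAsync(request, ct);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadFromJsonAsync&lt;IReadOnlyList&lt;CourseDto&gt;&gt;(ct);
        }
        catch (HttpRequestException ex)
        {
            logger.LogWarning(ex,
                "Course service unavailable for orgId {OrgId}", orgId);
            return null;
        }
        catch (TaskCanceledException ex) when (!ct.IsCancellationRequested)
        {
            // HttpClient.Timeout surfaces as TaskCanceledException. Treat it as "unavailable"
            // instead of letting the exception handler report it as a client cancellation.
            logger.LogWarning(ex,
                "Course service timed out for orgId {OrgId}", orgId);
            return null;
        }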
    }
}
</code></pre>
<p>Register <code>IHttpContextAccessor</code> in <code>Program.cs</code>:</p>
<pre><code class="language-csharp">builder.Services.AddHttpContextAccessor();
</code></pre>
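<p>When several typed clients need the same header, repeating <code>AttachCorrelationId</code> in each one is easy to forget. One alternative is an <code>HttpClient</code> <code>DelegatingHandler</code> registered once per client. The sketch below assumes a handler named <code>CorrelationIdHandler</code> (hypothetical, not part of the series' code) and the Web SDK's implicit usings; the middleware placement is also an assumption, the only real requirement being that it runs before anything that logs or calls an upstream service.</p>
<pre><code class="language-csharp">// Handlers/CorrelationIdHandler.cs (hypothetical)
using Microsoft.AspNetCore.Http;

public sealed class CorrelationIdHandler(IHttpContextAccessor contextAccessor) : DelegatingHandler
{
    private const string CorrelationIdHeader = "X-Correlation-Id";

    protected override Task&lt;HttpResponseMessage&gt; SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // CorrelationIdMiddleware has already written the ID to the response headers.
        var correlationId = contextAccessor.HttpContext?
            .Response.Headers[CorrelationIdHeader].FirstOrDefault();

        // Skip requests that already carry the header (for example via AttachCorrelationId).
        if (!string.IsNullOrEmpty(correlationId) &amp;&amp; !request.Headers.Contains(CorrelationIdHeader))
            request.Headers.TryAddWithoutValidation(CorrelationIdHeader, correlationId);

        return base.SendAsync(request, cancellationToken);
    }
}

// Program.cs
builder.Services.AddHttpContextAccessor();
builder.Services.AddTransient&lt;CorrelationIdHandler&gt;(); // message handlers must be registered as transient

// Chain onto the existing typed-client registration (repeat for each client).
builder.Services.AddHttpClient&lt;CourseServiceClient&gt;()
    .AddHttpMessageHandler&lt;CorrelationIdHandler&gt;();

var app = builder.Build();
app.UseMiddleware&lt;CorrelationIdMiddleware&gt;(); // early, so later middleware and endpoints see the ID
</code></pre>
<p>With the handler registered, the per-client <code>AttachCorrelationId</code> helper becomes redundant; the <code>Contains</code> check means keeping both during a transition does not send the header twice.</p>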
<hr />
<h2>Configuration and appsettings</h2>
<p>The BFF reads service base URLs, authentication configuration, and the Application Insights connection string from <code>appsettings.json</code>, with environment-specific overrides injected as environment variables in Azure Container Instances.</p>
<pre><code class="language-json">// appsettings.json
{
  "Services": {
    "UserService": { "BaseUrl": "https://user-service.internal/" },
    "CourseService": { "BaseUrl": "https://course-service.internal/" },
    "SessionService": { "BaseUrl": "https://session-service.internal/" },
    "NotificationService": { "BaseUrl": "https://notification-service.internal/" }
  },
  "ApplicationInsights": {
    "ConnectionString": "" // Injected at runtime via environment variable
  },
  "Authentication": {
    "Cookie": {
      "Name": "__bff_session",
      "HttpOnly": true,
      "Secure": true,
      "SameSite": "Strict"
    }
  }
}
</code></pre>
<p>The <code>appsettings.json</code> contains non-sensitive defaults. Secrets — connection strings, service credentials — are never checked into source control. In Azure Container Instances, they are injected as environment variables or pulled from Azure Key Vault at startup. Article 7 covers this configuration pattern in the deployment pipeline.</p>
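<p>A sketch of how these values might reach the typed clients. <code>WebApplication.CreateBuilder</code> adds the environment-variable provider after the JSON files, so a variable set on the container overrides <code>appsettings.json</code>, with a double underscore standing in for the <code>:</code> separator. The helper name <code>ServiceUrl</code> is hypothetical, and <code>UserServiceClient</code> is assumed to be a typed client shaped like <code>CourseServiceClient</code>.</p>
<pre><code class="language-csharp">// Program.cs (sketch)
// Environment variables override appsettings.json, with "__" in place of ":", e.g.
//   Services__CourseService__BaseUrl=https://course-service.internal/
//   ApplicationInsights__ConnectionString=&lt;secret&gt;
static Uri ServiceUrl(IConfiguration config, string service)
{
    var value = config[$"Services:{service}:BaseUrl"]
        ?? throw new InvalidOperationException($"Missing configuration value Services:{service}:BaseUrl");

    // Keep the trailing slash: against a base of "https://host/api", a relative URI such as
    // "courses?orgId=42" resolves to "https://host/courses?orgId=42", dropping "api".
    return new Uri(value.EndsWith('/') ? value : value + "/");
}

// Resolve at startup so a missing value stops the container instead of failing the first request.
var userServiceUrl = ServiceUrl(builder.Configuration, "UserService");
var courseServiceUrl = ServiceUrl(builder.Configuration, "CourseService");

builder.Services.AddHttpClient&lt;UserServiceClient&gt;(client =&gt; client.BaseAddress = userServiceUrl);
builder.Services.AddHttpClient&lt;CourseServiceClient&gt;(client =&gt; client.BaseAddress = courseServiceUrl);
</code></pre>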
<hr />
<h2>OpenAPI and type generation</h2>
<p>Enable OpenAPI in <code>Program.cs</code>:</p>
<pre><code class="language-csharp">using Microsoft.OpenApi.Models; // OpenApiInfo

builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen(opts =&gt;
{
    opts.SwaggerDoc("v1", new OpenApiInfo
    {
        Title = "Education Platform BFF",
        Version = "v1",
        Description = "Backend for Frontend — Vue web application"
    });
});

// In the app pipeline (after builder.Build()), development only
if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}
</code></pre>
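<p>The generated OpenAPI spec at <code>/swagger/v1/swagger.json</code> is the source of truth for the Vue application's TypeScript types. In the monorepo, a <code>generate:api</code> script runs <code>openapi-typescript</code> against this spec during development and in CI. Because the Swagger endpoints above are only mapped in the Development environment, the CI job needs to start the BFF in Development (for example with <code>ASPNETCORE_ENVIRONMENT=Development</code>) for this URL to respond:</p>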
<pre><code class="language-json">// package.json (Vue project)
{
  "scripts": {
    "generate:api": "openapi-typescript http://localhost:5000/swagger/v1/swagger.json -o src/api/types.gen.ts"
  }
}
</code></pre>
<p>The generated <code>types.gen.ts</code> imports directly into Vue composables. If the BFF changes a response type and the Vue application's composable does not update to match, TypeScript catches it at compile time — not at runtime in production.</p>
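<p>The generated types are only as precise as the spec. A minimal API handler that returns a plain <code>IResult</code> gives ApiExplorer no response type to describe, so a type such as <code>DashboardResponse</code> may never appear in <code>swagger.json</code>. Returning <code>TypedResults</code> through a <code>Results&lt;TResult1, TResult2&gt;</code> union (ASP.NET Core 7 and later) supplies that metadata. The sketch below is an illustrative shape, not the series' actual endpoint: the <code>AggregateAsync(userId, ct)</code> signature and the use of the <code>NameIdentifier</code> claim for <code>userId</code> are assumptions, and the Web SDK's implicit usings are assumed.</p>
<pre><code class="language-csharp">// Endpoints/DashboardEndpoints.cs (sketch)
using System.Security.Claims;
using Microsoft.AspNetCore.Http.HttpResults;

public static class DashboardEndpoints
{
    public static IEndpointRouteBuilder MapDashboardEndpoints(this IEndpointRouteBuilder routes)
    {
        routes.MapGet("/api/dashboard", GetDashboard)
            .ProducesProblem(StatusCodes.Status503ServiceUnavailable) // documents the aggregation failure
            .RequireAuthorization();

        return routes;
    }

    // The Results&lt;...&gt; return type tells ApiExplorer (and therefore SwaggerGen) that a 200
    // carries DashboardResponse and that a 401 is possible.
    private static async Task&lt;Results&lt;Ok&lt;DashboardResponse&gt;, UnauthorizedHttpResult&gt;&gt; GetDashboard(
        ClaimsPrincipal user, DashboardAggregator aggregator, CancellationToken ct)
    {
        var userId = user.FindFirst(ClaimTypes.NameIdentifier)?.Value; // assumed claim type
        if (userId is null)
            return TypedResults.Unauthorized();

        var dashboard = await aggregator.AggregateAsync(userId, ct);     // assumed signature
        return TypedResults.Ok(dashboard);
    }
}
</code></pre>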
<hr />
<h2>Health checks</h2>
<p>A BFF running in Azure Container Instances needs health endpoints for the container readiness and liveness probes:</p>
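<pre><code class="language-csharp">// Packages: AspNetCore.HealthChecks.Uris (AddUrlGroup), AspNetCore.HealthChecks.UI.Client (UIResponseWriter)
using HealthChecks.UI.Client;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;

builder.Services.AddHealthChecks()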
    .AddUrlGroup(
        new Uri(builder.Configuration["Services:UserService:BaseUrl"] + "health"),
        "user-service")
    .AddUrlGroup(
        new Uri(builder.Configuration["Services:CourseService:BaseUrl"] + "health"),
        "course-service");

// In app pipeline
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ =&gt; false // Liveness: BFF process is alive, no dependency checks
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = _ =&gt; true, // Readiness: all upstream services reachable
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
</code></pre>
<p>The distinction between liveness and readiness matters in ACI. Liveness failure restarts the container. Readiness failure removes it from the load balancer without restarting. A BFF that is running but cannot reach an upstream service should be removed from traffic, not restarted — hence the separate endpoints. Article 7 wires these to the ACI container group configuration.</p>
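<p>One side effect of <code>Predicate = _ =&gt; true</code> is that a single upstream outage can take every BFF instance out of rotation at once, even when the aggregator could still serve screens with partial data. If that is not the intent, health-check tags and <code>failureStatus</code> let readiness depend only on the upstreams a request cannot do without. Which services count as critical below is an assumption for illustration. The default <code>HealthCheckOptions</code> maps <code>Degraded</code> to HTTP 200, so a degraded check is reported in the response body without failing the probe.</p>
<pre><code class="language-csharp">using HealthChecks.UI.Client;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

builder.Services.AddHealthChecks()
    // Critical: without the user profile, no dashboard can be built.
    .AddUrlGroup(
        new Uri(builder.Configuration["Services:UserService:BaseUrl"] + "health"),
        name: "user-service",
        tags: new[] { "ready" })
    // Non-critical: reported as Degraded (HTTP 200 by default) rather than Unhealthy (503).
    .AddUrlGroup(
        new Uri(builder.Configuration["Services:NotificationService:BaseUrl"] + "health"),
        name: "notification-service",
        failureStatus: HealthStatus.Degraded,
        tags: new[] { "ready" });

// In app pipeline
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ =&gt; false // process is alive; no dependency checks
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check =&gt; check.Tags.Contains("ready"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
</code></pre>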
<hr />
<h2>A complete request, traced end to end</h2>
<p>To make the moving parts concrete, here is the full lifecycle of a <code>GET /api/dashboard</code> request:</p>
<ol>
<li><p>Request arrives at Azure Front Door. TLS terminated. JWT validated by Azure API Management.</p>
</li>
<li><p>APIM forwards the request to the BFF container, adding <code>X-Correlation-Id</code> if not present.</p>
</li>
<li><p><code>CorrelationIdMiddleware</code> extracts the correlation ID, adds it to the log context, and sets it on the response header.</p>
</li>
<li><p>Authentication middleware validates the session cookie and populates <code>HttpContext.User</code>.</p>
</li>
<li><p>The <code>GetDashboard</code> endpoint handler extracts <code>userId</code> from claims.</p>
</li>
<li><p><code>DashboardAggregator.AggregateAsync</code> fires Phase 1 calls — <code>UserServiceClient</code> and <code>NotificationServiceClient</code> — in parallel, each forwarding <code>X-Correlation-Id</code>.</p>
</li>
<li><p>Profile returned. Phase 2: <code>CourseServiceClient</code> fetches by <code>orgId</code>.</p>
</li>
<li><p>Courses returned. Phase 3: <code>SessionServiceClient</code> fetches upcoming sessions by <code>courseIds</code>.</p>
</li>
<li><p>Aggregator shapes all results into <code>DashboardResponse</code>, recording any partial failures.</p>
</li>
<li><p>Endpoint returns <code>200 OK</code> with the shaped response body and <code>X-Correlation-Id</code> header.</p>
</li>
<li><p>Serilog writes a structured request log entry to Application Insights, including correlation ID, duration, status code, and upstream call counts.</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/c40b3e83-eb20-4505-ad93-bb6dae202b06.png" alt="" style="display:block;margin:0 auto" />

<p>Every step is observable. Every failure is traceable. The Vue application receives a single, coherent response in a shape it can consume without transformation.</p>
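<p>Steps 6 to 8 carry most of the logic in this lifecycle, so here is a minimal console sketch of that phased fan-out. It is not the series' <code>DashboardAggregator</code>: the four upstream calls are stand-in delegates with hypothetical signatures, and a <code>null</code> result is treated as "upstream unavailable", mirroring how <code>CourseServiceClient</code> returns <code>null</code> on failure.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Phased fan-out: calls within a phase run concurrently; a phase starts only once the data
// it depends on is available. A null from an upstream delegate is recorded as a partial failure.
static async Task&lt;(string? OrgId, int? UnreadCount, IReadOnlyList&lt;string&gt; CourseIds,
                   IReadOnlyList&lt;string&gt; Sessions, IReadOnlyList&lt;string&gt; Failures)&gt;
    AggregateDashboardAsync(
        string userId,
        Func&lt;string, CancellationToken, Task&lt;string?&gt;&gt; getOrgId,           // user service (hypothetical)
        Func&lt;string, CancellationToken, Task&lt;int?&gt;&gt; getUnreadCount,        // notification service (hypothetical)
        Func&lt;string, CancellationToken, Task&lt;List&lt;string&gt;?&gt;&gt; getCourseIds, // course service (hypothetical)
        Func&lt;IReadOnlyList&lt;string&gt;, CancellationToken, Task&lt;List&lt;string&gt;?&gt;&gt; getSessions, // session service (hypothetical)
        CancellationToken ct = default)
{
    var failures = new List&lt;string&gt;();

    // Phase 1: profile and notifications are independent, so start both before awaiting.
    var orgTask = getOrgId(userId, ct);
    var unreadTask = getUnreadCount(userId, ct);
    await Task.WhenAll(orgTask, unreadTask);
    var orgId = await orgTask;
    var unreadCount = await unreadTask;
    if (orgId is null) failures.Add("user");
    if (unreadCount is null) failures.Add("notifications");

    // Phase 2: courses need the orgId from phase 1.
    var courseIds = new List&lt;string&gt;();
    if (orgId is not null)
    {
        var courses = await getCourseIds(orgId, ct);
        if (courses is null) failures.Add("courses");
        else courseIds = courses;
    }

    // Phase 3: sessions need the course IDs from phase 2.
    var sessions = new List&lt;string&gt;();
    if (courseIds.Count &gt; 0)
    {
        var upcoming = await getSessions(courseIds, ct);
        if (upcoming is null) failures.Add("sessions");
        else sessions = upcoming;
    }

    return (orgId, unreadCount, courseIds, sessions, failures);
}

// Usage with stubbed upstreams in which the notification service is down.
var dashboard = await AggregateDashboardAsync(
    "user-1",
    (_, _) =&gt; Task.FromResult&lt;string?&gt;("org-42"),
    (_, _) =&gt; Task.FromResult&lt;int?&gt;(null),
    (_, _) =&gt; Task.FromResult&lt;List&lt;string&gt;?&gt;(new List&lt;string&gt; { "course-a", "course-b" }),
    (_, _) =&gt; Task.FromResult&lt;List&lt;string&gt;?&gt;(new List&lt;string&gt; { "session-1" }));

Console.WriteLine(string.Join(", ", dashboard.Failures)); // notifications
</code></pre>
<p>Starting both phase 1 tasks before awaiting is what makes them concurrent; awaiting each call in turn would add their latencies together. Phases 2 and 3 stay sequential because each needs the previous phase's output.</p>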
<hr />
<h2>What comes next</h2>
<p>This article built the BFF service from structure to running implementation. The next article builds the other side of the contract — the Vue 3 API layer: composables that consume the BFF, typed against the generated OpenAPI spec, with error handling and loading states that map cleanly to what the BFF returns.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="/what-is-bff-and-when-is-it-actually-worth-it">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="/designing-the-bff-contract-request-aggregation-client-specific-shaping">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li>→ <a href="/building-the-bff-in-net-core-minimal-apis-routing-aggregation">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="/auth-at-the-boundary-integrating-feide-identity-via-the-bff">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="/testing-the-bff-unit-integration-contract-tests">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="/caching-in-the-bff-in-memory-redis-response-caching">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="/brownfield-migration-the-strangler-fig-approach-to-bff-adoption">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[BFF vs API Gateway vs GraphQL: Picking the Right Abstraction]]></title><description><![CDATA[Before writing a single line of BFF code, one question deserves a direct answer: is BFF actually the right abstraction for your situation, or would an API Gateway or GraphQL solve the same problems wi]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction</guid><category><![CDATA[bff]]></category><category><![CDATA[API Gateway]]></category><category><![CDATA[GraphQL]]></category><category><![CDATA[API Design]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[FrontendArchitecture]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[developer experience]]></category><category><![CDATA[scalability]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sat, 11 Apr 2026 08:44:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/960e7a53-691b-4746-9ded-43bf50e1f754.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before writing a single line of BFF code, one question deserves a direct answer: is BFF actually the right abstraction for your situation, or would an API Gateway or GraphQL solve the same problems with less overhead?</p>
<p>This is not a rhetorical question. All three patterns address the same underlying tension — the mismatch between what backend services expose and what frontend clients need — but they address it differently, at different layers, with different cost profiles and different failure modes. Choosing the wrong abstraction early is expensive to undo. Choosing the right one requires understanding not just what each pattern does, but what each pattern is designed to resist.</p>
<p>This article gives each pattern a fair hearing, identifies where each genuinely wins, and then addresses the question every architect eventually faces: can they coexist, and if so, how?</p>
<hr />
<h2>The common problem, three different answers</h2>
<p>All three abstractions exist because of the same structural tension: backend services are organised around domain entities and service boundaries; frontend clients are organised around screens, interactions, and user tasks. These two organisational principles produce different, often incompatible data shapes.</p>
<p>The three patterns answer this tension differently:</p>
<ul>
<li><p><strong>API Gateway</strong> says: <em>standardise the entry point, enforce cross-cutting concerns, and route requests to the right service — but leave shape and aggregation to the client or the services themselves.</em></p>
</li>
<li><p><strong>GraphQL</strong> says: <em>let the client declare exactly what data it needs in a single query, and build a schema that spans service boundaries so the server can resolve it.</em></p>
</li>
<li><p><strong>BFF</strong> says: <em>create a dedicated server-side layer, owned by the frontend team, that aggregates and shapes data specifically for one client surface.</em></p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/77244829-a8e1-4aa5-bef5-092d6d458a8f.png" alt="" style="display:block;margin:0 auto" />

<p>Each answer implies a different locus of complexity. API Gateway concentrates complexity in infrastructure. GraphQL concentrates it in the schema and resolver layer. BFF concentrates it in the service itself. Where you want that complexity to live — and who you want to own it — is a significant part of the decision.</p>
<hr />
<h2>API Gateway</h2>
<h3>What it is designed to do</h3>
<p>An API Gateway sits at the network perimeter and handles concerns that apply uniformly across all traffic entering the system: authentication and authorisation enforcement, rate limiting, TLS termination, request routing, logging, and protocol translation. In Azure, this is Azure API Management. In AWS, it is Amazon API Gateway. In a self-managed cluster, it might be Kong or Nginx.</p>
<p>The Gateway's job is to be a consistent, reliable entry point — not to understand what any individual client needs from any particular service. It routes a request to the right upstream service and returns what that service returns, perhaps with light transformation, caching, or throttling applied.</p>
<h3>Where it wins</h3>
<p><strong>Cross-cutting concerns at scale.</strong> Rate limiting, IP allowlisting, API key management, OAuth 2.0 token validation, and request logging are applied once, at the Gateway, and are invisible to every upstream service. Without a Gateway, each service implements these independently, inconsistently, and at higher total cost. This is the Gateway's clearest and most defensible value.</p>
<p><strong>Protocol and version mediation.</strong> A Gateway can expose a REST interface in front of gRPC services, or route <code>v1</code> requests to legacy services while <code>v2</code> requests go to new implementations — without the client needing to know the upstream topology changed. This is particularly valuable during service migrations.</p>
<p><strong>Traffic management.</strong> Circuit breaking, retry policies, canary routing, and A/B traffic splitting are Gateway-layer concerns that do not belong in application code. A well-configured Gateway protects upstream services from traffic spikes and provides the operational levers to manage deployments safely.</p>
<h3>Where it falls over</h3>
<p><strong>It does not solve the aggregation problem.</strong> A Gateway routes requests; it does not compose them. A screen that requires data from four services still requires four round trips from the client, or a Gateway orchestration configuration so complex it is effectively a new service in disguise. Some Gateways support request aggregation through scripting (Kong's Lua plugins, APIM's policies), but this is instrumenting infrastructure to do application logic — a direction that leads to unmaintainable policy files.</p>
<p><strong>It cannot be client-specific.</strong> A Gateway enforces the same behaviour for all clients. A web application, a mobile app, and a third-party integration all receive the same treatment. Response shapes cannot differ per client without replicating route configurations and transformation rules, which does not scale.</p>
<p><strong>Frontend teams do not own it.</strong> The Gateway is typically owned by a platform or infrastructure team. A frontend team that needs a new field in a response shape, or a different error format, or a caching policy for a specific endpoint, must file a request and wait. This is not a process failure — it is the correct operational model for shared infrastructure. But it means the Gateway cannot provide the autonomy that makes BFF valuable.</p>
<h3>The right use of API Gateway in a BFF architecture</h3>
<p>API Gateway and BFF are not alternatives — they are layers. The Gateway handles what the Gateway is designed to handle: network-level concerns, security perimeter enforcement, traffic management. The BFF sits behind it and handles what the BFF is designed to handle: aggregation, shaping, and client-specific logic.</p>
<p>In the production system this series is based on, Azure API Management sits in front of the BFF service. APIM handles TLS termination, JWT validation at the network boundary, rate limiting, and request logging. The BFF handles everything downstream of that: Feide session management, upstream service aggregation, and Vue-specific response shaping. Neither layer does the other's job.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/b36f0862-6706-489d-ba32-c31491c41bb4.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>GraphQL</h2>
<h3>What it is designed to do</h3>
<p>GraphQL is a query language and runtime that lets clients declare exactly what data they need in a single request. Rather than multiple endpoints returning fixed shapes, GraphQL exposes a single endpoint backed by a typed schema. The client sends a query document specifying which fields it needs; the server resolves the query by calling whatever data sources are necessary and returns exactly the requested fields.</p>
<p>In a microservices context, this typically means a GraphQL server (or federation layer) that stitches together schemas from multiple services and resolves queries by delegating to the appropriate service.</p>
<h3>Where it wins</h3>
<p><strong>Client-driven data fetching.</strong> The defining advantage: the client gets exactly what it asks for, nothing more and nothing less. Overfetching and underfetching are structurally eliminated — or at least structurally addressable. A product team that iterates quickly on screens, changing which fields a component needs from sprint to sprint, does not need to modify a backend service or a BFF to change its data requirements. It changes the query.</p>
<p><strong>Unified schema across services.</strong> A GraphQL federation layer provides a single, coherent graph that spans multiple backend services. From the client's perspective, there is one API. The fact that <code>User</code> comes from the identity service, <code>Course</code> comes from the course service, and <code>Session</code> comes from the scheduling service is invisible. This is genuinely powerful in systems with many services and complex cross-entity queries.</p>
<p><strong>Introspection and tooling.</strong> GraphQL's introspection system means the schema is self-documenting and tooling (GraphiQL, Rover, generated TypeScript types via <code>graphql-codegen</code>) works out of the box. The development experience for querying data is hard to match.</p>
<p><strong>Incremental adoption.</strong> A GraphQL schema can be introduced in front of existing REST services without replacing them. The GraphQL resolvers call the REST endpoints. This makes adoption less disruptive than a full architectural change.</p>
<h3>Where it falls over</h3>
<p><strong>It does not eliminate the aggregation problem — it moves it.</strong> A GraphQL server still has to call multiple upstream services to resolve a query. The N+1 problem — where resolving a list of N items triggers N additional queries — is one of the most common production performance issues in GraphQL systems, and solving it requires the DataLoader pattern, batching strategies, and careful resolver design. This complexity is real and non-trivial.</p>
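<p>The difference between per-item and batched resolution is easiest to see by counting requests. The console sketch below uses an in-memory dictionary in place of the session service; <code>GetSessionsForCourseAsync</code> and <code>GetSessionsForCoursesAsync</code> are hypothetical. It shows only the batching half of the DataLoader pattern: a real DataLoader also collects the keys requested by separate resolvers during one execution step before issuing the batched call.</p>
<pre><code class="language-csharp">using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical session data standing in for the session service.
var sessionsByCourse = new Dictionary&lt;string, string[]&gt;
{
    ["c1"] = new[] { "s1", "s2" },
    ["c2"] = new[] { "s3" },
};
var upstreamCalls = 0;

string[] Lookup(string courseId) =&gt;
    sessionsByCourse.TryGetValue(courseId, out var found) ? found : Array.Empty&lt;string&gt;();

// Per-item resolution: one upstream request per course, i.e. N requests for N courses.
Task&lt;string[]&gt; GetSessionsForCourseAsync(string courseId)
{
    upstreamCalls++;
    return Task.FromResult(Lookup(courseId));
}

// Batched resolution: one request for every key, each row tagged with its course ID.
Task&lt;List&lt;(string CourseId, string SessionId)&gt;&gt; GetSessionsForCoursesAsync(IEnumerable&lt;string&gt; ids)
{
    upstreamCalls++;
    var batch = ids
        .SelectMany(id =&gt; Lookup(id).Select(sessionId =&gt; (CourseId: id, SessionId: sessionId)))
        .ToList();
    return Task.FromResult(batch);
}

string[] courseIds = { "c1", "c2", "c3" };

// N+1 shape: firing the calls concurrently hides latency but not the request count.
var perCourse = await Task.WhenAll(courseIds.Select(GetSessionsForCourseAsync));
Console.WriteLine($"per-course requests: {upstreamCalls}"); // per-course requests: 3

// Batched shape: one request, then regroup in memory so each course gets its own list.
upstreamCalls = 0;
var rows = await GetSessionsForCoursesAsync(courseIds);
var sessionsFor = rows.ToLookup(r =&gt; r.CourseId, r =&gt; r.SessionId);
Console.WriteLine($"batched requests: {upstreamCalls}"); // batched requests: 1
Console.WriteLine(string.Join(",", sessionsFor["c1"]));   // s1,s2
</code></pre>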
<p><strong>Caching becomes harder.</strong> REST's GET semantics map naturally to HTTP caching. GraphQL queries are typically POST requests with variable query documents, which HTTP caches cannot cache at the network layer. Caching in GraphQL requires application-level cache implementations (persisted queries, response caching with cache hints, Apollo Cache), all of which add complexity. For systems where caching is a primary performance lever, this is a significant cost.</p>
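<p>For contrast, a BFF endpoint is a plain GET, so it can lean on standard <code>Cache-Control</code> semantics without any GraphQL-specific machinery. A sketch, in which the route, the <code>org_id</code> claim and the five-minute lifetime are all assumptions; whether a browser-only private cache is acceptable depends on how fresh the data has to be.</p>
<pre><code class="language-csharp">// Program.cs (sketch): a per-user GET endpoint that browsers may cache briefly.
using System.Security.Claims;
using Microsoft.Net.Http.Headers;

app.MapGet("/api/courses", async (
    ClaimsPrincipal user, CourseServiceClient courses, HttpContext ctx, CancellationToken ct) =&gt;
{
    var orgId = user.FindFirst("org_id")?.Value; // hypothetical claim type
    if (orgId is null)
        return Results.Unauthorized();

    var result = await courses.GetCoursesByOrgAsync(orgId, ct);
    if (result is null)
        return Results.Problem(
            statusCode: StatusCodes.Status503ServiceUnavailable,
            title: "Upstream service unavailable");

    // "private": the browser may cache the response, shared caches (CDN, proxies) must not,
    // because the body depends on the caller's session.
    ctx.Response.GetTypedHeaders().CacheControl = new CacheControlHeaderValue
    {
        Private = true,
        MaxAge = TimeSpan.FromMinutes(5)
    };
    return Results.Ok(result);
}).RequireAuthorization();
</code></pre>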
<p><strong>Security surface is wider and less obvious.</strong> A single endpoint that accepts arbitrary query documents is a different security model from discrete REST endpoints with defined inputs. Query depth limiting, query complexity analysis, and field-level authorisation all require explicit implementation. A naive GraphQL deployment is vulnerable to expensive query attacks that a REST API with fixed response shapes is not.</p>
<p><strong>It is best owned by a team that knows it well.</strong> GraphQL federation, resolver design, DataLoader implementation, and schema governance are non-trivial specialisms. Teams adopting GraphQL without prior experience tend to underestimate the ongoing maintenance cost of schema evolution, breaking change management, and performance debugging. The tooling is excellent, but the learning curve is real.</p>
<p><strong>Client-specified queries are a double-edged sword.</strong> The ability for clients to request arbitrary field combinations means the server cannot pre-optimise data fetching for known query patterns. A BFF, by contrast, knows exactly what every screen needs and can fetch exactly that — predictably, efficiently, with a query plan that never changes.</p>
<h3>When GraphQL is the right choice over BFF</h3>
<p>GraphQL genuinely outperforms BFF in two scenarios.</p>
<p>First, when the number of screens is large and they share overlapping data needs in complex, variable combinations. A content platform with hundreds of screen types, or a developer-facing API with many unknown consumers, benefits from GraphQL's flexibility in ways a BFF — designed for a finite set of known screens — does not.</p>
<p>Second, when product iteration velocity is high enough that the cost of updating a BFF endpoint for every UI change becomes a bottleneck. If engineers are changing which fields a component displays every sprint, and each of those changes requires a matching BFF update, the BFF is adding friction without adding enough stability to justify it. GraphQL's client-driven model removes that friction.</p>
<p>For the production education platform this series describes — a finite set of known screens, authenticated users with role-based data access, and specific caching requirements for institutional data — BFF was the right call. The screens were well-defined, the data relationships were consistent, and the security model (Feide token exchange, server-side sessions) required server-side ownership of the authentication boundary. GraphQL would have added schema complexity without providing flexibility the product needed.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/b89edcc0-f0c3-49d3-8ad0-2981d3e1809d.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Comparison at a glance</h2>
<table>
<thead>
<tr>
<th>Concern</th>
<th>API Gateway</th>
<th>GraphQL</th>
<th>BFF</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Owns aggregation</strong></td>
<td>No</td>
<td>Yes (resolver layer)</td>
<td>Yes</td>
</tr>
<tr>
<td><strong>Client-specific shaping</strong></td>
<td>No</td>
<td>Partial (field selection)</td>
<td>Yes (by design)</td>
</tr>
<tr>
<td><strong>Caching (HTTP-level)</strong></td>
<td>Yes</td>
<td>Difficult</td>
<td>Yes</td>
</tr>
<tr>
<td><strong>Security boundary</strong></td>
<td>Perimeter only</td>
<td>Application-level</td>
<td>Full (session + token)</td>
</tr>
<tr>
<td><strong>Frontend team ownership</strong></td>
<td>No</td>
<td>Shared</td>
<td>Yes</td>
</tr>
<tr>
<td><strong>Schema/type safety</strong></td>
<td>Via spec only</td>
<td>Native</td>
<td>Via OpenAPI</td>
</tr>
<tr>
<td><strong>N+1 risk</strong></td>
<td>N/A</td>
<td>High without DataLoader</td>
<td>Low (controlled fetching)</td>
</tr>
<tr>
<td><strong>Learning curve</strong></td>
<td>Low–Medium</td>
<td>High</td>
<td>Medium</td>
</tr>
<tr>
<td><strong>Breaking change risk</strong></td>
<td>Low</td>
<td>Medium (schema evolution)</td>
<td>Medium (contract versioning)</td>
</tr>
<tr>
<td><strong>Best at</strong></td>
<td>Cross-cutting concerns</td>
<td>Flexible client queries</td>
<td>Known client, stable screens</td>
</tr>
</tbody></table>
<hr />
<h2>Coexistence: the architecture that actually works in production</h2>
<p>The question "BFF vs API Gateway vs GraphQL" is framed as a choice. In practice, production systems at meaningful scale use more than one of these, because they solve different problems.</p>
<p>The most common and defensible combination is:</p>
<pre><code class="language-plaintext">Client
  │
  ▼
API Gateway  ←── TLS, auth enforcement, rate limiting, logging
  │
  ▼
BFF Service  ←── Aggregation, shaping, session management
  │    │    │
  ▼    ▼    ▼
Upstream services
</code></pre>
<p>The Gateway provides network-level security and traffic management. The BFF provides client-specific API logic. Each layer does exactly one thing and does not bleed into the other's responsibilities.</p>
<p>Adding GraphQL into this picture is less common but not unusual, typically in one of two configurations:</p>
<p><strong>GraphQL as the BFF's internal query mechanism.</strong> Rather than the BFF making REST calls to upstream services, it queries a GraphQL federation layer internally. The Vue application still talks REST to the BFF — it is shielded from GraphQL entirely. This gives the BFF the flexibility of GraphQL's data graph internally while maintaining a stable, cacheable REST contract externally. This configuration makes sense when upstream services are already federated under GraphQL and the BFF is being added in front of an existing system.</p>
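<p>A sketch of that first configuration, assuming a federation gateway reachable at a relative <code>graphql</code> path on the client's base address and a <code>courses(orgId:)</code> field in its schema (both hypothetical). The typed client speaks GraphQL over HTTP to the upstream graph, while the BFF endpoints that call it keep returning plain REST responses, so nothing GraphQL-specific reaches the Vue application.</p>
<pre><code class="language-csharp">// Clients/CourseGraphClient.cs (hypothetical)
using System.Net.Http.Json;
using System.Text.Json;

public sealed class CourseGraphClient(HttpClient http)
{
    // Hypothetical field names; the real ones come from the federated schema.
    private const string CoursesQuery = """
        query CoursesByOrg($orgId: ID!) {
          courses(orgId: $orgId) { id title }
        }
        """;

    public async Task&lt;JsonElement?&gt; GetCoursesByOrgAsync(string orgId, CancellationToken ct = default)
    {
        // GraphQL over HTTP: POST a JSON body with "query" and "variables".
        using var response = await http.PostAsJsonAsync(
            "graphql", new { query = CoursesQuery, variables = new { orgId } }, ct);
        response.EnsureSuccessStatusCode();

        using var doc = await JsonDocument.ParseAsync(
            await response.Content.ReadAsStreamAsync(ct), cancellationToken: ct);

        // A 200 response can still carry an "errors" array. This sketch treats any error as an
        // upstream failure (null, like CourseServiceClient) and discards partial data.
        if (doc.RootElement.TryGetProperty("errors", out var errors) &amp;&amp; errors.GetArrayLength() &gt; 0)
            return null;

        // Clone so the result outlives the disposed JsonDocument.
        return doc.RootElement.GetProperty("data").GetProperty("courses").Clone();
    }
}
</code></pre>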
<p><strong>GraphQL exposed alongside the BFF for different consumers.</strong> The BFF serves the primary Vue web application. A separate GraphQL endpoint serves mobile applications or third-party developers who benefit from flexible querying. Both sit behind the same Gateway. This is the "one BFF per client surface" model applied to heterogeneous client types — web gets a BFF optimised for its screens, mobile and external consumers get GraphQL's flexibility.</p>
<p>What does not work is all three patterns collapsing into a single layer with no clear ownership boundaries. A GraphQL server that also enforces rate limits and also manages sessions becomes a system where no one understands why anything is configured the way it is, and where a change to one concern breaks another.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/ae4a615d-51fb-4dea-901d-937b9de096ff.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Making the call</h2>
<p>The choice comes down to three questions, asked honestly:</p>
<p><strong>Who owns the API layer?</strong> If the answer is "a platform team," Gateway-first is the natural model. If the answer is "the frontend team," BFF gives that team the autonomy its ownership implies. If the answer is "unclear," that ambiguity should be resolved before the architecture decision.</p>
<p><strong>How well-defined are the client's data needs?</strong> Known screens with stable data requirements favour BFF — the predictability is a feature, not a limitation. Variable, exploratory, or numerous distinct query patterns favour GraphQL. Infrastructure-layer concerns with no client-specific logic favour Gateway only.</p>
<p><strong>What is the team's operational capacity?</strong> GraphQL federation is not simple to operate. BFF adds a service to maintain. Gateway-only is the lowest operational floor. Be honest about what your team can sustain, and choose an architecture whose operational cost falls within that ceiling.</p>
<p>For the real production web app used as the model for this series — a single Vue web application, authenticated via Feide, deployed to Azure, serving a defined set of screens for a specific user population — BFF behind an Azure API Management Gateway is the right architecture. The remaining articles build it out in full.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="/what-is-bff-and-when-is-it-actually-worth-it">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="/designing-the-bff-contract-request-aggregation-client-specific-shaping">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li>→ <a href="/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="/building-the-bff-in-net-core-minimal-apis-routing-aggregation">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="/auth-at-the-boundary-integrating-feide-identity-via-the-bff">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="/testing-the-bff-unit-integration-contract-tests">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="/caching-in-the-bff-in-memory-redis-response-caching">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="/brownfield-migration-the-strangler-fig-approach-to-bff-adoption">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Designing the BFF Contract: Request Aggregation & Client-Specific Shaping]]></title><description><![CDATA[In the previous article, we established when a BFF earns its overhead. This article assumes you have made that decision and are now facing the harder question: how do you actually design the thing wel]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/designing-the-bff-contract-request-aggregation-client-specific-shaping</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/designing-the-bff-contract-request-aggregation-client-specific-shaping</guid><category><![CDATA[bff]]></category><category><![CDATA[Backend for frontend]]></category><category><![CDATA[API Design]]></category><category><![CDATA[versioning]]></category><category><![CDATA[API contract]]></category><category><![CDATA[FrontendArchitecture]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[request batching]]></category><category><![CDATA[caching]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[composition]]></category><category><![CDATA[backend orchestration]]></category><category><![CDATA[payload design]]></category><category><![CDATA[contract-testing]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Wed, 08 Apr 2026 13:11:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/b2605984-0b8c-4f68-9c33-64f12da7612b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous article, we established when a BFF earns its overhead. This article assumes you have made that decision and are now facing the harder question: how do you actually design the thing well?</p>
<p>The BFF contract — the API surface your application depends on — is the most consequential design decision in this architecture. Get it right and you have a clean, stable interface that lets the frontend move independently of upstream service changes. Get it wrong and you have a new layer that amplifies the coupling problems you were trying to solve, with the added pleasure of owning the infrastructure.</p>
<p>This article covers the four design concerns that determine which outcome you get: what aggregation actually means in practice, how to shape responses for the client rather than the domain, how to version the contract without recreating a backend team's problems, and where the boundary between BFF logic and upstream logic must be drawn and defended.</p>
<p>The examples and architecture decisions throughout are drawn from a production implementation built for a Norwegian enterprise in the education sector. Where the original system cannot be described in full, concepts have been generalised to meet NDA obligations — but the engineering trade-offs, the failure modes, and the decisions that shaped the final design are real. We use Vue.js and .NET Core as the example frameworks because those are what the production project was built with.</p>
<hr />
<h2>Aggregation is not just parallel fetching</h2>
<p>The word "aggregation" in BFF descriptions usually conjures an image of parallel HTTP calls fanning out to multiple services and the results being merged before the response is returned. That is part of it. But treating aggregation as a mechanical fan-out pattern misses what makes it valuable — and what makes it dangerous.</p>
<p>Consider a dashboard screen in an education platform. The frontend needs: the authenticated user's profile and role, the list of courses they are enrolled in, the next three upcoming sessions, and a count of unread notifications. In a naïve aggregation implementation, the BFF fires four requests in parallel, waits for all four, and concatenates the results into a response object.</p>
<p>This works. It is also fragile in a way that becomes apparent under load.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/a436198c-0ec2-49f3-b8fb-abf22f5b0df6.png" alt="" style="display:block;margin:0 auto" />

<p>A more considered design asks: what are the actual dependency relationships between these data sets? The course list depends on the user's organisation ID, which comes from the user profile. The session list is scoped to the course IDs returned by the course list. The notification count is independent. This is not a fan-out — it is a directed graph executed in three sequential phases:</p>
<pre><code class="language-plaintext">Phase 1 (parallel):
  → GET /identity/users/{id}         → profile + orgId
  → GET /notifications/unread/count  → notificationCount

Phase 2 (depends on Phase 1):
  → GET /courses?orgId={orgId}       → courseIds

Phase 3 (depends on Phase 2):
  → GET /sessions?courseIds={...}&amp;limit=3 → upcomingSessions
</code></pre>
<p>Treating this as a flat parallel fan-out would require either fetching all courses before knowing the org ID (not possible), or making the frontend responsible for the sequencing (which defeats the point). The BFF owns this orchestration — it understands the dependency graph and executes it efficiently, shielding the frontend from the fact that it exists.</p>
<p>This has a practical implication for implementation. In .NET Core, <code>Task.WhenAll</code> handles the parallel phases, and the sequential phases chain naturally:</p>
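<pre><code class="language-csharp">// Phase 1: independent fetches in parallel
var profileTask = _userService.GetProfileAsync(userId);
var notificationCountTask = _notificationService.GetUnreadCountAsync(userId);
await Task.WhenAll(profileTask, notificationCountTask);

// Both tasks have completed, so awaiting them again returns the results immediately
var profile = await profileTask;
var notificationCount = await notificationCountTask;

// Phase 2: depends on the profile's organisation ID
var courses = await _courseService.GetByOrgAsync(profile.OrgId);

// Phase 3: depends on the course IDs
// (Select and ToArray need System.Linq, included in the .NET 6+ implicit usings)
var courseIds = courses.Select(c =&gt; c.Id).ToArray();
var upcomingSessions = await _sessionService.GetUpcomingAsync(courseIds, limit: 3);
</code></pre>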
<p>The aggregation logic lives in the BFF. The frontend makes one request and receives one coherent response. The dependency graph is invisible to the client.</p>
<h3>When aggregation goes wrong</h3>
<p>Two aggregation anti-patterns appear consistently in production BFFs.</p>
<p><strong>The "God endpoint."</strong> A single endpoint that returns everything the application might ever need, used on multiple screens because it is convenient. The God endpoint inflates payload sizes, makes partial failure handling far harder to reason about, and couples unrelated features together. If a notification service outage should not take down the course list, they must not share an endpoint. Design endpoints around screen-level data contracts, not around service boundaries.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/4c50fe4c-236c-4fcc-992e-8b6c955b5090.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Cascading failure without isolation.</strong> If Phase 2 in the example above fails because the course service is down, a poorly designed BFF either returns a 500 (crashing the whole screen) or silently swallows the error (showing stale or empty data with no indication of the problem). The correct design is explicit partial failure handling: return what succeeded, mark what failed, and let the frontend decide how to render a degraded state.</p>
<pre><code class="language-csharp">public record DashboardResponse(
    UserProfile Profile,
    IReadOnlyList&lt;Course&gt; Courses,
    IReadOnlyList&lt;Session&gt; UpcomingSessions,
    int NotificationCount,
    IReadOnlyList&lt;string&gt; PartialFailures  // e.g. ["courses", "sessions"]
);
</code></pre>
<p>This is not error handling for its own sake. It is a deliberate contract: the BFF guarantees it will always return a structurally valid response, and the Vue component decides what to render when parts of it are empty.</p>
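<p>The partial-failure contract can be sketched end to end. The example below is TypeScript, used purely for illustration (the production BFF is .NET), and it is a minimal sketch under stated assumptions: <code>UpstreamClients</code>, its methods and the <code>loadSection</code> helper are hypothetical, and a section whose prerequisite failed is reported as failed rather than silently returned empty.</p>
<pre><code class="language-typescript">// Hypothetical shapes returned by the upstream service clients.
interface Profile { displayName: string; orgId: string }
interface Course { id: string; title: string }
interface Session { id: string; title: string; startsAt: string }

// Hypothetical client interface; real clients would call each service over HTTP.
interface UpstreamClients {
  getProfile(userId: string): Promise&lt;Profile&gt;;
  getUnreadCount(userId: string): Promise&lt;number&gt;;
  getCoursesByOrg(orgId: string): Promise&lt;Course[]&gt;;
  getUpcomingSessions(courseIds: string[], limit: number): Promise&lt;Session[]&gt;;
}

interface DashboardResponse {
  profile: Profile | null;
  courses: Course[];
  upcomingSessions: Session[];
  notificationCount: number;
  partialFailures: string[]; // sections that could not be loaded
}

// Runs one section's load; on failure, records the section name and
// returns the fallback so the response stays structurally valid.
async function loadSection&lt;T&gt;(
  name: string,
  failures: string[],
  fallback: T,
  load: () =&gt; Promise&lt;T&gt;,
): Promise&lt;T&gt; {
  try {
    return await load();
  } catch {
    failures.push(name);
    return fallback;
  }
}

async function buildDashboard(userId: string, up: UpstreamClients): Promise&lt;DashboardResponse&gt; {
  const failures: string[] = [];

  // Phase 1: independent calls run in parallel.
  const [profile, notificationCount] = await Promise.all([
    loadSection&lt;Profile | null&gt;("profile", failures, null, () =&gt; up.getProfile(userId)),
    loadSection&lt;number&gt;("notifications", failures, 0, () =&gt; up.getUnreadCount(userId)),
  ]);

  // Phase 2: needs the org ID, so it cannot run if the profile failed.
  let courses: Course[] = [];
  if (profile) {
    const orgId = profile.orgId;
    courses = await loadSection&lt;Course[]&gt;("courses", failures, [], () =&gt; up.getCoursesByOrg(orgId));
  } else {
    failures.push("courses");
  }

  // Phase 3: needs course IDs; a failed prerequisite marks sessions as failed too.
  let upcomingSessions: Session[] = [];
  if (failures.includes("courses")) {
    failures.push("sessions");
  } else if (courses.length &gt; 0) {
    const courseIds = courses.map((c) =&gt; c.id);
    upcomingSessions = await loadSection&lt;Session[]&gt;("sessions", failures, [], () =&gt;
      up.getUpcomingSessions(courseIds, 3),
    );
  }

  return { profile, courses, upcomingSessions, notificationCount, partialFailures: failures };
}
</code></pre>
<p>Reporting <code>sessions</code> as failed when <code>courses</code> failed, instead of returning an empty list, lets the Vue component tell "no upcoming sessions" apart from "sessions could not be loaded".</p>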
<hr />
<h2>Response shaping: the frontend owns the shape</h2>
<p>The single most impactful thing a BFF does is decouple the response shape from the domain model. This is also where engineers most frequently make the wrong call.</p>
<p>The wrong call is to return the upstream service's response more or less as-is, perhaps with light field selection. This is understandable — it is the path of least resistance, it keeps the BFF thin, and it avoids the question of what the "right" shape is. But it is not response shaping. It is proxying, and a proxy does not justify the overhead of a dedicated service.</p>
<p>Response shaping means designing the response around the component tree that will consume it. The question is not "what did the upstream service return?" but "what does the Vue component need, and in what shape does it need it?"</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/8bbb449d-0544-4a4f-9050-d37beda782b8.png" alt="" style="display:block;margin:0 auto" />

<h3>Flatten, don't nest</h3>
<p>Domain models are frequently deeply nested because they reflect real-world entity relationships. Frontend components rarely need that nesting — they need flat data they can bind to template properties without traversal logic in the component.</p>
<p>An upstream <code>Course</code> entity might look like this:</p>
<pre><code class="language-json">{
  "id": "c-1",
  "metadata": {
    "title": "Mathematics — Year 9",
    "code": "MATH-9",
    "curriculum": {
      "framework": "NOR-K20",
      "subject": "Mathematics",
      "level": { "grade": 9, "label": "Year 9" }
    }
  },
  "enrollment": {
    "capacity": 30,
    "enrolled": 24,
    "waitlist": 2
  },
  "status": { "code": "ACTIVE", "since": "2024-08-15T00:00:00Z" }
}
</code></pre>
<p>A course card component in Vue needs a title, a code, the enrollment fraction and percentage, a status label, and the date the course became active. The BFF shapes this into:</p>
<pre><code class="language-json">{
  "id": "c-1",
  "title": "Mathematics — Year 9",
  "code": "MATH-9",
  "enrollmentLabel": "24 / 30",
  "enrollmentPercent": 80,
  "status": "Active",
  "activeFrom": "2024-08-15"
}
</code></pre>
<p>The component receives exactly what it renders. No traversal logic, no null-guard chains, no formatting in the template. The formatting decisions — how to display the enrollment fraction, how to present the date — are made once, in the BFF, and are consistent across every component that uses this data.</p>
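<p>As a minimal sketch, the shaping step is a pure mapping function. It is written in TypeScript here for illustration only (the production BFF does this in .NET); <code>UpstreamCourse</code> and <code>CourseCard</code> mirror the two JSON documents above, while <code>shapeCourse</code> and the status-label table are hypothetical names.</p>
<pre><code class="language-typescript">// Upstream entity as returned by the course service (mirrors the first JSON example).
interface UpstreamCourse {
  id: string;
  metadata: { title: string; code: string; curriculum?: unknown };
  enrollment: { capacity: number; enrolled: number; waitlist: number };
  status: { code: string; since: string };
}

// Shape the course card binds to directly (mirrors the second JSON example).
interface CourseCard {
  id: string;
  title: string;
  code: string;
  enrollmentLabel: string;
  enrollmentPercent: number;
  status: string;
  activeFrom: string;
}

// Status-code translation. Only ACTIVE appears in the example; unknown
// codes fall back to the raw upstream value rather than failing.
const statusLabels: Record&lt;string, string&gt; = { ACTIVE: "Active" };

function shapeCourse(c: UpstreamCourse): CourseCard {
  const { capacity, enrolled } = c.enrollment;
  return {
    id: c.id,
    title: c.metadata.title,
    code: c.metadata.code,
    enrollmentLabel: `${enrolled} / ${capacity}`,
    // Whole-number percentage; a zero-capacity course reports 0 instead of dividing by zero.
    enrollmentPercent: capacity &gt; 0 ? Math.round((enrolled * 100) / capacity) : 0,
    status: statusLabels[c.status.code] ?? c.status.code,
    // Assumes an ISO-8601 timestamp; keeps the date part as sent.
    activeFrom: c.status.since.slice(0, 10),
  };
}
</code></pre>
<p>Because the mapping has no I/O, it can be unit-tested with nothing more than a sample upstream payload and the expected card.</p>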
<h3>Computed fields belong in the BFF</h3>
<p>The enrollment label and enrollment percentage in the example above are computed fields — they do not exist in the upstream response and must be derived. They belong in the BFF, not in the component.</p>
<p>The underlying principle: any derivation that is deterministic, presentation-oriented, and would be repeated across multiple components is a BFF concern. This includes percentage calculations, label generation, date formatting, status code translation, and currency formatting with locale awareness.</p>
<p>What does not belong in the BFF: business logic that should live upstream, validation that changes application state, or calculations that depend on runtime user input. The BFF is a rendering layer for data that is already computed — it is not a domain service.</p>
<h3>Naming conventions are a contract decision</h3>
<p>The upstream service exposes <code>enrollment.capacity</code> and <code>enrollment.enrolled</code>. The BFF exposes <code>enrollmentLabel</code> and <code>enrollmentPercent</code>. The Vue component binds to <code>enrollmentLabel</code> and <code>enrollmentPercent</code>, never to the upstream names.</p>
<p>This means your BFF's property names are part of its contract with the frontend. Changing <code>enrollmentLabel</code> to <code>enrollmentText</code> is a breaking change, even if the value is identical. Name properties for their rendering purpose, not their origin, and treat them with the same stability you would expect from any API you consume.</p>
<p>In practice, this argues for establishing naming conventions before writing the first endpoint and enforcing them through code review. Consistency in the contract reduces cognitive load on both sides of it.</p>
<hr />
<h2>Versioning: the problem you are creating</h2>
<p>A BFF contract is an API. APIs have versions, and versions accumulate. This is the most underspecified aspect of BFF design in most articles, because it is uncomfortable to discuss — you are creating a versioning problem in order to solve a coupling problem, and the question is whether the trade is favourable.</p>
<p>There are three approaches, each with a clear use case.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/1a3853da-cc01-4462-ad79-0c99e69d463b.png" alt="" style="display:block;margin:0 auto" />

<h3>URL versioning for major breaking changes</h3>
<pre><code class="language-plaintext">GET /api/v1/dashboard
GET /api/v2/dashboard
</code></pre>
<p>URL versioning is the most visible, most explicit, and most operationally expensive approach. It is appropriate when a response shape change is so significant that the old and new shapes cannot coexist under one contract — for example, when a screen is redesigned and the data model changes completely.</p>
<p>The operational cost is that both versions must be maintained simultaneously during the migration period, and the migration period in practice extends far longer than anticipated. Budget for it.</p>
<h3>Header versioning for incremental evolution</h3>
<pre><code class="language-plaintext">GET /api/dashboard
Accept: application/vnd.bff.dashboard+json; version=2
</code></pre>
<p>Header versioning keeps URLs stable and moves the version negotiation into headers. It is cleaner for incremental evolution — adding fields, changing response structure within a stable conceptual model — and it does not require duplicating route definitions. The cost is that it is less discoverable and requires slightly more discipline in the client to set the header correctly.</p>
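<p>On the BFF side, the version has to be read back out of the <code>Accept</code> header. The TypeScript sketch below assumes the media type shown above, treats a missing version as version 1, and uses a hypothetical <code>resolveDashboardVersion</code> helper; in the .NET BFF this parsing would more likely live in versioning middleware than in hand-written code.</p>
<pre><code class="language-typescript">// Vendor media type from the example request above.
const DASHBOARD_MEDIA_TYPE = "application/vnd.bff.dashboard+json";
// Assumption: a request that does not ask for a version gets version 1.
const DEFAULT_VERSION = 1;

// Returns the dashboard contract version the client asked for, or null
// when it names a version this BFF does not serve.
function resolveDashboardVersion(accept: string | undefined, supported: number[]): number | null {
  if (!accept) return DEFAULT_VERSION;

  // An Accept header may list several media ranges separated by commas.
  for (const range of accept.split(",")) {
    const [mediaType, ...params] = range.split(";").map((part) =&gt; part.trim());
    if (mediaType.toLowerCase() !== DASHBOARD_MEDIA_TYPE) continue;

    const versionParam = params.find((p) =&gt; p.toLowerCase().startsWith("version="));
    if (!versionParam) return DEFAULT_VERSION;

    const version = Number(versionParam.slice("version=".length));
    return Number.isInteger(version) &amp;&amp; supported.includes(version) ? version : null;
  }

  // No vendor media type (e.g. "application/json" or "*/*"): use the default.
  return DEFAULT_VERSION;
}
</code></pre>
<p>A <code>null</code> result maps naturally onto HTTP 406 Not Acceptable: the client asked for a contract version the BFF does not serve.</p>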
<h3>Additive-only evolution: the best versioning strategy</h3>
<p>The most effective versioning strategy is one you do not need. If the BFF contract evolves additively — new fields are added, existing fields are never removed, semantics never change — versioning becomes a maintenance concern rather than a migration project.</p>
<p>This is achievable in practice with two rules:</p>
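<p><strong>Never remove a field.</strong> If a field is no longer needed by the frontend, mark it as deprecated in internal documentation (OpenAPI schemas also support a <code>deprecated</code> flag on properties) and stop populating it (return null, provided the contract already declares the field nullable), but leave it in the response schema. Removal is a v2 concern.</p>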
<p><strong>Never change the semantics of an existing field.</strong> If <code>status</code> currently returns <code>"Active"</code> and you need it to return a structured object, that is a new field — <code>statusDetail</code> — not a change to <code>status</code>. The original field continues as-is.</p>
<p>Additive-only evolution is not infinitely sustainable — response shapes accumulate cruft over time — but it defers versioning costs to the natural cadence of major releases rather than introducing them with every sprint.</p>
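<p>The two rules can be enforced mechanically rather than by memory. Below is a minimal TypeScript sketch that assumes each version of a response schema has already been flattened into a map of property name to type name (for example from the BFF's OpenAPI document; that extraction is not shown). <code>diffContract</code> and <code>isBreaking</code> are hypothetical helpers, and comparing type names is only a proxy: a change of meaning that keeps the same type still needs human review.</p>
<pre><code class="language-typescript">// A response schema flattened to property name -&gt; type name. Producing this
// map (e.g. from the BFF's OpenAPI document) is outside this sketch.
type FieldMap = Record&lt;string, string&gt;;

interface ContractChange {
  field: string;
  kind: "added" | "removed" | "typeChanged";
}

const has = (map: FieldMap, key: string) =&gt; Object.prototype.hasOwnProperty.call(map, key);

// Compares two versions of a response contract, field by field.
function diffContract(previous: FieldMap, next: FieldMap): ContractChange[] {
  const changes: ContractChange[] = [];
  for (const [field, type] of Object.entries(previous)) {
    if (!has(next, field)) changes.push({ field, kind: "removed" });
    else if (next[field] !== type) changes.push({ field, kind: "typeChanged" });
  }
  for (const field of Object.keys(next)) {
    if (!has(previous, field)) changes.push({ field, kind: "added" });
  }
  return changes;
}

// Under additive-only evolution, anything other than an added field
// needs a new contract version. Same-type semantic changes are invisible here.
function isBreaking(changes: ContractChange[]): boolean {
  return changes.some((change) =&gt; change.kind !== "added");
}
</code></pre>
<p>Renaming <code>enrollmentLabel</code> to <code>enrollmentText</code> shows up as one removal plus one addition, so the check flags it as breaking even though the value is identical. Run against the previous release's schema in CI, any breaking change is a prompt to introduce a new version rather than edit the current contract.</p>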
<hr />
<h2>The boundary: what belongs in the BFF</h2>
<p>This is the question that determines whether your BFF stays healthy or becomes the new monolith. The answer is a firm principle rather than a checklist: <strong>the BFF transforms and aggregates; it does not originate.</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/4ca9da4f-2747-44f7-b4ce-579f1ff79c2d.png" alt="" style="display:block;margin:0 auto" />

<p>What this means in concrete terms:</p>
<h3>BFF owns</h3>
<ul>
<li><p><strong>Response aggregation</strong>: combining data from multiple upstream services into a single response shaped for the Vue component</p>
</li>
<li><p><strong>Field selection and projection</strong>: choosing which upstream fields to include and which to omit</p>
</li>
<li><p><strong>Presentation-layer computation</strong>: formatting, label generation, percentage derivation, date localisation</p>
</li>
<li><p><strong>Authentication enforcement</strong>: validating the session, exchanging tokens, enforcing access before any upstream call is made</p>
</li>
<li><p><strong>Caching of presentation-layer data</strong>: caching aggregated, shaped responses where staleness is acceptable</p>
</li>
<li><p><strong>Error translation</strong>: converting upstream error codes into client-meaningful error shapes</p>
</li>
</ul>
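<p>Error translation is the easiest item on this list to leave implicit, so a concrete sketch helps. The TypeScript below is illustrative only: the <code>ClientError</code> shape and the status mapping are assumptions, and which upstream statuses count as retryable is a policy decision for each system rather than a fixed rule.</p>
<pre><code class="language-typescript">// Hypothetical client-facing error shape; names are illustrative.
interface ClientError {
  code: "not_found" | "forbidden" | "unavailable" | "upstream_error";
  message: string;
  retryable: boolean;
}

// Maps an upstream HTTP status to an error the Vue client can act on.
// Upstream error bodies are not forwarded, so internal details stay server-side.
function translateUpstreamError(service: string, status: number): ClientError {
  switch (status) {
    case 404:
      return { code: "not_found", message: `No ${service} data was found.`, retryable: false };
    case 401:
    case 403:
      return { code: "forbidden", message: "You do not have access to this data.", retryable: false };
    case 429:
    case 503:
    case 504:
      return { code: "unavailable", message: `The ${service} service is temporarily unavailable.`, retryable: true };
    default:
      // Other 5xx responses may be transient; 4xx responses will not fix themselves.
      return { code: "upstream_error", message: "Something went wrong.", retryable: status &gt;= 500 };
  }
}
</code></pre>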
<h3>Upstream services own</h3>
<ul>
<li><p><strong>Business rules</strong>: what constitutes a valid enrollment, whether a course is at capacity, eligibility logic</p>
</li>
<li><p><strong>Domain validation</strong>: ensuring data integrity constraints are enforced at the source of truth</p>
</li>
<li><p><strong>State mutation</strong>: creating, updating, and deleting domain entities</p>
</li>
<li><p><strong>Cross-entity consistency</strong>: ensuring that a session cannot reference a deleted course</p>
</li>
</ul>
<h3>The grey area: where teams disagree</h3>
<p>The genuinely contested cases are usually one of two types:</p>
<p><strong>Computed fields that require domain knowledge.</strong> Is <code>isEnrollmentOpen</code> — a boolean derived from enrollment capacity and a business rule about the waitlist threshold — a presentation concern or a domain concern? The answer: if the rule could change (and business rules do change), it belongs upstream. The BFF should receive a pre-computed <code>enrollmentStatus</code> from the course service, not derive it locally. A BFF that embeds business rules is a BFF that becomes inconsistent with the backend when those rules change.</p>
<p><strong>Input validation on write endpoints.</strong> The BFF handles write operations too — course enrollments, session registrations. Validation that is purely structural (is this field present, is this ID a valid UUID format) is reasonable in the BFF as a fast-fail before the upstream call. Validation that is semantic (is this user eligible to enroll in this course) must happen upstream. Drawing the line here prevents a situation where the BFF and the upstream service have conflicting validation logic.</p>
<p>A practical heuristic: if a product manager could change the rule in a sprint, it belongs upstream. If it is structural and invariant (a user ID is always a UUID), it is acceptable in the BFF.</p>
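<p>The structural half of that line can be sketched as a fast-fail check that runs before any upstream call. This is a minimal TypeScript illustration: the request fields and <code>validateEnrollmentShape</code> are hypothetical, and it follows the text's assumption that IDs are UUIDs.</p>
<pre><code class="language-typescript">// Canonical 8-4-4-4-12 hexadecimal UUID text form.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Structural fast-fail for a hypothetical enrollment request body.
// It checks presence, type and format only; whether the user may enroll
// is a semantic rule that stays with the upstream course service.
function validateEnrollmentShape(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const { userId, courseId } = body as { userId?: unknown; courseId?: unknown };
  const errors: string[] = [];
  for (const [name, value] of [["userId", userId], ["courseId", courseId]] as const) {
    if (value === undefined) errors.push(`${name} is required`);
    else if (typeof value !== "string" || !UUID_PATTERN.test(value)) errors.push(`${name} must be a UUID`);
  }
  return errors;
}
</code></pre>
<p>A non-empty result would be returned as a 400 without touching the upstream service. An empty result only means the request is well-formed; the course service still decides whether the enrollment is allowed.</p>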
<hr />
<h2>Designing for the Vue component tree</h2>
<p>The most useful frame for BFF endpoint design is not "what data does this screen need?" but "what does the Vue component tree look like, and what does each component expect from its props?"</p>
<p>A screen is composed of components. Each component has a data contract — its props interface. The BFF response should map cleanly onto that props interface, ideally such that a single destructure in the composable produces the data each component needs without further transformation.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/017c6bd1-7520-4a28-afdb-246a1614a825.png" alt="" style="display:block;margin:0 auto" />

<p>In practice this means walking the component tree before designing the endpoint:</p>
<pre><code class="language-plaintext">DashboardView
├── UserProfileCard        → { displayName, role, avatarUrl }
├── CourseListPanel
│   └── CourseCard (×n)   → { id, title, code, enrollmentLabel, status }
├── UpcomingSessionsList
│   └── SessionItem (×n)  → { id, title, startsAt, courseTitle, locationLabel }
└── NotificationBadge      → { count }
</code></pre>
<p>The BFF endpoint for this screen returns an object that mirrors this structure:</p>
<pre><code class="language-json">{
  "user": {
    "displayName": "Ingrid Solberg",
    "role": "Teacher",
    "avatarUrl": "/avatars/i-solberg.jpg"
  },
  "courses": [
    { "id": "c-1", "title": "Mathematics — Year 9", "code": "MATH-9",
      "enrollmentLabel": "24 / 30", "status": "Active" }
  ],
  "upcomingSessions": [
    { "id": "s-1", "title": "Integration review", "startsAt": "2025-04-08T09:00:00",
      "courseTitle": "Mathematics — Year 9", "locationLabel": "Room 204" }
  ],
  "notifications": { "count": 3 }
}
</code></pre>
<p>The Vue composable for this screen receives this response and distributes it to components without further transformation:</p>
<pre><code class="language-typescript">import { computed } from 'vue'
// useDashboard is this screen's data composable; the import path is illustrative
import { useDashboard } from '@/composables/useDashboard'

const { data } = useDashboard()

// Each computed ref maps directly to a component's props
const user = computed(() =&gt; data.value?.user)
const courses = computed(() =&gt; data.value?.courses ?? [])
const upcomingSessions = computed(() =&gt; data.value?.upcomingSessions ?? [])
const notificationCount = computed(() =&gt; data.value?.notifications.count ?? 0)
</code></pre>
<p>No adapter logic. No field mapping. The BFF contract and the component props interface are aligned by design.</p>
<hr />
<h2>A note on OpenAPI and type safety</h2>
<p>A BFF contract without a schema is a verbal agreement. It will drift. The response shape the BFF returns today will not match what the Vue components expect in three months, and you will discover the mismatch at runtime.</p>
<p>The mitigation is simple and should be non-negotiable: define the BFF contract with OpenAPI, generate TypeScript types from it, and import those types into the Vue application. Changes to the BFF response shape become compile-time errors in the frontend before they reach the browser.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/398a0648-d484-450f-81a6-10342b5dc97c.png" alt="" style="display:block;margin:0 auto" />

<p>In .NET Core, Swashbuckle generates an OpenAPI spec from controller or Minimal API route definitions once it is registered; from .NET 9, the <code>Microsoft.AspNetCore.OpenApi</code> package can generate the document without Swashbuckle. In Vue 3, <code>openapi-typescript</code> or <code>@hey-api/openapi-ts</code> generates typed interfaces from that spec. The generation step belongs in the build pipeline, not in a manual step someone has to remember.</p>
<p>This is not optional complexity. A BFF that lacks type-safe contracts between its two surfaces — the upstream services and the Vue client — is a BFF that will break silently in production.</p>
<hr />
<h2>What comes next</h2>
<p>This article has covered the design principles that govern a well-structured BFF contract. The next article steps back before the implementation begins to address the comparative question that should be answered before writing any code: how does BFF compare to API Gateway and GraphQL as architectural options, where does each pattern win, and where do they coexist?</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="/what-is-bff-and-when-is-it-actually-worth-it">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li>→ <a href="/designing-the-bff-contract-request-aggregation-client-specific-shaping">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="/building-the-bff-in-net-core-minimal-apis-routing-aggregation">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="/auth-at-the-boundary-integrating-feide-identity-via-the-bff">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="/testing-the-bff-unit-integration-contract-tests">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="/caching-in-the-bff-in-memory-redis-response-caching">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="/brownfield-migration-the-strangler-fig-approach-to-bff-adoption">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[What Is BFF — and When Is It Actually Worth It?]]></title><description><![CDATA[Your frontend has outgrown the API it was given.

At some point, most frontend teams hit the same wall. The backend exposes what it knows — resources, entities, service boundaries — and the frontend i]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/what-is-bff-and-when-is-it-actually-worth-it</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/what-is-bff-and-when-is-it-actually-worth-it</guid><category><![CDATA[Backend for frontend]]></category><category><![CDATA[BFF Pattern]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[API Design]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[FrontendArchitecture]]></category><category><![CDATA[Microservices]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sun, 22 Mar 2026 04:22:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/71a91b5f-2c7d-46c1-9d6f-65c1665521fd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Your frontend has outgrown the API it was given.</p>
</blockquote>
<p>At some point, most frontend teams hit the same wall. The backend exposes what it knows — resources, entities, service boundaries — and the frontend is left stitching four API calls into a single screen, massaging data shapes the UI never asked for, and writing adapter logic that has no good place to live. The Backend for Frontend pattern is the answer to that wall. But it comes with a cost: an additional service to build, deploy, and own. This series makes the case for that trade-off — and equally, for the cases where it is not worth making.</p>
<p>The examples and architecture decisions throughout are drawn from a production implementation built for a Norwegian enterprise in the education sector. Where the original system cannot be described in full, concepts have been generalised to meet NDA obligations — but the engineering trade-offs, the failure modes, and the decisions that shaped the final design are real.</p>
<hr />
<h2>The problem, stated plainly</h2>
<p>Before defining what a BFF is, it is worth being precise about what problem actually warrants one.</p>
<p>Imagine a dashboard screen in an education platform. It needs to render: the current user's profile and role, their organisation's enrolled courses, upcoming sessions for the current week, and unread notifications. In a system where services are organised around domain entities, that screen requires calls to at least four separate endpoints — likely across two or three different services. The responses come back in shapes optimised for storage and domain logic, not for what this particular screen needs.</p>
<p>The frontend handles it. It fires the requests, waits for them to resolve, merges the data, filters out the fields it does not need, transforms date formats, normalises inconsistent ID conventions between services, and then renders. This works. It is also a slow, fragile, and increasingly expensive pattern at scale.</p>
<p>Three specific failure modes appear consistently once systems grow:</p>
<p><strong>Overfetching and underfetching.</strong> REST endpoints designed around domain entities return either too much or too little for any given screen. A <code>GET /users/{id}</code> response that includes billing history, audit logs, and security settings satisfies the account settings page — but it is wasteful when all the dashboard needs is a display name and an avatar URL. Conversely, a screen requiring data from three different resource types must make three round trips, each adding latency, each adding a potential failure point.</p>
<p><strong>Adapter logic with no home.</strong> The gap between what upstream services return and what the frontend needs gets filled somewhere. In the absence of a dedicated layer, it lands in the frontend itself — in Vuex stores, in composables, in utility functions scattered across the codebase. This logic is hard to test in isolation, invisible to the backend teams who changed the API shape that broke it, and re-implemented separately for each client surface (web app, mobile app, third-party integration).</p>
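<p>To make "adapter logic" concrete, here is a minimal sketch of the kind of code that accumulates wherever the gap gets filled. The upstream shapes are hypothetical: the course service is assumed to return numeric <code>id</code> values, while the schedule service refers to courses by prefixed strings such as <code>"crs-42"</code> and sends UTC timestamps. The function names are illustrative, not taken from the production system.</p>
<pre><code class="lang-javascript">// Hypothetical upstream conventions: the course service uses numeric IDs (42),
// the schedule service uses prefixed strings ("crs-42").
function normaliseCourseId(raw) {
  const text = String(raw);
  return text.startsWith("crs-") ? text.slice("crs-".length) : text;
}

// Convert a schedule-service session into the shape the dashboard renders.
function toSessionView(session) {
  return {
    courseId: normaliseCourseId(session.courseRef),
    // UTC timestamp -> "YYYY-MM-DD" calendar date (in UTC).
    date: new Date(session.startsAtUtc).toISOString().slice(0, 10),
    title: session.title,
  };
}

// Join sessions from one service to course names from another.
function joinSessionsToCourses(courses, sessions) {
  const namesById = new Map(
    courses.map((course) => [normaliseCourseId(course.id), course.name])
  );
  return sessions.map(toSessionView).map((view) => ({
    ...view,
    courseName: namesById.get(view.courseId) ?? null,
  }));
}
</code></pre>
<p>None of this is difficult, but every line of it depends on two upstream contracts, and nothing fails at build time when either of them changes.</p>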
<p><strong>Security and session complexity pushed to the client.</strong> When the frontend talks directly to multiple APIs, it must manage tokens for each of them, handle token refresh across parallel requests, and decide what to expose in the browser. This is not where security boundaries should be drawn. The browser is not a trusted environment, and treating it as one creates problems that are difficult to retrofit away later.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/b624a4c0-e27f-4c99-9675-8ec24d39f340.png" alt="" style="display:block;margin:0 auto" />

<p>None of these problems are fatal on their own, early in a product's life. The question is what you reach for when they compound — and BFF is one answer, not the only one.</p>
<hr />
<h2>What BFF actually is</h2>
<p>The Backend for Frontend pattern, popularised by Sam Newman in the context of microservices architecture (he credits Phil Calçado and the team at SoundCloud with the original idea), is straightforward in principle: create a dedicated backend service for each distinct frontend client, owned by the frontend team, whose sole responsibility is serving that client's specific needs.</p>
<p>The key words are <em>dedicated</em> and <em>owned by the frontend team</em>. A BFF is not a general-purpose API gateway. It is not a shared middleware layer. It is a service that knows exactly one consumer — your frontend — and is optimised entirely for that consumer's needs. It aggregates calls to upstream services, shapes responses into exactly what the UI requires, handles authentication at the boundary, and shields the frontend from the complexity and instability of the services behind it.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/596e90a5-d6a5-47e1-b75c-530c7e420109.png" alt="" style="display:block;margin:0 auto" />

<p>The Vue application has one API contract to reason about: the BFF. It does not know or care how many upstream services exist, how they are versioned, or what their response shapes look like. That complexity lives in the BFF, where it can be tested, versioned, and changed independently.</p>
<p>The BFF can do several things the frontend cannot do cleanly on its own: it can fire multiple upstream requests in parallel and merge the results before responding; it can cache aggressively for data that does not change per-request; it can enforce authentication and authorisation before a single upstream call is made; and it can translate between authentication contexts — for example, exchanging a session cookie for a service-to-service token without ever exposing a bearer token to the browser.</p>
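<p>The first of these capabilities, firing upstream requests in parallel and merging the results, can be sketched using the dashboard screen described earlier. The series builds the real BFF in .NET; this version is in JavaScript only to keep the examples in one language, and the <code>upstream</code> object with its four methods is a hypothetical stand-in for HTTP calls to separate services.</p>
<pre><code class="lang-javascript">// upstream: hypothetical client whose methods each wrap an HTTP call
// to a different service and resolve to that service's raw response.
async function getDashboard(userId, upstream) {
  // Fire all four upstream requests at once instead of one after another.
  const [profile, courses, sessions, notifications] = await Promise.all([
    upstream.getProfile(userId),
    upstream.getCourses(userId),
    upstream.getSessions(userId), // assumed to return this week's sessions only
    upstream.getNotifications(userId),
  ]);

  // Shape the merged result into exactly what the dashboard renders;
  // every other upstream field (billing history, audit logs...) is dropped.
  return {
    displayName: profile.displayName,
    role: profile.role,
    courses: courses.map((course) => ({ id: course.id, name: course.name })),
    sessionsThisWeek: sessions.length,
    unreadNotifications: notifications.filter((n) => !n.read).length,
  };
}
</code></pre>
<p>Because the four calls run concurrently, the screen waits roughly as long as the slowest upstream call rather than the sum of all four. <code>Promise.all</code> rejects as soon as any call fails; when a partial dashboard is acceptable, <code>Promise.allSettled</code> is the usual starting point.</p>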
<hr />
<h2>What BFF actually costs</h2>
<p>This is where most introductory articles skip ahead too quickly. The BFF pattern is genuinely useful — but the version of it that works in production looks different from the version described in architecture blog posts, and the gap is filled with operational cost.</p>
<p><strong>You are taking on a new service.</strong> This sounds obvious, but its implications are routinely underestimated. A new service means a new deployment pipeline, a new container to monitor, a new set of logs to aggregate, a new failure mode to handle, a new component in your runbook. If your team does not already own infrastructure or has not previously maintained a backend service, the operational learning curve is real. The BFF will go down. It will have bugs. It will need updating when upstream services change their contracts. These are not hypothetical costs — they are the routine maintenance costs of any production service, and they do not disappear because the service is thin.</p>
<p><strong>The BFF becomes a coupling point.</strong> When your Vue application and your upstream services are decoupled by a BFF, the BFF is not free of coupling — it absorbs it. Every upstream API change that affects the frontend now requires a BFF change too. In a fast-moving system, this can mean the BFF becomes a bottleneck: a place where changes must land before they can reach the frontend. The team that owns the BFF becomes the team that must be unblocked first.</p>
<p><strong>Latency is not free.</strong> A BFF adds one network hop between the browser and its data. When the BFF and its upstream services are co-located in the same cloud region, as they are in most production deployments, this hop typically costs single-digit milliseconds. But it exists, and for systems already operating close to their latency budgets, it matters. The mitigation is co-deployment discipline and caching, both of which require deliberate effort.</p>
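<p>A rough model makes the trade-off visible. The numbers are assumptions chosen for illustration, not measurements: an 80 ms round trip from the browser to the cloud region and a 10 ms round trip between co-located services. The model ignores server processing time and payload size.</p>
<pre><code class="lang-javascript">// Rough latency model in milliseconds. All inputs are assumptions.
// browserRtt: round trip from the browser to the cloud region.
// regionRtt:  round trip between services co-located in that region.
// sequential: true when each call needs data from the previous response.

// The browser calls each upstream service itself.
function directLatency(calls, browserRtt, sequential) {
  return sequential ? calls * browserRtt : browserRtt;
}

// One browser-to-BFF round trip, then the BFF calls upstream services in-region.
function bffLatency(calls, browserRtt, regionRtt, sequential) {
  return browserRtt + (sequential ? calls * regionRtt : regionRtt);
}

console.log(directLatency(4, 80, false), bffLatency(4, 80, 10, false)); // 80 90
console.log(directLatency(4, 80, true), bffLatency(4, 80, 10, true)); // 320 120
</code></pre>
<p>When a screen's requests are independent, the extra hop is pure cost; when they form a dependent chain, moving that chain in-region is where the BFF pays for its hop.</p>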
<p><strong>The BFF can become a dumping ground.</strong> This is the failure mode that rarely comes up in architecture talks. A BFF that starts as a clean aggregation layer accumulates business logic over time. A validation rule here, a conditional transform there, a calculation that "just needs to live somewhere." Left unchecked, a BFF becomes a monolith with the word "frontend" in its name. The discipline to keep it thin — a translator and aggregator, not a domain engine — is cultural as much as technical, and it requires active enforcement.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/b477b917-d883-4589-81b5-4b4e4874b094.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>When BFF is worth it</h2>
<p>With that context established, the cases where BFF earns its overhead are clearer.</p>
<p><strong>Multiple upstream services, single UI surface.</strong> If your frontend needs to aggregate data from three or more independent services for routine screens, the aggregation cost is already being paid somewhere. Paying it in the BFF — where it can be tested, cached, and monitored — is better than paying it in the client or across a distributed chain of sequential API calls.</p>
<p><strong>Multiple client surfaces with diverging needs.</strong> A web application, a mobile app, and a third-party integration consume fundamentally different API shapes. A response payload appropriate for a desktop dashboard is wasteful over a mobile connection. A BFF per client surface means each client gets exactly what it needs, without the upstream services needing to know or care about client-specific requirements. This is the original use case Sam Newman described, and it remains the strongest one.</p>
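<p>At its simplest, a BFF per client surface means one shaping function per client over the same upstream data. The course record and its field names below are hypothetical.</p>
<pre><code class="lang-javascript">// Web BFF: the desktop dashboard renders a full course card.
function shapeCourseForWeb(course) {
  return {
    id: course.id,
    name: course.name,
    description: course.description,
    teacherNames: course.teachers.map((teacher) => teacher.fullName),
    enrolledCount: course.enrolments.length,
  };
}

// Mobile BFF: a list row needs far less, so less data crosses the mobile network.
function shapeCourseForMobile(course) {
  return {
    id: course.id,
    name: course.name,
    enrolledCount: course.enrolments.length,
  };
}
</code></pre>
<p>When the mobile list later needs a teacher's name, only the mobile BFF changes; the course service and the web client are untouched.</p>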
<p><strong>Security boundary clarity.</strong> If your system involves tokens that must never reach the browser, or authentication flows that require server-side session management — as is the case with Feide, the national identity provider for the Norwegian education sector used in this series — a BFF gives you a clean place to draw the security perimeter. The BFF holds the session and manages token exchange; the browser only ever receives a cookie. This is the Token Handler pattern, and it is substantially harder to implement correctly without a dedicated server-side layer.</p>
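<p>The core of the Token Handler idea is small: tokens stay in a server-side session store, and the browser holds only an opaque session ID in an <code>HttpOnly</code> cookie that page scripts cannot read. The sketch below is in-memory and deliberately incomplete; a production BFF also needs a shared or persistent store, session expiry, and token refresh. The class, method, and cookie names are hypothetical, and the Feide-specific flow is a separate topic.</p>
<pre><code class="lang-javascript">// Hypothetical in-memory Token Handler sketch. Tokens never leave the server;
// the browser only ever holds an opaque session ID inside an HttpOnly cookie.
class SessionStore {
  constructor(generateId) {
    // generateId must be cryptographically secure, e.g. () => crypto.randomUUID().
    this.generateId = generateId;
    this.sessions = new Map();
  }

  // After the OIDC callback: store the tokens and return the Set-Cookie value.
  create(tokens) {
    const sessionId = this.generateId();
    this.sessions.set(sessionId, tokens);
    return `bff_session=${sessionId}; HttpOnly; Secure; SameSite=Lax; Path=/`;
  }

  // On each API request: turn the cookie's session ID into upstream headers.
  upstreamHeadersFor(sessionId) {
    const tokens = this.sessions.get(sessionId);
    if (!tokens) return null; // unknown or expired session: respond with 401
    return { Authorization: `Bearer ${tokens.accessToken}` };
  }

  // On logout: forget the tokens so the cookie becomes useless.
  destroy(sessionId) {
    this.sessions.delete(sessionId);
  }
}
</code></pre>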
<p><strong>Unstable upstream contracts.</strong> In a microservices environment where teams are moving fast and breaking things at the API layer, a BFF acts as a translation buffer. When an upstream service changes its response shape, you update the BFF. The Vue application is insulated. Without the BFF, that upstream change propagates directly into frontend code — often discovered at runtime rather than compile time.</p>
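<p>A sketch of that buffer, assuming a hypothetical upstream change in which the user service moves from a flat <code>{ id, name, email }</code> shape to one with split names and nested contact details. The adapter accepts both shapes during the transition and always emits the same contract to the Vue application.</p>
<pre><code class="lang-javascript">// Stable BFF contract: { id: string, displayName: string, email: string | null }.
// Hypothetical upstream v1: { id, name, email }
// Hypothetical upstream v2: { userId, givenName, familyName, contact: { email } }
function toUserContract(upstreamUser) {
  if ("userId" in upstreamUser) {
    // v2 shape
    return {
      id: String(upstreamUser.userId),
      displayName: `${upstreamUser.givenName} ${upstreamUser.familyName}`,
      email: upstreamUser.contact?.email ?? null,
    };
  }
  // v1 shape
  return {
    id: String(upstreamUser.id),
    displayName: upstreamUser.name,
    email: upstreamUser.email ?? null,
  };
}
</code></pre>
<p>Once every upstream instance serves v2, the v1 branch can be deleted without the frontend noticing.</p>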
<p><strong>Team ownership alignment.</strong> Perhaps the least technical but most practically important factor: if the frontend team has the capacity to own a backend service, a BFF gives them the autonomy to move at their own pace without being blocked on backend teams for API shape changes. This is an organisational argument as much as an architectural one, and it should be evaluated as such.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/a17939cf-aaaa-4862-9249-846f4e6967b7.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>When BFF is not worth it</h2>
<p>This section is the one most articles omit. The BFF pattern has a real overhead floor that you pay regardless of system complexity. Below a certain threshold, that floor is higher than the problems it solves.</p>
<p><strong>Small teams moving fast on a single surface.</strong> If you have one frontend, one backend, and a team of three engineers, a BFF introduces a coordination overhead between your own people that does not exist if the frontend talks directly to the API. The aggregation and shaping problems are real, but they are solvable with thoughtful API design, GraphQL, or simply accepting a thin amount of adapter logic in the frontend until the system is large enough to justify more structure.</p>
<p><strong>A well-designed monolithic API.</strong> If your backend already returns response shapes close to what the frontend needs — because the backend team works closely with the frontend team, or because the API was designed frontend-first — a BFF adds a layer without adding meaningful value. The problem a BFF solves is the impedance mismatch between backend domain models and frontend presentation needs. If that mismatch is small, the solution is disproportionate.</p>
<p><strong>Early-stage products with unstable requirements.</strong> A BFF contract between the frontend and its upstream services is another API surface to maintain. In the early life of a product, when screen designs change weekly and domain models are still being discovered, the BFF becomes a change multiplier: every significant UI change requires a frontend change, a BFF change, and potentially an upstream change. The stability that makes a BFF valuable is the same stability that is absent in early-stage development.</p>
<p><strong>Teams without infrastructure ownership.</strong> If your team has never maintained a deployed service — never dealt with container health checks, never written a deployment pipeline, never handled a 3am incident for something they own — adopting a BFF in production is learning two hard things simultaneously: the architecture and the operations. This is not a reason to avoid BFF permanently, but it is a reason to be honest about timing and capacity.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/cd5d0812-03f7-4cc7-882e-6380318f4f3e.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>The decision framework</h2>
<p>Rather than a checklist, use three questions. If the answer to fewer than two is "yes," the BFF overhead is likely not justified at this stage.</p>
<p><strong>Does your frontend aggregate data from three or more independent services for routine operations?</strong> If most screens require only one or two API calls that already return the right shape, the aggregation value proposition is weak.</p>
<p><strong>Do you have a meaningful security or session management requirement that cannot be cleanly handled in the client?</strong> If your authentication flow is stateless, token-based, and entirely client-managed, the security argument for BFF does not apply. If you are dealing with server-side sessions, token exchange, or an identity provider like Feide that requires server-side handling, it does.</p>
<p><strong>Does your team have the capacity to own and operate a backend service independently?</strong> This means a deployment pipeline, monitoring, alerting, runbooks, and the willingness to be on-call for it. BFF without operational ownership is technical debt in a server rack.</p>
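<p>The rule can be restated in a few lines of code, which is mainly useful for recording the assessment explicitly, for example alongside an architecture decision record. The property names are illustrative; the threshold is the one stated above.</p>
<pre><code class="lang-javascript">// Restates the three-question framework: fewer than two "yes" answers means
// the BFF overhead is likely not justified at this stage.
const QUESTIONS = [
  "aggregatesThreeOrMoreServices",
  "needsServerSideSessionHandling",
  "canOwnAndOperateAService",
];

function assessBff(answers) {
  const yesCount = QUESTIONS.filter((question) => answers[question] === true).length;
  return { yesCount, likelyJustified: yesCount >= 2 };
}

// Example: the identity provider requires server-side session handling and the
// team can operate a service, but most screens call a single API.
console.log(assessBff({
  aggregatesThreeOrMoreServices: false,
  needsServerSideSessionHandling: true,
  canOwnAndOperateAService: true,
})); // { yesCount: 2, likelyJustified: true }
</code></pre>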
<img src="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/d9f56524-4617-4886-9d32-4aea50917708.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>What this series covers from here</h2>
<p>The rest of this series assumes the answer is "yes, BFF is the right call." If it is not — if you read this article and concluded that your system is not there yet — the single most useful thing you can do is bookmark the decision framework above and revisit it in six months. Architecture decisions should trail system complexity, not lead it.</p>
<p>For those continuing: the next article addresses how to design the BFF contract itself — what belongs inside it, what must stay in upstream services, and how to version the API surface you are creating without recreating the problems you were trying to solve.</p>
<p>The implementation articles that follow use .NET Core Minimal APIs for the BFF service, Vue 3 composables for the client-side API layer, Feide for authentication, and Azure Container Instances for deployment. Each article is self-contained, but the architecture decisions made in the early articles carry forward — so reading in order is the path of least resistance.</p>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li>→ <a href="/what-is-bff-and-when-is-it-actually-worth-it">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="/designing-the-bff-contract-request-aggregation-client-specific-shaping">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="/building-the-bff-in-net-core-minimal-apis-routing-aggregation">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="/auth-at-the-boundary-integrating-feide-identity-via-the-bff">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="/testing-the-bff-unit-integration-contract-tests">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="/caching-in-the-bff-in-memory-redis-response-caching">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="/brownfield-migration-the-strangler-fig-approach-to-bff-adoption">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[Introduction to The Frontend's Contract: Building Backends for Frontends
with Vue.js, .NET Core & Azure]]></title><description><![CDATA[At some point, most frontend teams hit the same wall. The backend exposes what it knows — resources, entities, service boundaries — and the frontend is left stitching four API calls into a single scre]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure</guid><category><![CDATA[Backend for frontend]]></category><category><![CDATA[BFF Pattern]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[API Design]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[.net core]]></category><category><![CDATA[Azure]]></category><category><![CDATA[FrontendArchitecture]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sat, 21 Mar 2026 04:49:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6931380619a4497f08a1d0fd/e57052d8-245e-41b6-8a07-cc50d9c8b05a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At some point, most frontend teams hit the same wall. The backend exposes what it knows — resources, entities, service boundaries — and the frontend is left stitching four API calls into a single screen, massaging data shapes the UI never asked for, and writing adapter logic that has no good place to live. The Backend for Frontend pattern is the answer to that wall. But it comes with a cost: an additional service to build, deploy, and own. This series makes the case for that trade-off — and equally, for the cases where it is not worth making. The examples and architecture decisions throughout are drawn from a production implementation built for a Norwegian enterprise in the education sector. 
Where the original system cannot be described in full, concepts have been generalised to meet NDA obligations — but the engineering trade-offs, the failure modes, and the decisions that shaped the final design are real.</p>
<blockquote>
<p><em>"Every frontend eventually outgrows the API it was given. The BFF pattern is not about adding a layer — it's about taking ownership of the interface between your product and its backend, on your terms. This series is for engineers who want to understand the trade-offs before they commit to the architecture."</em></p>
</blockquote>
<h2>Series Guideline</h2>
<h3><strong>Part I: Foundations (Concepts and Architecture)</strong></h3>
<ul>
<li><p><strong>Article 1: What Is BFF — and When Is It Actually Worth It?</strong><br />The problem it solves, the cost it introduces, and the honest answer on when not to use it.</p>
</li>
<li><p><strong>Article 2: Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</strong><br />API design principles, response shaping, versioning strategy, and what belongs in the BFF vs upstream services.</p>
</li>
<li><p><strong>Article 3: BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</strong><br />Comparative analysis with real trade-offs. Where each pattern wins, where it falls over, and how they can coexist.</p>
</li>
</ul>
<h3><strong>Part II: Implementation (Code)</strong></h3>
<ul>
<li><p><strong>Article 4: Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</strong><br />Standing up the BFF service, aggregating upstream calls, shaping responses, and handling errors with real code.</p>
</li>
<li><p><strong>Article 5: The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</strong><br />Building a clean, typed client-BFF contract in Vue 3. useApi composables, error handling strategies, and OpenAPI codegen.</p>
</li>
<li><p><strong>Article 6: Auth at the Boundary: Integrating Feide Identity via the BFF</strong><br />Connecting the BFF to Feide — Norway's government-issued identity provider for educational organisations. OAuth 2.0 + OIDC flow, the Token Handler pattern, and why cookie-based sessions beat tokens in the browser.</p>
</li>
<li><p><strong>Article 7: Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</strong><br />Full container deployment pipeline — building and tagging Docker images, publishing artifacts, and running the BFF on Azure Container Instances. Includes Azure Front Door routing and when API Management adds value vs noise.</p>
</li>
</ul>
<h3><strong>Part III: Production &amp; Operations (Ops)</strong></h3>
<ul>
<li><p><strong>Article 8: Testing the BFF: Unit, Integration &amp; Contract Tests</strong><br />A layered testing strategy for the BFF. WebApplicationFactory for integration tests, Pact for consumer-driven contract testing with Vue.</p>
</li>
<li><p><strong>Article 9: Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</strong><br />End-to-end traceability across Vue → BFF → upstream services using Azure Application Insights. Correlation IDs, structured logs with Serilog, custom telemetry, and Application Insights dashboards and alerts.</p>
</li>
</ul>
<h3><strong>Supplementary articles</strong></h3>
<ul>
<li><p><strong>Caching in the BFF: In-Memory, Redis &amp; Response Caching</strong><br />Where caching belongs in a BFF architecture, how to avoid stale-data bugs, and cache invalidation patterns.</p>
</li>
<li><p><strong>Brownfield Migration: The Strangler Fig Approach to BFF Adoption</strong><br />Incrementally introducing a BFF in front of an existing monolith or REST API without a big-bang rewrite.</p>
</li>
<li><p><strong>Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</strong><br />Making the BFF fault-tolerant using Polly. Handling partial upstream failures gracefully in aggregated responses.</p>
</li>
</ul>
<hr />
<h2>☰ Series navigation</h2>
<aside>
  → <a href="/introduction-to-the-frontend-s-contract-building-backends-for-frontends-with-vue-js-net-core-azure">Introduction</a>
  <div style="margin-top:18px;margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#1A6B4A,#34C88A);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part I — Foundations</span>
    </div>
    <ul>
      <li><a href="/what-is-bff-and-when-is-it-actually-worth-it">What Is BFF — and When Is It Actually Worth It?</a></li>
      <li><a href="/designing-the-bff-contract-request-aggregation-client-specific-shaping">Designing the BFF Contract: Request Aggregation &amp; Client-Specific Shaping</a></li>
      <li><a href="/bff-vs-api-gateway-vs-graphql-picking-the-right-abstraction">BFF vs API Gateway vs GraphQL: Picking the Right Abstraction</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#2D52A0,#6B9FEF);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part II — Implementation</span>
    </div>
    <ul>
      <li><a href="/building-the-bff-in-net-core-minimal-apis-routing-aggregation">Building the BFF in .NET Core: Minimal APIs, Routing &amp; Aggregation</a></li>
      <li><a href="/the-vue-3-api-layer-of-bff-composables-error-boundaries-type-safety">The Vue 3 API Layer: Composables, Error Boundaries &amp; Type Safety</a></li>
      <li><a href="/auth-at-the-boundary-integrating-feide-identity-via-the-bff">Auth at the Boundary: Integrating Feide Identity via the BFF</a></li>
      <li><a href="/shipping-bff-to-azure-docker-images-artifact-publishing-azure-container-instances">Shipping to Azure: Docker Images, Artifact Publishing &amp; Azure Container Instances</a></li>
    </ul>
  </div>
  <div style="margin-bottom:18px">
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#7C3E8F,#C084E8);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Part III — Production &amp; Operations</span>
    </div>
    <ul>
      <li><a href="/testing-the-bff-unit-integration-contract-tests">Testing the BFF: Unit, Integration &amp; Contract Tests</a></li>
      <li><a href="/observability-for-bff-structured-logging-distributed-tracing-azure-application-insights">Observability: Structured Logging, Distributed Tracing &amp; Azure Application Insights</a></li>
    </ul>
  </div>
  <div>
    <div style="display:flex;align-items:center;gap:7px;margin-bottom:6px">
      <span style="width:5px;height:5px;border-radius:50%;background:light-dark(#B45309,#F5A623);flex-shrink:0;display:inline-block"></span>
      <span style="font-size:10px;font-weight:500;letter-spacing:.1em;text-transform:uppercase;color:light-dark(#9E9B94,#5E5C56)">Supplementary</span>
    </div>
    <ul>
      <li><a href="/caching-in-the-bff-in-memory-redis-response-caching">Caching in the BFF: In-Memory, Redis &amp; Response Caching</a></li>
      <li><a href="/brownfield-migration-the-strangler-fig-approach-to-bff-adoption">Brownfield Migration: The Strangler Fig Approach to BFF Adoption</a></li>
      <li><a href="/bff-resilience-patterns-circuit-breakers-retries-timeouts-with-polly">Resilience Patterns: Circuit Breakers, Retries &amp; Timeouts with Polly</a></li>
    </ul>
  </div>
</aside>]]></content:encoded></item><item><title><![CDATA[How Browser UX Shapes Security More Than Cryptography]]></title><description><![CDATA[Cryptography is precise.
Browsers are not.
If you’ve implemented WebAuthn in a real PWA, you already know this:The spec is clean. The user experience is not.
The uncomfortable truth is this:

Most authentication systems fail because of UX, not becaus...]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography</guid><category><![CDATA[#webauthn]]></category><category><![CDATA[passwordless authentication ]]></category><category><![CDATA[Authentication UX]]></category><category><![CDATA[browser security]]></category><category><![CDATA[#fido2]]></category><category><![CDATA[progressive web apps]]></category><category><![CDATA[security architecture]]></category><category><![CDATA[user experience]]></category><category><![CDATA[identity design]]></category><category><![CDATA[Application Security]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Thu, 19 Feb 2026 07:43:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771475868373/db5ddead-4354-48f6-922c-8c315394778a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Cryptography is precise.</p>
<p>Browsers are not.</p>
<p>If you’ve implemented WebAuthn in a real PWA, you already know this:<br />The spec is clean. The user experience is not.</p>
<p>The uncomfortable truth is this:</p>
<blockquote>
<p>Most authentication systems fail because of UX, not because of broken cryptography.</p>
</blockquote>
<p>WebAuthn gives us origin binding, challenge–response, and public-key authentication. That’s beautiful. But what users actually interact with is:</p>
<ul>
<li><p>A browser modal.</p>
</li>
<li><p>An OS biometric sheet.</p>
</li>
<li><p>A permission dialog.</p>
</li>
<li><p>A vague error message.</p>
</li>
<li><p>A “NotAllowedError”.</p>
</li>
</ul>
<p>And those surfaces shape behavior more than any algorithm ever will.</p>
<p>Let’s examine how browser and OS UX decisions constrain authentication design — and why UX discipline is often more important than cryptographic strength.</p>
<hr />
<h1 id="heading-1-browser-and-os-ux-constrain-your-architecture">1. Browser and OS UX Constrain Your Architecture</h1>
<p>When you call:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">await</span> navigator.credentials.get({
  <span class="hljs-attr">publicKey</span>: options
});
</code></pre>
<p>You are not in control anymore.</p>
<p>The browser:</p>
<ul>
<li><p>Decides how the prompt looks.</p>
</li>
<li><p>Decides when it appears.</p>
</li>
<li><p>Decides how cancellation behaves.</p>
</li>
<li><p>Decides what error is returned.</p>
</li>
<li><p>Delegates to the OS for biometric UI.</p>
</li>
</ul>
<p>Your PWA is a spectator.</p>
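<p>What you do control is the options object. A minimal sketch of building <code>PublicKeyCredentialRequestOptions</code>, assuming the challenge and credential IDs have already been fetched from your server and decoded to bytes; the function name and the <code>example.com</code> value are placeholders. Note that the specification treats <code>timeout</code> as a hint the browser may override, so it cannot anchor your retry logic either.</p>
<pre><code class="lang-javascript">// Builds PublicKeyCredentialRequestOptions for navigator.credentials.get().
// challenge: Uint8Array issued by the server for this attempt (single use).
// credentialIds: Uint8Array IDs of credentials the user registered earlier.
function buildRequestOptions(challenge, rpId, credentialIds) {
  return {
    challenge,
    rpId, // must equal, or be a registrable suffix of, the page's domain
    allowCredentials: credentialIds.map((id) => ({ type: "public-key", id })),
    userVerification: "preferred",
    timeout: 60000, // milliseconds; only a hint the browser may override
  };
}

// Usage in the browser:
// const assertion = await navigator.credentials.get({
//   publicKey: buildRequestOptions(challenge, "example.com", credentialIds),
// });
</code></pre>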
<h2 id="heading-example-timing-assumptions">Example: Timing Assumptions</h2>
<p>You might assume:</p>
<ul>
<li><p>The WebAuthn prompt appears immediately.</p>
</li>
<li><p>The user understands what is happening.</p>
</li>
<li><p>Cancellation is intentional.</p>
</li>
</ul>
<p>In reality:</p>
<ul>
<li><p>On Chrome desktop, the modal may appear inline.</p>
</li>
<li><p>On Safari (macOS), the Touch ID sheet drops down from the top.</p>
</li>
<li><p>On iOS Safari, the Face ID overlay obscures the entire screen.</p>
</li>
<li><p>On Android Chrome, the prompt may feel like a system dialog unrelated to your app.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771476177485/afcbe9b5-d2bd-42b5-8a1c-3aecb2bf67e6.png" alt class="image--center mx-auto" /></p>
<p>Your architecture must not depend on:</p>
<ul>
<li><p>Specific timing.</p>
</li>
<li><p>Specific modal appearance.</p>
</li>
<li><p>Immediate resolution.</p>
</li>
</ul>
<p>This is not a cosmetic issue. It affects retry logic and fallback strategy.</p>
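<p>One way to keep retry logic independent of prompt timing is to drive it purely from outcomes: each attempt either resolves or rejects, nothing retries automatically (which would re-open a prompt the user just dismissed), and a fallback is offered after repeated failures. A minimal sketch; <code>getAssertion</code> is a hypothetical wrapper around <code>navigator.credentials.get()</code>, and the threshold of two failures is an arbitrary assumption.</p>
<pre><code class="lang-javascript">// Outcome-driven sign-in: no timers and no assumptions about how long the
// browser or OS prompt stays open. getAssertion is a hypothetical wrapper that
// fetches a fresh server challenge and calls navigator.credentials.get().
function createPasskeySignIn(getAssertion, failuresBeforeFallback = 2) {
  let failures = 0;

  return async function attemptSignIn() {
    try {
      const assertion = await getAssertion();
      failures = 0;
      return { status: "success", assertion };
    } catch (err) {
      failures += 1;
      // Never retry automatically: let the UI show "Try again", and surface
      // a fallback (e.g. email link) once attempts keep failing.
      return {
        status: "failed",
        errorName: err?.name ?? "UnknownError",
        offerFallback: failures >= failuresBeforeFallback,
      };
    }
  };
}
</code></pre>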
<hr />
<h1 id="heading-2-the-same-webauthn-flow-feels-different-everywhere">2. The Same WebAuthn Flow Feels Different Everywhere</h1>
<p>The WebAuthn API is standardized.</p>
<p>The UX is not.</p>
<h3 id="heading-chrome-desktop">Chrome (Desktop)</h3>
<ul>
<li><p>Inline modal.</p>
</li>
<li><p>Clear “Use another device” option.</p>
</li>
<li><p>Relatively consistent error messaging.</p>
</li>
</ul>
<h3 id="heading-safari-macos">Safari (macOS)</h3>
<ul>
<li><p>OS-native Touch ID sheet.</p>
</li>
<li><p>Less explicit fallback controls.</p>
</li>
<li><p>Errors often appear as generic cancellation.</p>
</li>
</ul>
<h3 id="heading-ios-safari">iOS Safari</h3>
<ul>
<li><p>Full-screen Face ID overlay.</p>
</li>
<li><p>Sometimes minimal explanation.</p>
</li>
<li><p>Cancellation feels like app failure.</p>
</li>
</ul>
<h3 id="heading-android-chrome">Android Chrome</h3>
<ul>
<li><p>OS biometric dialog.</p>
</li>
<li><p>Slightly different copy.</p>
</li>
<li><p>Device PIN fallback flows vary by manufacturer.</p>
</li>
</ul>
<p>Your code may be identical:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">try</span> {
  <span class="hljs-keyword">const</span> assertion = <span class="hljs-keyword">await</span> navigator.credentials.get({ <span class="hljs-attr">publicKey</span>: options });
} <span class="hljs-keyword">catch</span> (err) {
  handleError(err);
}
</code></pre>
<p>But the resulting <code>err.name</code>, and how the user interprets what just happened, can differ by platform.</p>
<h2 id="heading-real-example-cancellation-handling">Real Example: Cancellation Handling</h2>
<p>Common error:</p>
<pre><code class="lang-javascript">DOMException: NotAllowedError
</code></pre>
<p>This can mean:</p>
<ul>
<li><p>User cancelled.</p>
</li>
<li><p>Timeout expired.</p>
</li>
<li><p>Platform authenticator unavailable.</p>
</li>
<li><p>Permission denied.</p>
</li>
</ul>
<p>From your frontend perspective:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">catch</span> (err) {
  <span class="hljs-keyword">if</span> (err.name === <span class="hljs-string">"NotAllowedError"</span>) {
    showRetry();
  }
}
</code></pre>
<p>But retry logic must consider:</p>
<ul>
<li><p>Did the user intentionally cancel?</p>
</li>
<li><p>Did the biometric sensor fail?</p>
</li>
<li><p>Is WebAuthn unsupported?</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771476481925/2053d7e3-6743-4115-a620-d20e8ea41447.png" alt class="image--center mx-auto" /></p>
<p>If you misinterpret cancellation as an attack, you create lockouts.</p>
<p>If you misinterpret failure as benign, you create confusion.</p>
<p>UX interpretation is part of your threat model.</p>
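<p>Because the error name alone is ambiguous, it can help to centralize its interpretation so every screen reacts the same way. The sketch below maps the <code>DOMException</code> names that WebAuthn calls commonly reject with to a single UX decision. The function name, the returned shape, and the message copy are hypothetical; <code>NotAllowedError</code> is deliberately treated as “cancelled or timed out”, never as hostile.</p>
<pre><code class="lang-javascript">// Hypothetical mapping from a WebAuthn rejection to one UX decision.
// It never concludes "attack" from the browser error alone: NotAllowedError
// covers user cancellation, timeouts, and cases the page cannot tell apart.
function classifyWebAuthnError(err) {
  const fallbackHint = "Try again or use Feide login.";
  switch (err?.name) {
    case "NotAllowedError":
      return { kind: "cancelled-or-timeout", offerRetry: true, offerFallback: true,
        message: `Sign-in was cancelled or timed out. ${fallbackHint}` };
    case "AbortError":
      // Raised when the app itself aborted the call (e.g. via an AbortSignal).
      return { kind: "aborted", offerRetry: true, offerFallback: true,
        message: `Sign-in was stopped. ${fallbackHint}` };
    case "InvalidStateError":
      // Typically seen during registration when this authenticator already
      // holds a credential listed in excludeCredentials.
      return { kind: "already-registered", offerRetry: false, offerFallback: false,
        message: "This device is already registered." };
    case "NotSupportedError":
      return { kind: "unsupported", offerRetry: false, offerFallback: true,
        message: "Device sign-in is not supported here. Please use Feide login." };
    case "SecurityError":
      // Usually a deployment problem (e.g. the RP ID does not match the origin).
      return { kind: "configuration", offerRetry: false, offerFallback: true,
        message: "Device sign-in is unavailable right now. Please use Feide login." };
    default:
      return { kind: "unknown", offerRetry: true, offerFallback: true,
        message: `Something went wrong. ${fallbackHint}` };
  }
}
</code></pre>
<p>Whether a given case offers a retry is a product decision; the point of the sketch is that the decision is made once, in one place, instead of separately on every screen.</p>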
<hr />
<h1 id="heading-3-permission-dialogs-shape-security-outcomes">3. Permission Dialogs Shape Security Outcomes</h1>
<p>Consider the initial WebAuthn registration:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">await</span> navigator.credentials.create({
  <span class="hljs-attr">publicKey</span>: options
});
</code></pre>
<p>The browser may ask:</p>
<ul>
<li><p>“Allow this site to use your security key?”</p>
</li>
<li><p>“Allow Touch ID for this site?”</p>
</li>
</ul>
<p>If your UI does not clearly prepare the user:</p>
<ul>
<li><p>The permission dialog feels suspicious.</p>
</li>
<li><p>The user cancels reflexively.</p>
</li>
<li><p>They choose fallback instead.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771476787979/7ff7727d-e82c-4305-924a-90d921722adf.png" alt class="image--center mx-auto" /></p>
<p>Repeated friction trains users to:</p>
<ul>
<li><p>Prefer weaker flows.</p>
</li>
<li><p>Avoid passwordless enrollment.</p>
</li>
</ul>
<p>Strong crypto loses to confusing UX.</p>
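<p>Preparing the user can start before any browser dialog exists. <code>PublicKeyCredential.isUserVerifyingPlatformAuthenticatorAvailable()</code> is a standard static method that resolves to a boolean, so the app can decide whether to show a short explainer (“Your device will now ask for Touch ID or your PIN”) before calling <code>navigator.credentials.create()</code>, or whether to skip the offer. The function name, the injected <code>pkc</code> parameter, and the step names below are hypothetical, and the policy itself is only one possible choice.</p>
<pre><code class="lang-javascript">// Hypothetical pre-enrollment check. `pkc` is normally window.PublicKeyCredential;
// it is injected so the decision logic can run outside a browser.
async function chooseEnrollmentStep(pkc) {
  if (pkc == null) {
    return "skip-enrollment"; // No WebAuthn at all: don't show a prompt that cannot work.
  }
  let platformAvailable = false;
  try {
    platformAvailable = await pkc.isUserVerifyingPlatformAuthenticatorAvailable();
  } catch {
    platformAvailable = false; // Treat a failed probe as "not available".
  }
  // With a built-in authenticator, explain the upcoming OS prompt first;
  // otherwise offer roaming security keys as a secondary option.
  return platformAvailable ? "show-platform-explainer" : "offer-security-key-option";
}
</code></pre>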
<hr />
<h1 id="heading-4-retry-flows-influence-security-behavior">4. Retry Flows Influence Security Behavior</h1>
<p>Imagine this frontend flow:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">startWebAuthn</span>(<span class="hljs-params">options</span>) </span>{
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> assertion = <span class="hljs-keyword">await</span> navigator.credentials.get({ <span class="hljs-attr">publicKey</span>: options });
    <span class="hljs-keyword">await</span> verify(assertion);
  } <span class="hljs-keyword">catch</span> (err) {
    showRetry();
  }
}
</code></pre>
<p>If “Retry” automatically triggers WebAuthn again without context, users may:</p>
<ul>
<li><p>Rapidly cancel.</p>
</li>
<li><p>Assume something is broken.</p>
</li>
<li><p>Switch to fallback.</p>
</li>
</ul>
<p>Instead, better UX:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">if</span> (err.name === <span class="hljs-string">"NotAllowedError"</span>) {
  showMessage(<span class="hljs-string">"Authentication was cancelled. Try again or use Feide login."</span>);
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771485578690/f7c3647b-9d46-48fe-a21c-bf24f6cb2ef4.png" alt class="image--center mx-auto" /></p>
<p>Explicit fallback messaging prevents:</p>
<ul>
<li><p>Panic.</p>
</li>
<li><p>Repeated failure loops.</p>
</li>
<li><p>Insecure workaround requests (“Can you disable this for me?”).</p>
</li>
</ul>
<p>Retries are not neutral. They shape behavior.</p>
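<p>One way to keep retries from turning into loops is to make them an explicit, user-driven state with a cap. The sketch below never re-invokes WebAuthn on its own; it only tells the UI what to offer next, and after a small number of consecutive failures it leads with the fallback. The <code>createRetryPolicy</code> name, the returned shape, and the default limit of two attempts are assumptions for illustration.</p>
<pre><code class="lang-javascript">// Hypothetical retry policy: decides what the UI offers after each failed
// attempt. It never calls WebAuthn itself; the user must click again.
function createRetryPolicy(maxPromptAttempts = 2) {
  let failures = 0;
  return {
    recordFailure(errorName) {
      failures += 1;
      const cancelled = errorName === "NotAllowedError" || errorName === "AbortError";
      if (failures >= maxPromptAttempts) {
        // Repeated failures: lead with the alternate trust anchor, keep retry secondary.
        return { primary: "fallback", secondary: "retry" };
      }
      // Early on, a cancellation most likely means "not now", not "broken".
      return cancelled
        ? { primary: "retry", secondary: "fallback" }
        : { primary: "fallback", secondary: "retry" };
    },
    recordSuccess() {
      failures = 0; // A successful sign-in resets the streak.
    },
  };
}
</code></pre>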
<hr />
<h1 id="heading-5-browser-ux-affects-security-perception">5. Browser UX Affects Security Perception</h1>
<p>Security systems rely on trust perception.</p>
<p>If the browser modal:</p>
<ul>
<li><p>Looks native and familiar → user trusts it.</p>
</li>
<li><p>Looks alien or unexpected → user suspects phishing.</p>
</li>
</ul>
<p>That’s why WebAuthn is powerful:</p>
<p>Origin binding ensures the browser only shows credentials for the correct site.</p>
<p>But the user doesn’t see origin binding.<br />They see a modal.</p>
<p>Your UI must:</p>
<ul>
<li><p>Clearly explain what is about to happen.</p>
</li>
<li><p>Avoid surprising transitions.</p>
</li>
<li><p>Avoid triggering WebAuthn automatically without context.</p>
</li>
</ul>
<p>Example:</p>
<p>Instead of immediately calling WebAuthn:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">button</span> @<span class="hljs-attr">click</span>=<span class="hljs-string">"authenticate"</span>&gt;</span>
  Sign in with device
<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
</code></pre>
<p>Make the user initiate the action.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771485857210/70539f7f-1f24-42ce-ad68-bb80d93bb091.png" alt class="image--center mx-auto" /></p>
<p>User agency increases trust.</p>
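<p>Outside a framework template, the same idea (the ceremony starts only from an explicit click) can be sketched in plain JavaScript. The helper below also disables the button while a ceremony is pending, so a double click cannot stack two browser prompts. <code>bindDeviceSignIn</code>, the <code>authenticate</code> callback, and the <code>onError</code> handler are hypothetical names; the button can be any object with <code>addEventListener</code> and a <code>disabled</code> property, such as an <code>HTMLButtonElement</code>.</p>
<pre><code class="lang-javascript">// Hypothetical binding: WebAuthn runs only in response to a user click,
// and at most one ceremony is in flight at a time.
function bindDeviceSignIn(button, authenticate, onError = (err) => console.error(err)) {
  let inFlight = false;
  async function onClick() {
    if (inFlight) return; // Ignore repeated clicks while a prompt is open.
    inFlight = true;
    button.disabled = true;
    try {
      await authenticate(); // e.g. calls navigator.credentials.get(...)
    } catch (err) {
      onError(err); // Hand failures to the UI instead of leaving them unhandled.
    } finally {
      inFlight = false;
      button.disabled = false; // Re-enable whether sign-in succeeded or failed.
    }
  }
  button.addEventListener("click", onClick);
  return onClick; // Returned so callers (or tests) can trigger it directly.
}
</code></pre>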
<hr />
<h1 id="heading-6-why-good-ux-prevents-insecure-workarounds">6. Why Good UX Prevents Insecure Workarounds</h1>
<p>Users do not attack your system.</p>
<p>They bypass it.</p>
<p>If passwordless is confusing, they will:</p>
<ul>
<li><p>Ask support to disable it.</p>
</li>
<li><p>Request email-based fallback.</p>
</li>
<li><p>Demand “simpler login”.</p>
</li>
</ul>
<p>If fallback is weak, security erodes.</p>
<p>Good UX reduces these pressures.</p>
<p>Example: Clear device management UI.</p>
<p>Instead of hiding credentials:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">var</span> devices = <span class="hljs-keyword">await</span> _db.WebAuthnCredentials
    .Where(c =&gt; c.UserId == user.Id)
    .ToListAsync();
</code></pre>
<p>Expose:</p>
<ul>
<li><p>Device name</p>
</li>
<li><p>Registration date</p>
</li>
<li><p>Revoke button</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771486265089/4c8d3fe6-d3ec-45e6-b28b-695e1af58152.png" alt class="image--center mx-auto" /></p>
<p>Transparency builds confidence.</p>
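<p>The records returned by that query still have to be shaped into something a person can act on. The sketch below assumes each record arrives with an <code>id</code>, an optional user-chosen <code>nickname</code>, and a <code>createdAt</code> timestamp; those field names and the <code>/api/devices/…</code> revoke path are hypothetical and would follow whatever your API actually returns.</p>
<pre><code class="lang-javascript">// Hypothetical view model for a "Your devices" screen.
// Input records are assumed to look like { id, nickname, createdAt }.
function toDeviceRows(credentials) {
  return credentials
    .slice() // Don't mutate the caller's array when sorting.
    .sort((a, b) => new Date(b.createdAt) - new Date(a.createdAt)) // newest first
    .map((c) => ({
      id: c.id,
      // A readable fallback name beats showing a raw credential ID.
      name: c.nickname || "Unnamed device",
      registeredOn: new Date(c.createdAt).toISOString().slice(0, 10), // YYYY-MM-DD (UTC)
      revokeUrl: `/api/devices/${encodeURIComponent(c.id)}`, // hypothetical endpoint
    }));
}
</code></pre>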
<hr />
<h1 id="heading-7-browser-constraints-affect-architecture">7. Browser Constraints Affect Architecture</h1>
<p>You cannot:</p>
<ul>
<li><p>Customize biometric prompt text.</p>
</li>
<li><p>Force specific fallback options.</p>
</li>
<li><p>Guarantee consistent timing.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771486464371/489f72a7-1644-4569-a32e-9332430e7a3e.png" alt class="image--center mx-auto" /></p>
<p>Therefore, architecture must:</p>
<ul>
<li><p>Avoid assuming prompt content.</p>
</li>
<li><p>Avoid assuming immediate response.</p>
</li>
<li><p>Support retry and fallback cleanly.</p>
</li>
<li><p>Log error patterns per browser.</p>
</li>
</ul>
<p>Operationally, track:</p>
<ul>
<li><p>WebAuthn failures by user agent.</p>
</li>
<li><p>Cancellation frequency.</p>
</li>
<li><p>Fallback usage rates.</p>
</li>
</ul>
<p>UX metrics are security metrics.</p>
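<p>Those metrics can come from a simple event log. The sketch below assumes each authentication attempt is recorded with a <code>browser</code> label (already derived from the user agent), an <code>outcome</code> of <code>"success"</code>, <code>"error"</code>, or <code>"fallback"</code>, and an optional <code>errorName</code>; the event shape and function name are illustrative assumptions.</p>
<pre><code class="lang-javascript">// Hypothetical aggregation of authentication events into per-browser metrics.
// Each event: { browser, outcome: "success" | "error" | "fallback", errorName? }
function summarizeAuthEvents(events) {
  const byBrowser = {};
  for (const e of events) {
    if (!byBrowser[e.browser]) {
      byBrowser[e.browser] = { attempts: 0, failures: 0, cancellations: 0, fallbacks: 0 };
    }
    const s = byBrowser[e.browser];
    s.attempts += 1;
    if (e.outcome === "error") {
      s.failures += 1;
      // Counted as a cancellation, although NotAllowedError can also mean a timeout.
      if (e.errorName === "NotAllowedError") s.cancellations += 1;
    } else if (e.outcome === "fallback") {
      s.fallbacks += 1;
    }
  }
  // Rates make browsers with very different traffic volumes comparable.
  for (const s of Object.values(byBrowser)) {
    s.cancellationRate = s.cancellations / s.attempts;
    s.fallbackRate = s.fallbacks / s.attempts;
  }
  return byBrowser;
}
</code></pre>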
<hr />
<h1 id="heading-8-cryptography-vs-behavior">8. Cryptography vs Behavior</h1>
<p>WebAuthn’s cryptography is solid:</p>
<ul>
<li><p>Public key signatures.</p>
</li>
<li><p>Origin binding.</p>
</li>
<li><p>Replay protection.</p>
</li>
<li><p>Counter tracking.</p>
</li>
</ul>
<p>But if:</p>
<ul>
<li><p>Users disable it.</p>
</li>
<li><p>Enrollment fails.</p>
</li>
<li><p>Recovery is confusing.</p>
</li>
<li><p>Fallback is hidden.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771486979819/253d229d-a106-4919-b203-8743b4251bab.png" alt class="image--center mx-auto" /></p>
<p>Then strong algorithms lose to weak experience.</p>
<p>The most secure system is the one users willingly use.</p>
<hr />
<h1 id="heading-final-reflection">Final Reflection</h1>
<p>Security engineers love to debate:</p>
<ul>
<li><p>Key lengths.</p>
</li>
<li><p>Counter semantics.</p>
</li>
<li><p>Attestation policies.</p>
</li>
</ul>
<p>But in real deployments, the bigger questions are:</p>
<ul>
<li><p>Did the user understand what just happened?</p>
</li>
<li><p>Did the retry flow make sense?</p>
</li>
<li><p>Did cancellation feel safe?</p>
</li>
<li><p>Did fallback feel legitimate?</p>
</li>
<li><p>Did the browser modal align with user expectations?</p>
</li>
</ul>
<p>Browser UX is not decoration layered on top of cryptography.</p>
<p>It is the environment in which cryptography lives.</p>
<p>WebAuthn’s design is brilliant.</p>
<p>But the success of a passwordless-first PWA depends less on elliptic curves — and more on how gracefully your system handles human uncertainty.</p>
<p>Stronger algorithms improve theoretical security.</p>
<p>Clearer UX improves actual security.</p>
<p>And in production systems, actual security is the only kind that matters.</p>
<hr />
<h2 id="heading-series-navigation">☰ Series Navigation</h2>
<h3 id="heading-core-series">Core Series</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/introduction-to-passwordless-modern-authentication-patterns-for-pwas">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/authentication-is-not-login"><strong>Article 1 — Authentication is Not Login</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/what-passwordless-actually-means"><strong>Article 2 — What “Passwordless” Actually Means</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/webauthn-and-fido2-explained-without-the-spec"><strong>Article 3 — WebAuthn &amp; FIDO2, Explained Without the Spec</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/openid-connect-as-the-glue"><strong>Article 4 — OpenID Connect as the Glue</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/designing-a-passwordless-first-pwa-architecture"><strong>Article 5 — Designing a Passwordless-First PWA Architecture</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model"><strong>Article 6 — UX and Failure Are Part of the Security Model</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough"><strong>Article 7 — A Real Passwordless PWA Flow (Architecture Walkthrough)</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice"><strong>Article 8 — Implementing WebAuthn in Practice</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery"><strong>Article 9 — Integrating OIDC (Feide) as Fallback and Recovery</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change"><strong>Article 10 — What Worked, What Didn’t, What I’d Change</strong></a></p>
</li>
</ul>
<h3 id="heading-optional-extras">Optional Extras</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy"><strong>Why Passwordless Alone Is Not an Identity Strategy</strong></a></p>
</li>
<li><p>→ <strong>How Browser UX Shapes Security More Than Cryptography</strong></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Why Passwordless Alone Is Not an Identity Strategy]]></title><description><![CDATA[When teams adopt WebAuthn or FIDO2, the excitement is understandable:

No passwords.

No phishing.

No credential stuffing.

Biometric UX.

Public-key cryptography.


It feels like the final answer.
But WebAuthn answers exactly one question:

Can thi...]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy</guid><category><![CDATA[Identity Strategy]]></category><category><![CDATA[Account Recovery]]></category><category><![CDATA[passwordless authentication ]]></category><category><![CDATA[#webauthn]]></category><category><![CDATA[#fido2]]></category><category><![CDATA[OpenID Connect]]></category><category><![CDATA[Federated Identity]]></category><category><![CDATA[Authentication Architecture]]></category><category><![CDATA[progressive web apps]]></category><category><![CDATA[security architecture]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Thu, 19 Feb 2026 04:19:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771474164337/b99e16a5-d967-461d-87ef-fd96a7e58f34.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When teams adopt WebAuthn or FIDO2, the excitement is understandable:</p>
<ul>
<li><p>No passwords.</p>
</li>
<li><p>No phishing.</p>
</li>
<li><p>No credential stuffing.</p>
</li>
<li><p>Biometric UX.</p>
</li>
<li><p>Public-key cryptography.</p>
</li>
</ul>
<p>It feels like the final answer.</p>
<p>But WebAuthn answers exactly one question:</p>
<blockquote>
<p>Can this device prove control of a credential for this origin right now?</p>
</blockquote>
<p>It does not answer:</p>
<ul>
<li><p>Who is this user across systems?</p>
</li>
<li><p>What happens if the device is lost?</p>
</li>
<li><p>How do we bootstrap identity?</p>
</li>
<li><p>How do we link accounts?</p>
</li>
<li><p>How do we recover?</p>
</li>
<li><p>How do we federate across institutions?</p>
</li>
</ul>
<p>Passwordless authentication solves <strong>proof of possession</strong>.</p>
<p>Identity strategy solves <strong>continuity over time</strong>.</p>
<p>Those are different problems.</p>
<hr />
<h1 id="heading-the-illusion-of-pure-passwordless">The Illusion of “Pure Passwordless”</h1>
<p>It’s tempting to imagine a system that:</p>
<ul>
<li><p>Only uses WebAuthn</p>
</li>
<li><p>Has no identity provider</p>
</li>
<li><p>Has no fallback</p>
</li>
<li><p>Has no recovery flow</p>
</li>
</ul>
<p>On paper, that sounds maximally secure.</p>
<p>In reality, it’s brittle.</p>
<p>Let’s walk through real scenarios.</p>
<hr />
<h1 id="heading-scenario-1-device-loss">Scenario 1 — Device Loss</h1>
<p>A user registers a WebAuthn credential.</p>
<p>All good.</p>
<p>Then:</p>
<ul>
<li><p>Phone is lost.</p>
</li>
<li><p>Laptop is replaced.</p>
</li>
<li><p>Browser storage is cleared.</p>
</li>
</ul>
<p>Now what?</p>
<p>Without fallback:</p>
<ul>
<li><p>The account is inaccessible.</p>
</li>
<li><p>Support must intervene manually.</p>
</li>
<li><p>Or recovery becomes weak (email-only reset).</p>
</li>
</ul>
<p>If recovery is ad hoc, security erodes.</p>
<p>If recovery is absent, usability collapses.</p>
<p>This is why fallback is not a compromise; it is a necessity.</p>
<hr />
<h1 id="heading-fallback-is-a-design-requirement">Fallback Is a Design Requirement</h1>
<p>Fallback should not mean:</p>
<p>“Use a weaker method.”</p>
<p>It should mean:</p>
<p>“Use an alternate trust anchor.”</p>
<p>In your architecture, that trust anchor was Feide (OIDC).</p>
<p>WebAuthn provided:</p>
<ul>
<li>Device-bound possession proof.</li>
</ul>
<p>Feide provided:</p>
<ul>
<li>Federated identity continuity.</li>
</ul>
<p>That layering is deliberate.</p>
<hr />
<h1 id="heading-passwordless-without-federation-breaks-at-scale">Passwordless Without Federation Breaks at Scale</h1>
<p>In a real system:</p>
<ul>
<li><p>Users change devices.</p>
</li>
<li><p>Users move institutions.</p>
</li>
<li><p>Accounts are deactivated upstream.</p>
</li>
<li><p>Identity policies change.</p>
</li>
</ul>
<p>Without federation:</p>
<ul>
<li><p>You must manage identity lifecycle yourself.</p>
</li>
<li><p>You must build account verification logic.</p>
</li>
<li><p>You must build secure recovery flows.</p>
</li>
<li><p>You must handle identity merging.</p>
</li>
</ul>
<p>That is significantly more complex than integrating an IdP.</p>
<hr />
<h1 id="heading-enrollment-is-identity-design">Enrollment Is Identity Design</h1>
<p>Enrollment is often treated as a one-time setup.</p>
<p>It is not.</p>
<p>Enrollment defines:</p>
<ul>
<li><p>Who is allowed to create a credential?</p>
</li>
<li><p>How is that identity verified?</p>
</li>
<li><p>What trust anchor validates the user at registration?</p>
</li>
</ul>
<p>Example (ASP.NET Core + OIDC bootstrap):</p>
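<pre><code class="lang-csharp"><span class="hljs-comment">// "claims" is the ClaimsPrincipal from the validated OIDC sign-in.</span>
<span class="hljs-comment">// Reading "sub" directly assumes inbound claim mapping is disabled (MapInboundClaims = false);</span>
<span class="hljs-comment">// with the default mapping, ASP.NET Core typically exposes it as ClaimTypes.NameIdentifier.</span>
<span class="hljs-keyword">var</span> externalUserId = claims.FindFirst(<span class="hljs-string">"sub"</span>)?.Value;
<span class="hljs-keyword">if</span> (<span class="hljs-keyword">string</span>.IsNullOrEmpty(externalUserId))
{
    <span class="hljs-keyword">return</span> Unauthorized(); <span class="hljs-comment">// No subject claim: identity was not established.</span>
}

<span class="hljs-keyword">var</span> user = <span class="hljs-keyword">await</span> FindOrCreateUser(externalUserId);

<span class="hljs-keyword">if</span> (!user.WebAuthnCredentials.Any())
{
    <span class="hljs-comment">// Identity is verified first; only now is a device credential attached.</span>
    <span class="hljs-keyword">return</span> Redirect(<span class="hljs-string">"/enable-passwordless"</span>);
}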
</code></pre>
<p>Notice what happened:</p>
<ul>
<li><p>OIDC verified identity.</p>
</li>
<li><p>Only then was a WebAuthn credential registered.</p>
</li>
</ul>
<p>WebAuthn did not create identity.</p>
<p>It attached to it.</p>
<p>That ordering matters.</p>
<hr />
<h1 id="heading-recovery-is-where-identity-strategy-is-tested">Recovery Is Where Identity Strategy Is Tested</h1>
<p>The real test of maturity is not login success.</p>
<p>It’s failure recovery.</p>
<p>Lost device flow:</p>
<ol>
<li><p>User authenticates via OIDC.</p>
</li>
<li><p>The system validates the <code>sub</code> claim.</p>
</li>
<li><p>Existing WebAuthn credentials are revoked.</p>
</li>
<li><p>New device registers fresh credential.</p>
</li>
</ol>
<p>Example revocation logic:</p>
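<pre><code class="lang-csharp"><span class="hljs-comment">// Assumes user.WebAuthnCredentials was loaded (e.g. via Include), so EF Core</span>
<span class="hljs-comment">// is tracking every credential it is about to delete.</span>
_db.WebAuthnCredentials.RemoveRange(user.WebAuthnCredentials);
<span class="hljs-keyword">await</span> _db.SaveChangesAsync();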
</code></pre>
<p>Then redirect to registration.</p>
<p>This is structured recovery.</p>
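<p>The four recovery steps can live in one server-side function so their order cannot drift. The sketch below is written in JavaScript against a hypothetical in-memory store rather than the EF Core code above; <code>recoverLostDevice</code>, the store shape, and the redirect path are assumptions, and the claims are assumed to come from an ID token that has already been validated. Because OpenID Connect only guarantees <code>sub</code> to be unique within its issuer, the sketch keys users by issuer and subject together.</p>
<pre><code class="lang-javascript">// Hypothetical recovery flow. `claims` come from an already validated
// OIDC ID token; `store` maps external identities and users' credentials.
function recoverLostDevice(store, claims) {
  // Step 2: the issuer + subject pair must map to a known internal user.
  const key = `${claims.iss}|${claims.sub}`;
  const userId = store.userIdByExternalKey.get(key);
  if (!userId) {
    return { ok: false, reason: "unknown-identity" }; // Recovery never creates accounts.
  }
  // Step 3: revoke every existing WebAuthn credential for that user.
  const revoked = (store.credentialsByUser.get(userId) || []).length;
  store.credentialsByUser.set(userId, []);
  // Step 4: send the user to register a fresh credential on the new device.
  return { ok: true, userId, revoked, next: "/enable-passwordless" };
}
</code></pre>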
<p>Without OIDC, you would need:</p>
<ul>
<li><p>Email-only verification</p>
</li>
<li><p>Manual admin override</p>
</li>
<li><p>Or permanent account loss</p>
</li>
</ul>
<p>None of those scale securely.</p>
<hr />
<h1 id="heading-device-bound-authentication-is-not-portable-identity">Device-Bound Authentication Is Not Portable Identity</h1>
<p>WebAuthn credentials are bound to:</p>
<ul>
<li><p>Origin</p>
</li>
<li><p>Device</p>
</li>
<li><p>RP ID</p>
</li>
</ul>
<p>They are intentionally non-transferable.</p>
<p>That’s why they’re secure.</p>
<p>But identity is portable.</p>
<p>Identity must:</p>
<ul>
<li><p>Survive device turnover</p>
</li>
<li><p>Integrate with external systems</p>
</li>
<li><p>Be recognized across services</p>
</li>
</ul>
<p>That’s federation.</p>
<hr />
<h1 id="heading-federation-is-not-the-enemy-of-passwordless">Federation Is Not the Enemy of Passwordless</h1>
<p>There’s a misconception:</p>
<p>“If I use OIDC fallback, I weaken passwordless.”</p>
<p>That only happens when fallback bypasses verification.</p>
<p>In your architecture:</p>
<ul>
<li><p>OIDC never created a session automatically.</p>
</li>
<li><p>The backend validated the ID token.</p>
</li>
<li><p>The external identity was mapped to an internal user.</p>
</li>
<li><p>Your system issued its own HTTP-only session cookie.</p>
</li>
</ul>
<p>OIDC proved identity.</p>
<p>WebAuthn proved possession.</p>
<p>The trust boundaries remained intact.</p>
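<p>The ID token validation step listed above is where fallback stays honest. Signature verification should come from a maintained OIDC/JWT library; after that, OpenID Connect Core requires checking at least the issuer, the audience, the expiry, and the nonce when one was sent. The sketch below shows only those claim checks on an already-decoded, signature-verified payload; <code>checkIdTokenClaims</code>, the <code>expected</code> object, and the 60-second default clock skew are hypothetical.</p>
<pre><code class="lang-javascript">// Hypothetical claim checks on an ID token payload whose signature has
// ALREADY been verified by a JWT/OIDC library. Returns a list of problems.
function checkIdTokenClaims(payload, expected, nowSeconds = Math.floor(Date.now() / 1000)) {
  const problems = [];
  if (payload.iss !== expected.issuer) problems.push("issuer mismatch");
  // `aud` may be a single string or an array of strings.
  const audiences = Array.isArray(payload.aud) ? payload.aud : [payload.aud];
  if (!audiences.includes(expected.clientId)) problems.push("audience mismatch");
  // Allow a small clock skew; an expired token must never create a session.
  const skew = expected.clockSkewSeconds ?? 60;
  if (typeof payload.exp !== "number" || nowSeconds > payload.exp + skew) problems.push("token expired");
  if (expected.nonce !== undefined && payload.nonce !== expected.nonce) problems.push("nonce mismatch");
  if (typeof payload.sub !== "string" || payload.sub === "") problems.push("missing sub");
  return problems; // Empty array: safe to map iss + sub to an internal user.
}
</code></pre>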
<hr />
<h1 id="heading-architectural-maturity-means-layering">Architectural Maturity Means Layering</h1>
<p>Let’s describe the trust model clearly.</p>
<p>Layer 1: Federation (Feide)</p>
<ul>
<li><p>Asserts institutional identity</p>
</li>
<li><p>Manages upstream lifecycle</p>
</li>
<li><p>Provides recovery</p>
</li>
</ul>
<p>Layer 2: Passwordless (WebAuthn)</p>
<ul>
<li><p>Proves device possession</p>
</li>
<li><p>Phishing-resistant</p>
</li>
<li><p>Per-origin authentication</p>
</li>
</ul>
<p>Layer 3: Session (HTTP-only cookie)</p>
<ul>
<li><p>Server-controlled</p>
</li>
<li><p>Revocable</p>
</li>
<li><p>Protected from JS</p>
</li>
</ul>
<p>Layer 4: Authorization</p>
<ul>
<li><p>Application-level access control</p>
</li>
<li><p>Role management</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771474693452/bebd416c-d613-48f9-9f58-59f7482ac929.png" alt class="image--center mx-auto" /></p>
<p>Each layer solves a different problem.</p>
<p>No single layer replaces the others.</p>
<hr />
<h1 id="heading-the-real-question">The Real Question</h1>
<p>When designing authentication, the mature question is not:</p>
<p>“How do we eliminate passwords?”</p>
<p>It is:</p>
<p>“How do we design identity continuity over time?”</p>
<p>Passwordless improves authentication strength.</p>
<p>Federation ensures identity stability.</p>
<p>Together, they create resilience.</p>
<hr />
<h1 id="heading-what-happens-if-you-ignore-this">What Happens If You Ignore This</h1>
<p>If passwordless stands alone:</p>
<ul>
<li><p>Enrollment becomes fragile.</p>
</li>
<li><p>Recovery becomes weak.</p>
</li>
<li><p>Identity merging becomes manual.</p>
</li>
<li><p>Device loss becomes a support nightmare.</p>
</li>
<li><p>Organizational integration becomes impossible.</p>
</li>
</ul>
<p>The system becomes secure in theory, brittle in reality.</p>
<hr />
<h1 id="heading-the-strategic-insight">The Strategic Insight</h1>
<p>Passwordless is a mechanism.</p>
<p>Identity strategy is a lifecycle.</p>
<p>Mechanisms can be secure.</p>
<p>Lifecycles must be resilient.</p>
<p>Your architecture works because:</p>
<ul>
<li><p>It does not idolize passwordless.</p>
</li>
<li><p>It positions WebAuthn as primary.</p>
</li>
<li><p>It retains OIDC as structured fallback.</p>
</li>
<li><p>It treats recovery as planned, not emergency.</p>
</li>
<li><p>It separates identity from possession.</p>
</li>
</ul>
<p>That separation is the mark of architectural maturity.</p>
<hr />
<h1 id="heading-final-reflection">Final Reflection</h1>
<p>Passwordless alone is not enough.</p>
<p>Not because it’s weak.</p>
<p>But because identity is larger than authentication.</p>
<p>A secure system must answer:</p>
<ul>
<li><p>Who are you?</p>
</li>
<li><p>Can you prove it now?</p>
</li>
<li><p>What happens if you lose your device?</p>
</li>
<li><p>How do we recognize you tomorrow?</p>
</li>
<li><p>How do we integrate with your organization?</p>
</li>
</ul>
<p>WebAuthn answers one of those questions exceptionally well.</p>
<p>Federation answers the rest.</p>
<p>Designing both — intentionally — is what turns passwordless from a feature into an identity strategy.</p>
<hr />
<h2 id="heading-series-navigation">☰ Series Navigation</h2>
<h3 id="heading-core-series">Core Series</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/introduction-to-passwordless-modern-authentication-patterns-for-pwas">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/authentication-is-not-login"><strong>Article 1 — Authentication is Not Login</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/what-passwordless-actually-means"><strong>Article 2 — What “Passwordless” Actually Means</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/webauthn-and-fido2-explained-without-the-spec"><strong>Article 3 — WebAuthn &amp; FIDO2, Explained Without the Spec</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/openid-connect-as-the-glue"><strong>Article 4 — OpenID Connect as the Glue</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/designing-a-passwordless-first-pwa-architecture"><strong>Article 5 — Designing a Passwordless-First PWA Architecture</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model"><strong>Article 6 — UX and Failure Are Part of the Security Model</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough"><strong>Article 7 — A Real Passwordless PWA Flow (Architecture Walkthrough)</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice"><strong>Article 8 — Implementing WebAuthn in Practice</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery"><strong>Article 9 — Integrating OIDC (Feide) as Fallback and Recovery</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change"><strong>Article 10 — What Worked, What Didn’t, What I’d Change</strong></a></p>
</li>
</ul>
<h3 id="heading-optional-extras">Optional Extras</h3>
<ul>
<li><p>→ <strong>Why Passwordless Alone Is Not an Identity Strategy</strong></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography"><strong>How Browser UX Shapes Security More Than Cryptography</strong></a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Passwordless: What Worked, What Didn’t, What I’d Change]]></title><description><![CDATA[When designing a passwordless-first PWA architecture, the diagram looks elegant.
In production, elegance collides with:

Browser inconsistencies

Institutional identity constraints

Support tickets

Device lifecycle chaos

Monitoring blind spots


Le...]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change</guid><category><![CDATA[Production Lessons]]></category><category><![CDATA[passwordless authentication ]]></category><category><![CDATA[#webauthn]]></category><category><![CDATA[#fido2]]></category><category><![CDATA[OpenID Connect]]></category><category><![CDATA[Feide]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[Authentication Architecture]]></category><category><![CDATA[SecurityEngineering]]></category><category><![CDATA[progressive web apps]]></category><category><![CDATA[asp.net core]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Thu, 19 Feb 2026 03:14:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771470850029/85055752-ab3e-493b-a76f-879159e4180b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When designing a passwordless-first PWA architecture, the diagram looks elegant.</p>
<p>In production, elegance collides with:</p>
<ul>
<li><p>Browser inconsistencies</p>
</li>
<li><p>Institutional identity constraints</p>
</li>
<li><p>Support tickets</p>
</li>
<li><p>Device lifecycle chaos</p>
</li>
<li><p>Monitoring blind spots</p>
</li>
</ul>
<p>Let’s break it down honestly.</p>
<hr />
<h1 id="heading-what-worked">What Worked</h1>
<h2 id="heading-1-webauthn-as-primary-authentication">1️⃣ WebAuthn as Primary Authentication</h2>
<p>This worked better than expected.</p>
<p>Users quickly adapted to:</p>
<ul>
<li><p>Fingerprint</p>
</li>
<li><p>Face recognition</p>
</li>
<li><p>Device PIN</p>
</li>
</ul>
<p>Support requests about “forgot password” dropped to zero — because passwords were gone.</p>
<p>The combination of:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">var</span> result = <span class="hljs-keyword">await</span> _fido2.MakeAssertionAsync(...)
</code></pre>
<p>and:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">await</span> HttpContext.SignInAsync(<span class="hljs-string">"Cookies"</span>, principal);
</code></pre>
<p>proved stable and predictable once encoding and session handling were correct.</p>
<p>The strongest success signal:</p>
<p>No phishing-related login issues after deployment.</p>
<p>That is not common.</p>
<h2 id="heading-2-http-only-cookie-sessions">2️⃣ HTTP-only Cookie Sessions</h2>
<p>Avoiding JWT-in-localStorage was absolutely the right call.</p>
<pre><code class="lang-csharp">options.Cookie.HttpOnly = <span class="hljs-literal">true</span>;
options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
</code></pre>
<p>Benefits:</p>
<ul>
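<li><p>XSS impact minimized (injected script cannot read or exfiltrate the session cookie, though it can still act within the session)</p>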
</li>
<li><p>Simpler revocation model</p>
</li>
<li><p>Clear session lifetime control</p>
</li>
</ul>
<p>Operationally, this reduced attack surface significantly.</p>
<h2 id="heading-3-clear-decision-tree">3️⃣ Clear Decision Tree</h2>
<p>My initial flowchart saved me.</p>
<p>Because when things broke, I always knew which branch was responsible:</p>
<ul>
<li><p>WebAuthn failure?</p>
</li>
<li><p>OIDC fallback?</p>
</li>
<li><p>Session misconfiguration?</p>
</li>
<li><p>Credential lifecycle issue?</p>
</li>
</ul>
<p>That clarity matters more than people realize.</p>
<hr />
<h1 id="heading-trade-offs-i-accepted-knowingly">Trade-offs I Accepted Knowingly</h1>
<h2 id="heading-1-no-attestation-verification">1️⃣ No Attestation Verification</h2>
<pre><code class="lang-csharp">AttestationConveyancePreference.None
</code></pre>
<p>Trade-off:</p>
<ul>
<li><p>No hardware manufacturer validation</p>
</li>
<li><p>No enforcement of hardware-backed keys</p>
</li>
</ul>
<p>Why I accepted it:</p>
<ul>
<li><p>Lower operational complexity</p>
</li>
<li><p>Better privacy posture</p>
</li>
<li><p>Reduced metadata dependency</p>
</li>
</ul>
<p>In an institutional context, identity assurance was already established upstream via Feide.</p>
<h2 id="heading-2-preferred-instead-of-required-user-verification">2️⃣ Preferred Instead of Required User Verification</h2>
<pre><code class="lang-csharp">UserVerificationRequirement.Preferred
</code></pre>
<p>Trade-off:</p>
<ul>
<li><p>Allows sign-ins where the authenticator performed no user verification (no biometric or PIN check)</p>
</li>
<li><p>Slightly lower strictness</p>
</li>
</ul>
<p>Why:</p>
<ul>
<li><p>Broader device compatibility</p>
</li>
<li><p>Fewer user lockouts</p>
</li>
<li><p>Reduced friction in older hardware environments</p>
</li>
</ul>
<p>Security posture was balanced against accessibility.</p>
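<p>With <code>Preferred</code>, the server can still see afterwards whether user verification actually happened, because the authenticator reports it in the flags byte of the authenticator data. Per the WebAuthn specification, authenticator data begins with a 32-byte RP ID hash, then one flags byte (bit 0 = user present, bit 2 = user verified), then a 4-byte big-endian signature counter. The server-side FIDO2 library already parses this structure; the standalone sketch below (function name hypothetical) only illustrates what the UV bit means.</p>
<pre><code class="lang-javascript">// Hypothetical parser for the fixed-size start of WebAuthn authenticator data:
// rpIdHash (32 bytes) | flags (1 byte) | signCount (4 bytes, big-endian).
function parseAuthenticatorDataPrefix(authData) {
  if (!(authData instanceof Uint8Array) || 37 > authData.length) {
    throw new TypeError("authenticator data must be a Uint8Array of at least 37 bytes");
  }
  const flags = authData[32];
  const view = new DataView(authData.buffer, authData.byteOffset, authData.byteLength);
  return {
    userPresent: (flags & 0x01) !== 0, // UP flag, bit 0
    userVerified: (flags & 0x04) !== 0, // UV flag, bit 2
    signCount: view.getUint32(33, false), // false = big-endian
  };
}
</code></pre>
<p>Counting how many accepted assertions arrive with <code>userVerified: false</code> is one way to see what the <code>Preferred</code> trade-off actually costs in practice.</p>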
<h2 id="heading-3-no-offline-authentication">3️⃣ No Offline Authentication</h2>
<p>PWA expectation:<br />“It’s installed. It should work offline.”</p>
<p>Reality:<br />WebAuthn requires a fresh, server-issued challenge for every ceremony.</p>
<p>I chose not to simulate offline authentication using cached tokens beyond session lifetime.</p>
<p>Trade-off:</p>
<ul>
<li><p>Some UX friction</p>
</li>
<li><p>Stronger trust model</p>
</li>
</ul>
<p>Security &gt; illusion of offline login.</p>
<hr />
<h1 id="heading-what-looked-good-on-paper-but-failed-in-reality">What Looked Good on Paper But Failed in Reality</h1>
<h2 id="heading-1-users-will-immediately-register-passwordless">1️⃣ “Users Will Immediately Register Passwordless”</h2>
<p>They didn’t.</p>
<p>Even after OIDC login, many skipped enabling passwordless.</p>
<p>The elegant flow:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">if</span> (!user.Credentials.Any())
    <span class="hljs-keyword">return</span> Redirect(<span class="hljs-string">"/enable-passwordless"</span>);
</code></pre>
<p>In reality:<br />Users ignored prompts.</p>
<p>Lesson:<br />Make passwordless enrollment prominent and incentivized.</p>
<h2 id="heading-2-counter-strictness">2️⃣ Counter Strictness</h2>
<p>Initially:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">if</span> (result.Counter &lt;= storedCounter)
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> SecurityException(<span class="hljs-string">"Possible cloned authenticator"</span>);
</code></pre>
<p>This caused false positives.</p>
<p>Some authenticators:</p>
<ul>
<li><p>Always returned 0</p>
</li>
<li><p>Didn’t increment reliably</p>
</li>
</ul>
<p>Lesson:<br />Spec compliance is messier than the spec implies.</p>
<p>I relaxed the logic to handle zero counters more intelligently.</p>
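<p>One way to express a relaxed check, sketched in library-independent JavaScript: the WebAuthn specification treats a stored counter and a returned counter that are both zero as “this authenticator has no counter”, and leaves the reaction to a non-increasing counter up to the relying party. The function and result names below are hypothetical; instead of throwing, the function flags the suspicious case, so policy (log, alert, or reject) can be decided separately.</p>

```javascript
// Possible outcomes of a signature-counter check after a verified assertion.
const CounterResult = Object.freeze({
  NOT_SUPPORTED: "not-supported", // both counters are 0: authenticator has no counter
  OK: "ok",                       // counter advanced: store the new value
  SUSPICIOUS: "suspicious",       // counter did not advance: possible clone, flag it
});

function checkSignCount(storedCount, newCount) {
  // Both zero: nothing to compare, clone detection is simply unavailable.
  if (storedCount === 0 && newCount === 0) {
    return { result: CounterResult.NOT_SUPPORTED, countToStore: 0 };
  }
  // Normal case: the counter moved forward.
  if (newCount > storedCount) {
    return { result: CounterResult.OK, countToStore: newCount };
  }
  // Repeat or regression: keep the stored value and let policy decide what happens.
  return { result: CounterResult.SUSPICIOUS, countToStore: storedCount };
}
```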
<h2 id="heading-3-browser-error-consistency">3️⃣ Browser Error Consistency</h2>
<p>I assumed:</p>
<p>“All browsers implement WebAuthn uniformly.”</p>
<p>Reality:</p>
<ul>
<li><p>Different error messages</p>
</li>
<li><p>Different cancellation behaviors</p>
</li>
<li><p>Slight timing differences</p>
</li>
</ul>
<p>VueJS error handling needed refinement:</p>
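<pre><code class="lang-javascript"><span class="hljs-keyword">try</span> {
  <span class="hljs-keyword">const</span> assertion = <span class="hljs-keyword">await</span> navigator.credentials.get({ <span class="hljs-attr">publicKey</span>: options });
  <span class="hljs-comment">// ... send the assertion to the server for verification</span>
} <span class="hljs-keyword">catch</span> (err) {
  <span class="hljs-comment">// Branch on err.name, not err.message: messages differ between browsers</span>
  <span class="hljs-keyword">if</span> (err.name === <span class="hljs-string">"NotAllowedError"</span>) {
    showRetry();          <span class="hljs-comment">// user cancelled or the prompt timed out</span>
  } <span class="hljs-keyword">else</span> {
    showFallbackOption(); <span class="hljs-comment">// e.g. offer Feide login instead</span>
  }
}
</code></pre>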
<p>UX required careful branching.</p>
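<p>Because error messages vary between browsers, branching on <code>err.name</code> is more robust than parsing text. Below is a sketch of a small classifier using DOMException names that the WebAuthn specification defines for <code>create()</code> and <code>get()</code>. The function name and the mapping to UX actions are assumptions for illustration, not a standard.</p>

```javascript
// Map a WebAuthn error to a UX action: "retry" | "ignore" | "already-registered" | "fallback".
function classifyWebAuthnError(err) {
  switch (err && err.name) {
    case "NotAllowedError":
      // User cancelled the browser prompt, or the ceremony timed out.
      return "retry";
    case "AbortError":
      // The app aborted the ceremony itself (AbortController), e.g. on navigation.
      return "ignore";
    case "InvalidStateError":
      // During create(): a credential from excludeCredentials already exists here.
      return "already-registered";
    default:
      // SecurityError (RP ID / origin mismatch), NotSupportedError (no usable
      // algorithm), or anything unexpected: offer the OIDC fallback.
      return "fallback";
  }
}
```

<p>In the Vue catch block, <code>showRetry()</code> and <code>showFallbackOption()</code> then become branches on the returned action instead of on raw error names.</p>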
<hr />
<h1 id="heading-operational-lessons">Operational Lessons</h1>
<h2 id="heading-1-logging-matters-more-than-crypto">1️⃣ Logging Matters More Than Crypto</h2>
<p>You need logs for:</p>
<ul>
<li><p>Challenge generation</p>
</li>
<li><p>Assertion verification result</p>
</li>
<li><p>Counter updates</p>
</li>
<li><p>OIDC callback mapping</p>
</li>
<li><p>Session creation</p>
</li>
</ul>
<p>Example structured logging:</p>
<pre><code class="lang-csharp">_logger.LogInformation(<span class="hljs-string">"WebAuthn assertion verified for user {UserId}, counter updated to {Counter}"</span>,
    user.Id, result.Counter);
</code></pre>
<p>Without this, debugging failures becomes guesswork.</p>
<h2 id="heading-2-monitoring-authentication-metrics">2️⃣ Monitoring Authentication Metrics</h2>
<p>Track:</p>
<ul>
<li><p>WebAuthn success rate</p>
</li>
<li><p>WebAuthn failure rate</p>
</li>
<li><p>OIDC fallback frequency</p>
</li>
<li><p>Credential registrations per day</p>
</li>
<li><p>Counter mismatch events</p>
</li>
</ul>
<p>These reveal patterns:</p>
<ul>
<li><p>Device compatibility issues</p>
</li>
<li><p>Misconfiguration</p>
</li>
<li><p>User confusion</p>
</li>
</ul>
<p>Authentication is not “set and forget.”</p>
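<p>As a rough illustration, the metrics above can be derived from a stream of structured authentication events. The event names and shape here are hypothetical, and “fallback share” is defined, as an assumption, as the fraction of successful logins that came through OIDC rather than WebAuthn.</p>

```javascript
// Hypothetical event shape: { type, at }, where type is one of
// "webauthn_success", "webauthn_failure", "oidc_fallback",
// "credential_registered", "counter_mismatch", and at is a Date.
function summarizeAuthEvents(events) {
  const counts = {};
  const registrationsPerDay = {};
  for (const e of events) {
    counts[e.type] = (counts[e.type] || 0) + 1;
    if (e.type === "credential_registered") {
      const day = e.at.toISOString().slice(0, 10); // UTC calendar day, e.g. "2026-02-01"
      registrationsPerDay[day] = (registrationsPerDay[day] || 0) + 1;
    }
  }
  const success = counts.webauthn_success || 0;
  const failure = counts.webauthn_failure || 0;
  const fallback = counts.oidc_fallback || 0;
  const attempts = success + failure;
  const logins = success + fallback;
  return {
    // null means "no data yet" rather than a misleading 0%
    webauthnSuccessRate: attempts === 0 ? null : success / attempts,
    webauthnFailureRate: attempts === 0 ? null : failure / attempts,
    oidcFallbackShare: logins === 0 ? null : fallback / logins,
    registrationsPerDay,
    counterMismatches: counts.counter_mismatch || 0,
  };
}
```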
<h2 id="heading-3-support-edge-cases">3️⃣ Support Edge Cases</h2>
<p>Real tickets included:</p>
<ul>
<li><p>“My fingerprint stopped working after OS update.”</p>
</li>
<li><p>“I cleared my browser data and now can’t log in.”</p>
</li>
<li><p>“I logged in via Feide but it says no account.”</p>
</li>
</ul>
<p>Each required:</p>
<ul>
<li><p>Clear recovery path</p>
</li>
<li><p>Transparent error messaging</p>
</li>
<li><p>Internal documentation</p>
</li>
</ul>
<p>Edge cases are not rare. They are normal.</p>
<h2 id="heading-4-account-linking-confusion">4️⃣ Account Linking Confusion</h2>
<p>Some users had:</p>
<ul>
<li><p>Multiple institutional identities</p>
</li>
<li><p>Email changes</p>
</li>
<li><p>Duplicate accounts</p>
</li>
</ul>
<p>Relying solely on email would have been disastrous.</p>
<p>Using <code>sub</code> claim for linking was critical:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">var</span> externalUserId = claims.FindFirst(<span class="hljs-string">"sub"</span>)?.Value;
</code></pre>
<p>Stable identifiers are everything.</p>
<hr />
<h1 id="heading-what-i-would-change">What I Would Change</h1>
<h2 id="heading-1-stronger-enrollment-enforcement">1️⃣ Stronger Enrollment Enforcement</h2>
<p>Instead of optional passwordless enablement:</p>
<p>I would require it after the first successful OIDC login.</p>
<p>Security adoption improves when it’s default, not optional.</p>
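<p>Enforcement could live in a single navigation rule on the frontend, backed by the server-side redirect the callback already performs. A minimal sketch, assuming a hypothetical session object that reports whether the user has a registered passkey; <code>/login</code> is a hypothetical route, while <code>/enable-passwordless</code> is the route used earlier in this series.</p>

```javascript
const LOGIN_PATH = "/login";                // hypothetical route
const ENROLL_PATH = "/enable-passwordless"; // enrollment route from the callback flow

// session: { authenticated: boolean, hasPasskey: boolean } (hypothetical shape,
// e.g. returned by a "who am I" endpoint). Returns where navigation should go.
function resolveNavigation(session, targetPath) {
  if (!session.authenticated) {
    return targetPath === LOGIN_PATH ? targetPath : LOGIN_PATH;
  }
  // Enforced rather than optional: without a registered passkey,
  // the only reachable page is enrollment.
  if (!session.hasPasskey && targetPath !== ENROLL_PATH) {
    return ENROLL_PATH;
  }
  return targetPath;
}
```

<p>In a Vue Router setup, this could back a global <code>beforeEach</code> guard that redirects whenever the returned path differs from the requested one. Recovery still works, because an OIDC login with no remaining credentials lands on the same enrollment page.</p>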
<h2 id="heading-2-better-device-management-ui">2️⃣ Better Device Management UI</h2>
<p>Users should see:</p>
<ul>
<li><p>List of registered devices</p>
</li>
<li><p>Last used timestamp</p>
</li>
<li><p>Revoke option</p>
</li>
</ul>
<p>Backend model already supports it:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> WebAuthnCredentials <span class="hljs-keyword">WHERE</span> UserId = @UserId
</code></pre>
<p>But UX should surface it more clearly.</p>
<h2 id="heading-3-structured-monitoring-dashboard">3️⃣ Structured Monitoring Dashboard</h2>
<p>Real-time visibility into:</p>
<ul>
<li><p>Assertion failures</p>
</li>
<li><p>Counter mismatches</p>
</li>
<li><p>OIDC errors</p>
</li>
</ul>
<p>This would reduce reactive debugging.</p>
<h2 id="heading-4-automated-credential-health-checks">4️⃣ Automated Credential Health Checks</h2>
<p>Periodic validation:</p>
<ul>
<li><p>Detect stale counters</p>
</li>
<li><p>Detect inactive credentials</p>
</li>
<li><p>Flag suspicious behavior</p>
</li>
</ul>
<p>WebAuthn gives strong primitives. Monitoring must match.</p>
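<p>A periodic health check could be as simple as the JavaScript sketch below, run against stored credential records. The record fields are hypothetical, and “stale counter” is interpreted here as a credential that has been used but whose counter never moved (so clone detection is unavailable for it); your definition may differ.</p>

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical credential record:
// { id, createdAt: Date, lastUsedAt: Date | null, signCount: number, counterMismatches: number }
function reviewCredentials(credentials, now, inactiveDays) {
  const findings = [];
  for (const c of credentials) {
    // Inactive: not used (or never used since creation) within the window.
    const lastSeen = c.lastUsedAt || c.createdAt;
    if (now - lastSeen > inactiveDays * DAY_MS) {
      findings.push({ id: c.id, issue: "inactive" });
    }
    // Stale counter: used, yet the counter is still 0.
    if (c.lastUsedAt && c.signCount === 0) {
      findings.push({ id: c.id, issue: "counter-never-incremented" });
    }
    // Suspicious: at least one non-increasing counter was observed.
    if (c.counterMismatches > 0) {
      findings.push({ id: c.id, issue: "counter-mismatch" });
    }
  }
  return findings;
}
```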
<hr />
<h1 id="heading-the-big-lesson">The Big Lesson</h1>
<p>The hardest part of passwordless authentication is not cryptography.</p>
<p>It is lifecycle management.</p>
<p>WebAuthn works.</p>
<p>OIDC works.</p>
<p>HTTP-only cookies work.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771470354356/496889c4-7852-4d1c-9064-05fc6576dd08.png" alt class="image--center mx-auto" /></p>
<p>But the real challenge is designing:</p>
<ul>
<li><p>Failure handling</p>
</li>
<li><p>Device transitions</p>
</li>
<li><p>Recovery paths</p>
</li>
<li><p>Operational visibility</p>
</li>
</ul>
<p>Security architecture is not proven at deployment.</p>
<p>It is proven over time.</p>
<hr />
<h1 id="heading-final-reflection">Final Reflection</h1>
<p>If I rebuilt this system:</p>
<ul>
<li><p>I would keep passwordless-first.</p>
</li>
<li><p>I would keep Feide federation.</p>
</li>
<li><p>I would keep server-controlled sessions.</p>
</li>
<li><p>I would invest earlier in monitoring and enrollment enforcement.</p>
</li>
</ul>
<p>What surprised me most?</p>
<p>How much calmer authentication became once passwords were gone.</p>
<p>No resets.<br />No reuse.<br />No phishing alerts.</p>
<p>Just possession proof + federated identity continuity.</p>
<p>That combination feels less like a feature and more like an upgrade to the trust model of the application itself.</p>
<p>And that, ultimately, was the goal of the entire series. This concludes the core series; the optional extra articles listed below explore related topics further.</p>
<hr />
<h2 id="heading-series-navigation">☰ Series Navigation</h2>
<h3 id="heading-core-series">Core Series</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/introduction-to-passwordless-modern-authentication-patterns-for-pwas">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/authentication-is-not-login"><strong>Article 1 — Authentication is Not Login</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/what-passwordless-actually-means"><strong>Article 2 — What “Passwordless” Actually Means</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/webauthn-and-fido2-explained-without-the-spec"><strong>Article 3 — WebAuthn &amp; FIDO2, Explained Without the Spec</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/openid-connect-as-the-glue"><strong>Article 4 — OpenID Connect as the Glue</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/designing-a-passwordless-first-pwa-architecture"><strong>Article 5 — Designing a Passwordless-First PWA Architecture</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model"><strong>Article 6 — UX and Failure Are Part of the Security Model</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough"><strong>Article 7 — A Real Passwordless PWA Flow (Architecture Walkthrough)</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice"><strong>Article 8 — Implementing WebAuthn in Practice</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery"><strong>Article 9 — Integrating OIDC (Feide) as Fallback and Recovery</strong></a></p>
</li>
<li><p>→ <strong>Article 10 — What Worked, What Didn’t, What I’d Change</strong></p>
</li>
</ul>
<h3 id="heading-optional-extras">Optional Extras</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy"><strong>Why Passwordless Alone Is Not an Identity Strategy</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography"><strong>How Browser UX Shapes Security More Than Cryptography</strong></a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Integrating OIDC (Feide) as Fallback and Recovery]]></title><description><![CDATA[WebAuthn gave us phishing-resistant, device-bound authentication.But devices get lost. Browsers reset. Users switch laptops. Institutions manage identities centrally.
That’s where OIDC (Feide) enters — not as a competitor to passwordless, but as stru...]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery</guid><category><![CDATA[OpenID Connect]]></category><category><![CDATA[OIDC]]></category><category><![CDATA[Feide]]></category><category><![CDATA[#webauthn]]></category><category><![CDATA[#fido2]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[passwordless authentication ]]></category><category><![CDATA[Federated Identity]]></category><category><![CDATA[account linking]]></category><category><![CDATA[progressive web apps]]></category><category><![CDATA[Authentication Architecture]]></category><category><![CDATA[asp.net core]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Wed, 18 Feb 2026 02:59:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771383125974/0ea35be2-6c8d-465b-94b8-72ef335d3470.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>WebAuthn gave us phishing-resistant, device-bound authentication.<br />But devices get lost. Browsers reset. Users switch laptops. Institutions manage identities centrally.</p>
<p>That’s where <strong>OIDC (Feide)</strong> enters — not as a competitor to passwordless, but as structural support.</p>
<p>This article walks through my real implementation:</p>
<ul>
<li><p><strong>Frontend:</strong> VueJS PWA</p>
</li>
<li><p><strong>Backend:</strong> ASP.NET Core</p>
</li>
<li><p><strong>Database:</strong> SQL Server</p>
</li>
<li><p><strong>Passwordless:</strong> fido2-net-lib</p>
</li>
<li><p><strong>Federation:</strong> OpenID Connect (Feide)</p>
</li>
<li><p><strong>Session:</strong> HTTP-only cookie</p>
</li>
</ul>
<p>And we’ll focus on four things:</p>
<ol>
<li><p>What can Feide bring to the table</p>
</li>
<li><p>How OIDC fits without undermining WebAuthn</p>
</li>
<li><p>Security boundaries between IdP and my system</p>
</li>
<li><p>Account linking in practice</p>
</li>
</ol>
<h3 id="heading-disclaimer">Disclaimer</h3>
<p><em>This article describes architectural patterns and technical approaches based on a real-world implementation. All examples, code snippets, and flow descriptions have been generalized and simplified for educational purposes. No proprietary business logic, confidential configurations, credentials, or organization-specific details are disclosed. The focus is strictly on publicly documented standards (WebAuthn, OIDC) and implementation patterns within a standard VueJS + ASP.NET Core + SQL Server stack.</em></p>
<hr />
<h1 id="heading-what-can-feide-bring-to-the-table">What can Feide bring to the table</h1>
<p><a target="_blank" href="https://docs.feide.no/general/feide_overview.html">Feide</a> is widely used in the Norwegian education and research sectors. That matters for three reasons:</p>
<h3 id="heading-1-institutional-identity-already-exists">1️⃣ Institutional Identity Already Exists</h3>
<p>Users already have:</p>
<ul>
<li><p>A managed identity</p>
</li>
<li><p>Centralized credential lifecycle</p>
</li>
<li><p>Organizational trust</p>
</li>
</ul>
<p>Recreating identity inside my PWA would be redundant and weaker.</p>
<h3 id="heading-2-compliance-amp-governance">2️⃣ Compliance &amp; Governance</h3>
<p>Institutional IdPs typically enforce:</p>
<ul>
<li><p>MFA policies</p>
</li>
<li><p>Password strength</p>
</li>
<li><p>Account revocation</p>
</li>
<li><p>Auditing</p>
</li>
</ul>
<p>By integrating Feide, my system inherits that upstream assurance without storing passwords.</p>
<h3 id="heading-3-recovery-and-bootstrap">3️⃣ Recovery and Bootstrap</h3>
<p>WebAuthn is device-bound.</p>
<p>Feide provides:</p>
<ul>
<li><p>Cross-device identity continuity</p>
</li>
<li><p>Secure account recovery</p>
</li>
<li><p>Bootstrap trust for new devices</p>
</li>
</ul>
<hr />
<h1 id="heading-how-oidc-fits-without-undermining-passwordless">How OIDC Fits Without Undermining Passwordless</h1>
<p>The common fear:</p>
<blockquote>
<p>“If I add OIDC fallback, doesn’t that weaken passwordless?”</p>
</blockquote>
<p>Only if fallback is careless.</p>
<p>My architecture enforces this model:</p>
<ul>
<li><p>WebAuthn = primary authentication</p>
</li>
<li><p>Feide OIDC = bootstrap + recovery</p>
</li>
<li><p>HTTP-only cookie = session integrity</p>
</li>
<li><p>SQL Server = credential persistence</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771382401976/436634c8-3781-4665-98b2-a3fe4b13ec9c.png" alt class="image--center mx-auto" /></p>
<p>Feide does not authenticate users inside my system directly.</p>
<p>Feide asserts identity.</p>
<p>WebAuthn proves device possession.</p>
<p>Those are different trust layers.</p>
<hr />
<h1 id="heading-real-oidc-integration-aspnet-core">Real OIDC Integration (ASP.NET Core)</h1>
<p>I implemented the Authorization Code flow with PKCE.</p>
<h3 id="heading-oidc-configuration">OIDC Configuration</h3>
<pre><code class="lang-csharp">services.AddAuthentication(options =&gt;
{
    options.DefaultScheme = <span class="hljs-string">"Cookies"</span>;
    options.DefaultChallengeScheme = <span class="hljs-string">"oidc"</span>;
})
.AddCookie(<span class="hljs-string">"Cookies"</span>, options =&gt;
{
    options.Cookie.HttpOnly = <span class="hljs-literal">true</span>;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
    options.Cookie.SameSite = SameSiteMode.Lax;
})
.AddOpenIdConnect(<span class="hljs-string">"oidc"</span>, options =&gt;
{
    options.Authority = <span class="hljs-string">"https://auth.feide.no"</span>;
    options.ClientId = Configuration[<span class="hljs-string">"Feide:ClientId"</span>];
    options.ClientSecret = Configuration[<span class="hljs-string">"Feide:ClientSecret"</span>];
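    options.ResponseType = <span class="hljs-string">"code"</span>;
    options.UsePkce = <span class="hljs-literal">true</span>; <span class="hljs-comment">// already the default for the code flow; stated explicitly</span>
    options.SaveTokens = <span class="hljs-literal">false</span>;
    options.GetClaimsFromUserInfoEndpoint = <span class="hljs-literal">true</span>;
    options.MapInboundClaims = <span class="hljs-literal">false</span>; <span class="hljs-comment">// keep "sub" and "name" instead of mapping them to long ClaimTypes URIs</span>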

    options.Scope.Add(<span class="hljs-string">"openid"</span>);
    options.Scope.Add(<span class="hljs-string">"profile"</span>);
    options.Scope.Add(<span class="hljs-string">"email"</span>);

    options.TokenValidationParameters.NameClaimType = <span class="hljs-string">"name"</span>;
});
</code></pre>
<p>Important detail:</p>
<pre><code class="lang-csharp">options.SaveTokens = <span class="hljs-literal">false</span>;
</code></pre>
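<p>With <code>SaveTokens = false</code>, the IdP’s tokens are not persisted in the authentication properties, so they never end up in the session cookie or the browser.</p>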
<p>You convert identity into a server-controlled session.</p>
<hr />
<h1 id="heading-oidc-callback-flow">OIDC Callback Flow</h1>
<pre><code class="lang-csharp">[<span class="hljs-meta">HttpGet(<span class="hljs-meta-string">"callback"</span>)</span>]
<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task&lt;IActionResult&gt; <span class="hljs-title">Callback</span>(<span class="hljs-params"></span>)</span>
{
    <span class="hljs-keyword">var</span> authenticateResult = <span class="hljs-keyword">await</span> HttpContext.AuthenticateAsync(<span class="hljs-string">"oidc"</span>);

    <span class="hljs-keyword">if</span> (!authenticateResult.Succeeded)
        <span class="hljs-keyword">return</span> Unauthorized();

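    <span class="hljs-keyword">var</span> externalUserId = authenticateResult.Principal.FindFirst(<span class="hljs-string">"sub"</span>)?.Value;

    <span class="hljs-comment">// Never create or link an account without a stable subject identifier</span>
    <span class="hljs-keyword">if</span> (<span class="hljs-keyword">string</span>.IsNullOrEmpty(externalUserId))
        <span class="hljs-keyword">return</span> Unauthorized();

    <span class="hljs-keyword">var</span> user = <span class="hljs-keyword">await</span> FindOrCreateUser(externalUserId);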

    SignInUser(user.Id);

    <span class="hljs-keyword">if</span> (!user.WebAuthnCredentials.Any())
        <span class="hljs-keyword">return</span> Redirect(<span class="hljs-string">"/enable-passwordless"</span>);

    <span class="hljs-keyword">return</span> Redirect(<span class="hljs-string">"/dashboard"</span>);
}
</code></pre>
<p>This is critical:</p>
<ul>
<li><p>Feide proves identity.</p>
</li>
<li><p>The system maps that identity to an internal user record.</p>
</li>
<li><p>The system issues a session cookie.</p>
</li>
</ul>
<p>The IdP does not create sessions in my system.</p>
<hr />
<h1 id="heading-security-boundaries-between-oidc-and-my-system">Security Boundaries Between OIDC and My System</h1>
<p>Understanding boundaries prevents architectural confusion.</p>
<h2 id="heading-what-oidc-is-responsible-for">What OIDC Is Responsible For</h2>
<ul>
<li><p>Authenticating the user upstream</p>
</li>
<li><p>Issuing ID tokens</p>
</li>
<li><p>Managing institutional identity lifecycle</p>
</li>
<li><p>Enforcing upstream MFA policies</p>
</li>
</ul>
<h2 id="heading-what-my-system-is-responsible-for">What My System Is Responsible For</h2>
<ul>
<li><p>Mapping external identity (<code>sub</code>) to internal user</p>
</li>
<li><p>Managing WebAuthn credentials</p>
</li>
<li><p>Verifying FIDO2 assertions</p>
</li>
<li><p>Issuing and invalidating session cookies</p>
</li>
<li><p>Authorization within my application</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771382763546/5a5fa933-6a2e-48b3-a819-28f7dc85eb4b.png" alt class="image--center mx-auto" /></p>
<p>OIDC is not trusted to:</p>
<ul>
<li><p>Authorize application actions</p>
</li>
<li><p>Manage WebAuthn devices</p>
</li>
<li><p>Maintain session integrity</p>
</li>
</ul>
<p>Trust is layered, not delegated.</p>
<hr />
<h1 id="heading-account-linking-considerations">Account Linking Considerations</h1>
<p>This is where real complexity lives.</p>
<p>OIDC provides:</p>
<pre><code class="lang-javascript">{
  <span class="hljs-string">"sub"</span>: <span class="hljs-string">"abcd1234"</span>,
  <span class="hljs-string">"email"</span>: <span class="hljs-string">"user@example.edu"</span>
}
</code></pre>
<p>But what if:</p>
<ul>
<li><p>Email changes?</p>
</li>
<li><p>The user logs in with a different institutional account?</p>
</li>
<li><p>A duplicate local account already exists?</p>
</li>
</ul>
<p>You must choose a stable linking strategy.</p>
<h2 id="heading-recommended-linking-model">Recommended Linking Model</h2>
<p>Use <code>sub</code> as the primary external identifier.</p>
<p>Database model:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> ExternalLogins (
    <span class="hljs-keyword">Id</span> UNIQUEIDENTIFIER PRIMARY <span class="hljs-keyword">KEY</span>,
    UserId UNIQUEIDENTIFIER <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    Provider <span class="hljs-keyword">NVARCHAR</span>(<span class="hljs-number">50</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
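    ExternalSubject <span class="hljs-keyword">NVARCHAR</span>(<span class="hljs-number">255</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    <span class="hljs-comment">-- One (provider, subject) pair links to exactly one internal user</span>
    <span class="hljs-keyword">CONSTRAINT</span> UQ_ExternalLogins_Provider_Subject <span class="hljs-keyword">UNIQUE</span> (Provider, ExternalSubject)
);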
</code></pre>
<p>Mapping logic:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">var</span> externalLogin = <span class="hljs-keyword">await</span> _db.ExternalLogins
    .FirstOrDefaultAsync(x =&gt;
        x.Provider == <span class="hljs-string">"Feide"</span> &amp;&amp;
        x.ExternalSubject == externalUserId);

<span class="hljs-keyword">if</span> (externalLogin == <span class="hljs-literal">null</span>)
{
    <span class="hljs-comment">// First login → create link</span>
    <span class="hljs-keyword">var</span> user = CreateNewUser();
    _db.ExternalLogins.Add(<span class="hljs-keyword">new</span> ExternalLogin {
        UserId = user.Id,
        Provider = <span class="hljs-string">"Feide"</span>,
        ExternalSubject = externalUserId
    });
}
</code></pre>
<p>Never rely solely on email for linking.</p>
<p>Emails change. <code>sub</code> should not.</p>
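<p>The linking rule fits in a few lines of JavaScript. This is an in-memory stand-in for the <code>ExternalLogins</code> table, with hypothetical names. It keys on the (provider, <code>sub</code>) pair because OpenID Connect only guarantees <code>sub</code> to be unique within a single issuer, and it never matches on email.</p>

```javascript
// In-memory stand-in for the ExternalLogins table.
class ExternalLoginStore {
  constructor() {
    this.links = new Map(); // JSON [provider, subject] -> internal user id
    this.nextId = 1;
  }

  // Find the internal user linked to (provider, claims.sub), creating one on first login.
  // claims.email is deliberately ignored for matching: it can change, sub should not.
  findOrCreateUser(provider, claims) {
    if (!claims || !claims.sub) {
      throw new Error("Missing sub claim");
    }
    const key = JSON.stringify([provider, claims.sub]);
    const existing = this.links.get(key);
    if (existing) {
      return { userId: existing, created: false };
    }
    const userId = "user-" + this.nextId++;
    this.links.set(key, userId);
    return { userId, created: true };
  }
}
```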
<hr />
<h1 id="heading-recovery-flow-using-feide">Recovery Flow Using Feide</h1>
<p>Lost device scenario:</p>
<ol>
<li><p>User clicks “Login with Feide”</p>
</li>
<li><p>OIDC completes</p>
</li>
<li><p>Identity verified</p>
</li>
<li><p>System invalidates old WebAuthn credentials</p>
</li>
<li><p>User registers new credential</p>
</li>
</ol>
<p>Example revocation:</p>
<pre><code class="lang-csharp">_db.WebAuthnCredentials.RemoveRange(user.WebAuthnCredentials);
<span class="hljs-keyword">await</span> _db.SaveChangesAsync();
</code></pre>
<p>Then redirect to registration.</p>
<p>Recovery is structured. Not improvised.</p>
<hr />
<h1 id="heading-why-this-does-not-undermine-passwordless">Why This Does Not Undermine Passwordless</h1>
<p>Weak fallback undermines security when:</p>
<ul>
<li><p>It bypasses verification</p>
</li>
<li><p>It skips policy</p>
</li>
<li><p>It exists only as an emergency shortcut</p>
</li>
</ul>
<p>My implementation ensures:</p>
<ul>
<li><p>OIDC must complete successfully</p>
</li>
<li><p>Session is server-issued</p>
</li>
<li><p>WebAuthn remains primary method</p>
</li>
<li><p>Registration after OIDC is explicit</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771383053155/e3ce863b-1f1d-4178-96bf-a0e6de19693f.png" alt class="image--center mx-auto" /></p>
<p>This maintains assurance.</p>
<hr />
<h1 id="heading-vuejs-pwa-integration">VueJS PWA Integration</h1>
<p>From frontend:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">loginWithFeide</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-built_in">window</span>.location.href = <span class="hljs-string">"/api/auth/feide-login"</span>;
}
</code></pre>
<p>No tokens stored client-side.<br />No JWT in localStorage.<br />No client-managed identity state.</p>
<p>The PWA never reads the session cookie (it is HTTP-only); it only reacts to whether authenticated API calls succeed.</p>
<p>This keeps attack surface small.</p>
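<p>Since script cannot see an HTTP-only cookie, the PWA learns its session state by asking the backend. A minimal sketch, assuming a hypothetical <code>/api/auth/me</code> endpoint that returns 401 when no valid session exists; <code>fetchImpl</code> is injectable so the logic can be tested without a network.</p>

```javascript
// Ask the backend whether the HTTP-only session cookie is still valid.
// "/api/auth/me" is a hypothetical endpoint; the cookie itself is never read by script.
async function getSessionState(fetchImpl = fetch) {
  const response = await fetchImpl("/api/auth/me", {
    credentials: "same-origin", // the browser attaches the cookie automatically
  });
  if (response.status === 401) {
    return { authenticated: false };
  }
  if (!response.ok) {
    throw new Error("Unexpected status " + response.status);
  }
  const user = await response.json();
  return { authenticated: true, user };
}
```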
<hr />
<h1 id="heading-what-this-architecture-achieves">What This Architecture Achieves</h1>
<p>By combining:</p>
<ul>
<li><p>WebAuthn (device-bound proof)</p>
</li>
<li><p>Feide OIDC (identity continuity)</p>
</li>
<li><p>SQL Server (credential persistence)</p>
</li>
<li><p>HTTP-only cookies (session security)</p>
</li>
</ul>
<p>You achieve:</p>
<ul>
<li><p>Phishing resistance</p>
</li>
<li><p>Device lifecycle resilience</p>
</li>
<li><p>Institutional identity integration</p>
</li>
<li><p>Controlled fallback</p>
</li>
<li><p>Clear trust boundaries</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771383362636/e4304380-fee7-4b23-b891-0582eb1f9de5.png" alt class="image--center mx-auto" /></p>
<p>Most importantly:</p>
<p>You avoid a false dichotomy.</p>
<p>This is not:</p>
<p>“Passwordless vs Federation.”</p>
<p>It is:</p>
<p>“Passwordless for authentication. Federation for identity continuity.”</p>
<hr />
<h1 id="heading-final-reflection">Final Reflection</h1>
<p>Integrating OIDC did not weaken the system.</p>
<p>It completed it.</p>
<p>WebAuthn without federation is brittle.<br />Federation without WebAuthn is phishable.</p>
<p>Together, they form a layered trust architecture.</p>
<p>In the next article, we’ll examine operational lessons learned after deploying this combined system — including monitoring, auditing, and real-world behavioral patterns that only surface after production traffic begins.</p>
<p>Because authentication design doesn’t end at implementation.</p>
<p>It evolves under pressure.</p>
<hr />
<h2 id="heading-series-navigation">☰ Series Navigation</h2>
<h3 id="heading-core-series">Core Series</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/introduction-to-passwordless-modern-authentication-patterns-for-pwas">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/authentication-is-not-login"><strong>Article 1 — Authentication is Not Login</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/what-passwordless-actually-means"><strong>Article 2 — What “Passwordless” Actually Means</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/webauthn-and-fido2-explained-without-the-spec"><strong>Article 3 — WebAuthn &amp; FIDO2, Explained Without the Spec</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/openid-connect-as-the-glue"><strong>Article 4 — OpenID Connect as the Glue</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/designing-a-passwordless-first-pwa-architecture"><strong>Article 5 — Designing a Passwordless-First PWA Architecture</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model"><strong>Article 6 — UX and Failure Are Part of the Security Model</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough"><strong>Article 7 — A Real Passwordless PWA Flow (Architecture Walkthrough)</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice"><strong>Article 8 — Implementing WebAuthn in Practice</strong></a></p>
</li>
<li><p>→ <strong>Article 9 — Integrating OIDC (Feide) as Fallback and Recovery</strong></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change"><strong>Article 10 — What Worked, What Didn’t, What I’d Change</strong></a></p>
</li>
</ul>
<h3 id="heading-optional-extras">Optional Extras</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy"><strong>Why Passwordless Alone Is Not an Identity Strategy</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography"><strong>How Browser UX Shapes Security More Than Cryptography</strong></a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Implementing WebAuthn in Practice]]></title><description><![CDATA[WebAuthn looks deceptively simple at a high level:

Generate challenge

Call browser API

Verify signature

Done


In practice, it is not that simple.
WebAuthn is cryptographically elegant but operationally unforgiving.Small mistakes create subtle se...]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice</guid><category><![CDATA[#webauthn]]></category><category><![CDATA[#fido2]]></category><category><![CDATA[fido2-net-lib]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[passwordless authentication ]]></category><category><![CDATA[public-key cryptgraphy]]></category><category><![CDATA[Authentication Architecture]]></category><category><![CDATA[progressive web apps]]></category><category><![CDATA[Application Security]]></category><category><![CDATA[asp.net core]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Tue, 17 Feb 2026 08:38:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771316072943/5198e555-e9f7-469e-a948-0182afe23005.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>WebAuthn looks deceptively simple at a high level:</p>
<ul>
<li><p>Generate challenge</p>
</li>
<li><p>Call browser API</p>
</li>
<li><p>Verify signature</p>
</li>
<li><p>Done</p>
</li>
</ul>
<p>In practice, it is not that simple.</p>
<p>WebAuthn is cryptographically elegant but operationally unforgiving.<br />Small mistakes create subtle security gaps or inexplicable failures.</p>
<p>This article walks through:</p>
<ul>
<li><p>The tooling used</p>
</li>
<li><p>The data model design</p>
</li>
<li><p>Real code from ASP.NET Core + VueJS</p>
</li>
<li><p>Common pitfalls</p>
</li>
<li><p>And what surprised me during implementation</p>
</li>
</ul>
<h3 id="heading-disclaimer">Disclaimer</h3>
<p><em>This article describes architectural patterns and technical approaches based on a real-world implementation. All examples, code snippets, and flow descriptions have been generalized and simplified for educational purposes. No proprietary business logic, confidential configurations, credentials, or organization-specific details are disclosed. The focus is strictly on publicly documented standards (WebAuthn, OIDC) and implementation patterns within a standard VueJS + ASP.NET Core + SQL Server stack.</em></p>
<hr />
<h1 id="heading-tooling-used">Tooling Used</h1>
<h2 id="heading-backend-fido2-net-lib">Backend: <code>fido2-net-lib</code></h2>
<p>For .NET Core, <a target="_blank" href="https://github.com/passwordless-lib/fido2-net-lib"><code>fido2-net-lib</code></a> is one of the most mature and spec-compliant WebAuthn libraries available.</p>
<p>It handles:</p>
<ul>
<li><p>Challenge generation</p>
</li>
<li><p>Attestation verification</p>
</li>
<li><p>Assertion verification</p>
</li>
<li><p>Counter validation</p>
</li>
<li><p>Origin validation</p>
</li>
<li><p>Credential parsing</p>
</li>
</ul>
<p>Initialization:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">var</span> fido2 = <span class="hljs-keyword">new</span> Fido2(<span class="hljs-keyword">new</span> Fido2Configuration
{
    ServerDomain = <span class="hljs-string">"yourdomain.com"</span>,
    ServerName = <span class="hljs-string">"Your App"</span>,
    Origin = <span class="hljs-string">"https://yourdomain.com"</span>
});
</code></pre>
<p>The important realization:</p>
<p>The library handles the cryptography.<br />You must handle the state.</p>
<h2 id="heading-frontend-native-webauthn-api">Frontend: Native WebAuthn API</h2>
<p>In VueJS, no heavy library was required.<br />The browser already implements <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Authentication_API">WebAuthn</a>.</p>
<p>Registration:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> credential = <span class="hljs-keyword">await</span> navigator.credentials.create({
  <span class="hljs-attr">publicKey</span>: options
});
</code></pre>
<p>Authentication:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> assertion = <span class="hljs-keyword">await</span> navigator.credentials.get({
  <span class="hljs-attr">publicKey</span>: options
});
</code></pre>
<p>However:</p>
<p><mark>You must convert Base64URL fields correctly between server and client.</mark></p>
<p>This is one of the first places things break.</p>
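<p>Here is a minimal sketch of the server-to-client direction. It assumes the server sends its options as JSON with Base64URL strings in <code>challenge</code>, <code>user.id</code>, and each credential <code>id</code>. The helper names are hypothetical. These fields must become binary before they reach <code>navigator.credentials.create()</code> or <code>get()</code>:</p>
<pre><code class="lang-javascript">// Decode a Base64URL string (URL-safe alphabet, padding optional) into an ArrayBuffer.
function base64UrlToBuffer(value) {
  const base64 = value.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0)).buffer;
}

// Hypothetical helper: turn the JSON options sent by the server into the binary
// shape that navigator.credentials.create() / get() expect.
function decodeCredentialOptions(json) {
  const options = { ...json, challenge: base64UrlToBuffer(json.challenge) };
  if (json.user) {
    // Registration only: user.id is the user handle.
    options.user = { ...json.user, id: base64UrlToBuffer(json.user.id) };
  }
  for (const key of ["allowCredentials", "excludeCredentials"]) {
    if (Array.isArray(json[key])) {
      options[key] = json[key].map((cred) => ({ ...cred, id: base64UrlToBuffer(cred.id) }));
    }
  }
  return options;
}
</code></pre>
<p>Usage: <code>navigator.credentials.get({ publicKey: decodeCredentialOptions(options) })</code>. Every other field passes through unchanged.</p>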
<hr />
<h1 id="heading-data-model-design-sql-server">Data Model Design (SQL Server)</h1>
<p>This is where real decisions matter.</p>
<p>A WebAuthn credential is not just an ID.</p>
<p>Here’s the simplified SQL model:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> WebAuthnCredentials (
    <span class="hljs-keyword">Id</span> UNIQUEIDENTIFIER PRIMARY <span class="hljs-keyword">KEY</span>,
    UserId UNIQUEIDENTIFIER <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    CredentialId VARBINARY(<span class="hljs-keyword">MAX</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    PublicKey VARBINARY(<span class="hljs-keyword">MAX</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    SignatureCounter <span class="hljs-built_in">BIGINT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    CreatedAt DATETIME2 <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-keyword">SYSUTCDATETIME</span>()
);
</code></pre>
<h3 id="heading-why-varbinary">Why VARBINARY?</h3>
<p>Because:</p>
<ul>
<li><p>Credential IDs are binary.</p>
</li>
<li><p>Public keys are binary (COSE format).</p>
</li>
<li><p>Storing them as strings introduces encoding risk.</p>
</li>
</ul>
<h3 id="heading-why-store-signaturecounter">Why store SignatureCounter?</h3>
<p>The counter helps detect cloned authenticators.</p>
<p>If either the reported or the stored counter is non-zero, and the new value is not greater than the stored one, something is wrong.</p>
<p>Without stored counters, clone detection is impossible.</p>
<hr />
<h1 id="heading-registration-flow-real-implementation">Registration Flow (Real Implementation)</h1>
<h2 id="heading-step-1-generate-options">Step 1: Generate Options</h2>
<pre><code class="lang-csharp">[HttpPost("register-options")]
public IActionResult RegisterOptions()
{
    var user = GetCurrentUser();

    var options = _fido2.RequestNewCredential(
        new Fido2User
        {
            Id = Encoding.UTF8.GetBytes(user.Id.ToString()),
            Name = user.Email,
            DisplayName = user.Email
        },
        new List&lt;PublicKeyCredentialDescriptor&gt;(),
        AuthenticatorSelection.Default,
        AttestationConveyancePreference.None
    );

    // options.Challenge is a byte[], so it cannot go into SetString directly.
    // Persist the whole options object (challenge included) for the verification step.
    HttpContext.Session.SetString("fido2.attestationOptions", options.ToJson());

    return Ok(options);
}
</code></pre>
<p>Notice:</p>
<ul>
<li><p>The issued options, including the challenge, are stored server-side.</p>
</li>
<li><p>Attestation preference set to <code>None</code> (privacy-friendly).</p>
</li>
<li><p>No credentials excluded in this example.</p>
</li>
</ul>
<h2 id="heading-step-2-verify-attestation">Step 2: Verify Attestation</h2>
<pre><code class="lang-csharp">[HttpPost("verify-registration")]
public async Task&lt;IActionResult&gt; VerifyRegistration([FromBody] AuthenticatorAttestationRawResponse attestation)
{
    // Reload the exact options issued by RegisterOptions, then discard them (single use).
    var json = HttpContext.Session.GetString("fido2.attestationOptions");
    HttpContext.Session.Remove("fido2.attestationOptions");
    if (json == null)
        return BadRequest();

    var options = CredentialCreateOptions.FromJson(json);

    // fido2-net-lib 2.x-style signatures (later major versions changed them).
    // The library compares the response's challenge with options.Challenge itself;
    // this delegate answers a different question: is the credential ID still unused?
    IsCredentialIdUniqueToUserAsyncDelegate isUnique = async args =&gt;
        !await _db.WebAuthnCredentials.AnyAsync(c =&gt; c.CredentialId == args.CredentialId);

    var result = await _fido2.MakeNewCredentialAsync(attestation, options, isUnique);

    var credential = new WebAuthnCredential
    {
        UserId = GetCurrentUserId(),
        CredentialId = result.Result.CredentialId,
        PublicKey = result.Result.PublicKey,
        SignatureCounter = result.Result.Counter
    };

    _db.WebAuthnCredentials.Add(credential);
    await _db.SaveChangesAsync();

    return Ok();
}
</code></pre>
<p>Key insight:</p>
<p><mark>The challenge check is only as strong as the options you hand back.</mark></p>
<p>The library verifies the challenge against the options you pass in, but it does not remember what it issued. Storing those options per session, and using them exactly once, is your job.</p>
<hr />
<h1 id="heading-authentication-flow-assertion">Authentication Flow (Assertion)</h1>
<h2 id="heading-generate-assertion-options">Generate Assertion Options</h2>
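<pre><code class="lang-csharp">var options = _fido2.GetAssertionOptions(
    storedCredentials,
    UserVerificationRequirement.Preferred
);

// options.Challenge is a byte[]; persist the whole options object as JSON instead.
HttpContext.Session.SetString("fido2.assertionOptions", options.ToJson());
</code></pre>
<h2 id="heading-verify-assertion">Verify Assertion</h2>
<pre><code class="lang-csharp">var json = HttpContext.Session.GetString("fido2.assertionOptions");
HttpContext.Session.Remove("fido2.assertionOptions"); // a challenge is single use
if (json == null) return Unauthorized();
var options = AssertionOptions.FromJson(json);

// 2.x-style API: the library checks the challenge against options itself;
// the delegate only confirms the returned user handle (UTF-8 user ID, as registered) owns this credential.
IsUserHandleOwnerOfCredentialIdAsync ownsCredential = args =&gt; Task.FromResult(
    Encoding.UTF8.GetString(args.UserHandle) == storedCredential.UserId.ToString());

var result = await _fido2.MakeAssertionAsync(
    clientResponse, options, storedCredential.PublicKey,
    (uint)storedCredential.SignatureCounter, // BIGINT column, uint parameter
    ownsCredential);

storedCredential.SignatureCounter = result.Counter;
await _db.SaveChangesAsync();
</code></pre>
<p>The counter update is not optional.</p>
<p>Without it, every later login is compared against a stale value, and a cloned authenticator can no longer be told apart from the original.</p>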
<hr />
<h1 id="heading-common-implementation-pitfalls">Common Implementation Pitfalls</h1>
<h2 id="heading-1-base64url-encoding-mismatches">1. Base64URL encoding mismatches</h2>
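<p>The browser returns <code>ArrayBuffer</code>s.<br />The ASP.NET Core endpoint receives JSON, so every binary field has to travel as a Base64URL string that the server decodes back into a byte array.</p>
<p>If the client and server disagree on the encoding, deserialization or verification fails, often with an error that does not mention encoding at all.</p>
<p>Solution: use one consistent pair of Base64URL helpers.</p>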
<h3 id="heading-example">Example</h3>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> assertion = <span class="hljs-keyword">await</span> navigator.credentials.get({ <span class="hljs-attr">publicKey</span>: options });

<span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"/api/auth/verify-webauthn"</span>, {
  <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>,
  <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify(assertion)
});
</code></pre>
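<p>Problem: <code>assertion.rawId</code> and the fields under <code>assertion.response</code> are ArrayBuffers, not Base64URL strings, and <code>JSON.stringify</code> does not encode ArrayBuffers (an ArrayBuffer serializes as <code>{}</code>).</p>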
<p>Explicit conversion helpers:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">bufferToBase64Url</span>(<span class="hljs-params">buffer</span>) </span>{
  <span class="hljs-keyword">return</span> btoa(<span class="hljs-built_in">String</span>.fromCharCode(...new <span class="hljs-built_in">Uint8Array</span>(buffer)))
    .replace(<span class="hljs-regexp">/\+/g</span>, <span class="hljs-string">'-'</span>)
    .replace(<span class="hljs-regexp">/\//g</span>, <span class="hljs-string">'_'</span>)
    .replace(<span class="hljs-regexp">/=/g</span>, <span class="hljs-string">''</span>);
}
</code></pre>
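<p>Encoding one field is not enough, because every binary member of the credential has to be converted. Below is a sketch of a serializer that covers both ceremonies. It assumes the server binds the standard WebAuthn JSON names (<code>id</code>, <code>rawId</code>, <code>type</code>, and the <code>response</code> members). <code>credentialToJson</code> is a hypothetical name, and client extension results are left out for brevity:</p>
<pre><code class="lang-javascript">// Same encoding as the helper above: ArrayBuffer to Base64URL without padding.
function bufferToBase64Url(buffer) {
  return btoa(String.fromCharCode(...new Uint8Array(buffer)))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=/g, "");
}

// Hypothetical helper: copy a PublicKeyCredential into plain JSON,
// encoding every binary field as Base64URL.
function credentialToJson(credential) {
  const response = {};
  const fields = ["clientDataJSON", "attestationObject", "authenticatorData", "signature", "userHandle"];
  for (const field of fields) {
    const value = credential.response[field];
    // userHandle may be null; attestationObject exists only on registration,
    // authenticatorData/signature only on authentication.
    if (value) {
      response[field] = bufferToBase64Url(value);
    }
  }
  return {
    id: credential.id,
    rawId: bufferToBase64Url(credential.rawId),
    type: credential.type,
    response,
  };
}
</code></pre>
<p>The request body then becomes <code>JSON.stringify(credentialToJson(assertion))</code>.</p>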
<h2 id="heading-2-forgetting-challenge-persistence">2. Forgetting challenge persistence</h2>
<p>If the challenge:</p>
<ul>
<li><p>is not stored,</p>
</li>
<li><p>or stored per user incorrectly,</p>
</li>
<li><p>or overwritten in concurrent requests,</p>
</li>
</ul>
<p>verification fails.</p>
<p>Challenge must be:</p>
<ul>
<li><p>short-lived,</p>
</li>
<li><p>per session,</p>
</li>
<li><p>non-reusable.</p>
</li>
</ul>
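<p>Those three rules can be sketched as a small store. This is a hypothetical in-memory version: the class name, the two-minute default TTL, and the injectable clock are assumptions made for illustration. The same rules apply with ASP.NET Core session state:</p>
<pre><code class="lang-javascript">// Hypothetical in-memory challenge store illustrating the three rules:
// short-lived (TTL), per session (keyed by session ID), non-reusable (deleted on read).
class ChallengeStore {
  constructor(ttlMs = 120000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Key by session and ceremony, so registration and login cannot overwrite each other.
  issue(sessionId, ceremony, challenge) {
    this.entries.set(`${sessionId}:${ceremony}`, {
      challenge,
      expiresAt: this.now() + this.ttlMs,
    });
  }

  // Returns the challenge at most once; expired or missing entries yield null.
  take(sessionId, ceremony) {
    const key = `${sessionId}:${ceremony}`;
    const entry = this.entries.get(key);
    this.entries.delete(key);
    if (!entry || this.now() > entry.expiresAt) {
      return null;
    }
    return entry.challenge;
  }
}
</code></pre>
<p>Keying by ceremony keeps a registration and a login in the same session from overwriting each other. Deleting on read makes a replayed response fail even if it arrives before expiry.</p>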
<h3 id="heading-example-1">Example</h3>
<pre><code class="lang-csharp">HttpContext.Session.SetString(<span class="hljs-string">"challenge"</span>, options.Challenge);
</code></pre>
<p>Then later:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">var</span> challenge = HttpContext.Session.GetString(<span class="hljs-string">"fido2.challenge"</span>);
</code></pre>
<p>Using two different keys makes the stored challenge unreachable, and verification fails with an error such as:</p>
<pre><code class="lang-csharp">Fido2VerificationException: Challenge mismatch
</code></pre>
<p>Or:</p>
<pre><code class="lang-csharp">Fido2VerificationException: Invalid challenge.
</code></pre>
<h2 id="heading-3-not-validating-origin">3. Not validating origin</h2>
<p>Origin mismatch is a common deployment issue.</p>
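<p>The origin the browser reports must exactly match an origin you configured (scheme, host, and port). If production runs on a different host than your configuration expects, authentication breaks.</p>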
<h3 id="heading-example-2">Example:</h3>
<p>Your production:</p>
<pre><code class="lang-csharp">https://app.yourdomain.com
</code></pre>
<p>But config says:</p>
<pre><code class="lang-csharp">Origin = <span class="hljs-string">"https://yourdomain.com"</span>
</code></pre>
<p>A subdomain mismatch like this fails verification with an error such as:</p>
<pre><code class="lang-csharp">Fido2VerificationException: Invalid origin
</code></pre>
<p>Or:</p>
<pre><code class="lang-csharp">Origin https://app.yourdomain.com does not match expected https://yourdomain.com
</code></pre>
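<p>The comparison that fails here is exact. Scheme, host, and port must all match, and a parent domain does not cover its subdomains. Below is a sketch of that allow-list check. The function name and the list are hypothetical; fido2-net-lib performs its own check against its configuration:</p>
<pre><code class="lang-javascript">// Hypothetical allow-list check mirroring what a WebAuthn server does with
// clientDataJSON.origin: normalize with URL, then require an exact match.
function isAllowedOrigin(reportedOrigin, allowedOrigins) {
  let origin;
  try {
    origin = new URL(reportedOrigin).origin;
  } catch {
    return false; // not a parseable URL
  }
  return allowedOrigins.some((allowed) => new URL(allowed).origin === origin);
}
</code></pre>
<p>For the scenario above, the fix is to add <code>https://app.yourdomain.com</code> to the expected origins. The relying party ID can stay <code>yourdomain.com</code>, because a registrable parent domain is a valid RP ID for its subdomains.</p>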
<h2 id="heading-4-counter-mishandling">4. Counter mishandling</h2>
<p>Some authenticators:</p>
<ul>
<li><p>always return 0, because they do not implement a counter at all (common with synced passkeys).</p>
</li>
<li><p>do not increment as expected.</p>
</li>
</ul>
<p>Your logic must handle legitimate zero counters.</p>
<p>Rejecting zero blindly causes user lockout.</p>
<h3 id="heading-example-3">Example</h3>
<p>Authenticator returns:</p>
<pre><code class="lang-javascript">counter = <span class="hljs-number">0</span>
</code></pre>
<p>Stored counter also:</p>
<pre><code class="lang-sql">0
</code></pre>
<p>Your logic:</p>
<pre><code class="lang-csharp"><span class="hljs-keyword">if</span> (result.Counter &lt;= storedCounter)
{
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> SecurityException(<span class="hljs-string">"Possible cloned authenticator"</span>);
}
</code></pre>
<p>The result is an immediate lockout. A spec-compliant library skips the comparison when both counters are 0, so the exception comes from your own check:</p>
<pre><code class="lang-csharp">SecurityException: Possible cloned authenticator
</code></pre>
</code></pre>
<p>Correct logic: enforce monotonicity only when the reported or the stored counter is non-zero. When both are 0, the authenticator simply has no counter.</p>
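<p>That rule follows the signature-counter step in the WebAuthn specification, which compares the two values only when either counter is non-zero. A sketch (the function name is hypothetical):</p>
<pre><code class="lang-javascript">// Returns true when the reported signature counter is acceptable.
// Both zero: the authenticator has no counter, so there is nothing to compare.
// Otherwise the new value must be strictly greater than the stored one.
function isSignCountAcceptable(reported, stored) {
  if (reported === 0) {
    // 0 is only legitimate if the stored value is also 0 (no counter support);
    // a drop from a non-zero value back to 0 is suspicious.
    return stored === 0;
  }
  return reported > stored;
}
</code></pre>
<p>A <code>false</code> result is a signal, not proof. The specification leaves it to the relying party to decide whether to fail the login, flag the credential, or require re-verification.</p>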
<h2 id="heading-5-misunderstanding-attestation">5. Misunderstanding attestation</h2>
<p>Attestation lets the server verify the authenticator's make and model, through a certificate chain from its manufacturer.</p>
<p>Most applications do not need this.</p>
<p>Setting <code>AttestationConveyancePreference.None</code>:</p>
<ul>
<li><p>avoids privacy concerns,</p>
</li>
<li><p>reduces complexity,</p>
</li>
<li><p>avoids metadata verification headaches.</p>
</li>
</ul>
<h3 id="heading-example-4">Example:</h3>
<p>You enable:</p>
<pre><code class="lang-csharp">AttestationConveyancePreference.Direct
</code></pre>
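<p>Now the browser returns a full attestation statement.</p>
<p>If the server is not set up to verify attestation (for example, no metadata service is configured), registration can fail with errors such as:</p>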
<pre><code class="lang-csharp">Fido2VerificationException: Attestation format not supported
</code></pre>
<p>Or:</p>
<pre><code class="lang-csharp">Fido2VerificationException: No metadata service configured
</code></pre>
<h2 id="heading-bonus-browser-side-errors">Bonus: Browser-Side Errors</h2>
<h3 id="heading-user-cancels">User Cancels</h3>
<pre><code class="lang-javascript">DOMException: The operation was aborted.
</code></pre>
<h3 id="heading-not-allowed">Not Allowed</h3>
<pre><code class="lang-javascript">DOMException: The user aborted a request.
</code></pre>
<h3 id="heading-unsupported-platform">Unsupported Platform</h3>
<pre><code class="lang-javascript">NotSupportedError: The operation is not supported.
</code></pre>
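<p>These are not backend problems, but your UX must handle them gracefully. The messages differ between browsers, and cancellation and timeout are deliberately reported the same way (usually as <code>NotAllowedError</code>). Branch on <code>err.name</code>, not on the message text.</p>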
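<p>Here is a sketch of that branching. The exception names come from the WebAuthn specification. The returned actions (<code>"retry"</code>, <code>"fallback"</code>, and so on) are hypothetical labels for whatever the UI does next:</p>
<pre><code class="lang-javascript">// Hypothetical mapping from WebAuthn DOMException names to UX actions.
function classifyWebAuthnError(err) {
  switch (err?.name) {
    case "NotAllowedError":
      // Cancelled, timed out, or denied; browsers deliberately do not say which.
      return "retry";
    case "AbortError":
      // The page itself aborted the request (for example via an AbortSignal).
      return "retry";
    case "InvalidStateError":
      // During registration: this authenticator already holds an excluded credential.
      return "already-registered";
    case "NotSupportedError":
      // No suitable authenticator or algorithm on this platform.
      return "fallback";
    case "SecurityError":
      // RP ID / origin misconfiguration: a deployment bug, not a user mistake.
      return "report";
    default:
      return "fallback";
  }
}
</code></pre>
<p>On the retry path, count attempts so that repeated <code>NotAllowedError</code> results lead to the fallback route instead of an endless loop.</p>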
<hr />
<h1 id="heading-what-surprised-me-during-implementation">What Surprised Me During Implementation</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771317391660/baa24c64-7482-4f3e-aea6-d4291af80a6b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-1-how-much-state-management-matters">1. How much state management matters</h2>
<p>The cryptography is handled by the library.</p>
<p>The complexity lives in:</p>
<ul>
<li><p>challenge storage,</p>
</li>
<li><p>session lifecycle,</p>
</li>
<li><p>device registration state,</p>
</li>
<li><p>error branching.</p>
</li>
</ul>
<p>WebAuthn is less about math and more about disciplined state handling.</p>
<h2 id="heading-2-browser-inconsistencies">2. Browser inconsistencies</h2>
<p>Different browsers:</p>
<ul>
<li><p>format errors differently,</p>
</li>
<li><p>handle cancellation differently,</p>
</li>
<li><p>vary in UI timing.</p>
</li>
</ul>
<p>Your retry UX must account for that.</p>
<h2 id="heading-3-the-importance-of-fallback">3. The importance of fallback</h2>
<p>The first time a device:</p>
<ul>
<li><p>failed biometric recognition,</p>
</li>
<li><p>or returned unexpected counter values,</p>
</li>
</ul>
<p>I realized:</p>
<p>Passwordless-only systems are fragile.</p>
<p>Fallback is not optional.</p>
<h2 id="heading-4-offline-expectations-vs-reality">4. Offline expectations vs reality</h2>
<p>Because this is a PWA, users assume:</p>
<p>“It’s installed. It should just work.”</p>
<p>But WebAuthn requires:</p>
<ul>
<li><p>live challenge from server,</p>
</li>
<li><p>real-time verification.</p>
</li>
</ul>
<p>Offline login is not true authentication.</p>
<p>Designing expectations around that was essential.</p>
<h2 id="heading-5-the-psychological-difference">5. The psychological difference</h2>
<p>Once implemented properly:</p>
<p>Users stopped typing passwords.</p>
<p>They trusted the system more.</p>
<p>That was not because of UI polish.</p>
<p>It was because:</p>
<ul>
<li><p>no secrets were transmitted,</p>
</li>
<li><p>no reset emails were needed,</p>
</li>
<li><p>no password rules existed.</p>
</li>
</ul>
<p>Security felt natural.</p>
<p>That is rare.</p>
<hr />
<h1 id="heading-final-reflection">Final Reflection</h1>
<p>Implementing WebAuthn is not:</p>
<ul>
<li><p>copying code from documentation,</p>
</li>
<li><p>adding biometric login,</p>
</li>
<li><p>or flipping a feature flag.</p>
</li>
</ul>
<p>It is:</p>
<ul>
<li><p>modeling credentials correctly,</p>
</li>
<li><p>handling state carefully,</p>
</li>
<li><p>validating challenges strictly,</p>
</li>
<li><p>updating counters reliably,</p>
</li>
<li><p>integrating session management securely.</p>
</li>
</ul>
<p>It is architecture expressed through code.</p>
<p>In the next article, we’ll examine the integration of Feide OIDC in more depth — including account linking, token validation, and how federated identity interacts with my passwordless credential lifecycle.</p>
<p>Because WebAuthn proves possession.</p>
<p>Federation proves identity continuity.</p>
<p>Both are required for resilient systems.</p>
<hr />
<h2 id="heading-series-navigation">☰ Series Navigation</h2>
<h3 id="heading-core-series">Core Series</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/introduction-to-passwordless-modern-authentication-patterns-for-pwas">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/authentication-is-not-login"><strong>Article 1 — Authentication is Not Login</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/what-passwordless-actually-means"><strong>Article 2 — What “Passwordless” Actually Means</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/webauthn-and-fido2-explained-without-the-spec"><strong>Article 3 — WebAuthn &amp; FIDO2, Explained Without the Spec</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/openid-connect-as-the-glue"><strong>Article 4 — OpenID Connect as the Glue</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/designing-a-passwordless-first-pwa-architecture"><strong>Article 5 — Designing a Passwordless-First PWA Architecture</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model"><strong>Article 6 — UX and Failure Are Part of the Security Model</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough"><strong>Article 7 — A Real Passwordless PWA Flow (Architecture Walkthrough)</strong></a></p>
</li>
<li><p>→ <strong>Article 8 — Implementing WebAuthn in Practice</strong></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery"><strong>Article 9 — Integrating OIDC (Feide) as Fallback and Recovery</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change"><strong>Article 10 — What Worked, What Didn’t, What I’d Change</strong></a></p>
</li>
</ul>
<h3 id="heading-optional-extras">Optional Extras</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy"><strong>Why Passwordless Alone Is Not an Identity Strategy</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography"><strong>How Browser UX Shapes Security More Than Cryptography</strong></a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Passwordless PWA Flow Architecture Walkthrough]]></title><description><![CDATA[Modern authentication diagrams are clean.
Real systems are not.
My architecture intentionally combines:

WebAuthn (FIDO2) for phishing-resistant authentication

Feide (OIDC) for federated identity, recovery, and bootstrap

SQL Server for credential p...]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough</guid><category><![CDATA[fido2-net-lib]]></category><category><![CDATA[Feide]]></category><category><![CDATA[HTTP-only Cookies]]></category><category><![CDATA[passwordless authentication ]]></category><category><![CDATA[#webauthn]]></category><category><![CDATA[#fido2]]></category><category><![CDATA[OpenID Connect]]></category><category><![CDATA[progressive web apps]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[Authentication Architecture]]></category><category><![CDATA[asp.net core]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Mon, 16 Feb 2026 11:05:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771239227385/5175a271-2ff0-49c4-a694-3542226658fd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Modern authentication diagrams are clean.</p>
<p>Real systems are not.</p>
<p>My architecture intentionally combines:</p>
<ul>
<li><p>WebAuthn (FIDO2) for phishing-resistant authentication</p>
</li>
<li><p>Feide (OIDC) for federated identity, recovery, and bootstrap</p>
</li>
<li><p>SQL Server for credential persistence</p>
</li>
<li><p>HTTP-only cookies for secure session handling</p>
</li>
<li><p>VueJS PWA as the user-facing layer</p>
</li>
</ul>
<p>At the center of the system is one key decision:</p>
<blockquote>
<p>Does this user already have passwordless enabled?</p>
</blockquote>
<p>Everything branches from there.</p>
<h3 id="heading-disclaimer">Disclaimer</h3>
<p><em>This article describes architectural patterns and technical approaches based on a real-world implementation. All examples, code snippets, and flow descriptions have been generalized and simplified for educational purposes. No proprietary business logic, confidential configurations, credentials, or organization-specific details are disclosed. The focus is strictly on publicly documented standards (WebAuthn, OIDC) and implementation patterns within a standard VueJS + ASP.NET Core + SQL Server stack.</em></p>
<hr />
<h1 id="heading-the-real-flowchart-the-system-as-a-decision-tree">The Real Flowchart: The System as a Decision Tree</h1>
<p>My initial flowchart expresses the core logic clearly:</p>
<ol>
<li><p>User requests authentication.</p>
</li>
<li><p>System checks: Has passwordless been enabled?</p>
</li>
<li><p>If yes → Attempt WebAuthn authentication.</p>
</li>
<li><p>If no → Redirect to Feide OIDC.</p>
</li>
<li><p>If WebAuthn fails → Allow retry.</p>
</li>
<li><p>If retries exhausted → End or fallback.</p>
</li>
<li><p>After successful OIDC → Offer passwordless registration.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771239027313/293700cd-6bb8-4230-99eb-8872aca5f56d.png" alt class="image--center mx-auto" /></p>
<p>This is not UX decoration.</p>
<p>This is an explicit trust state machine.</p>
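<p>The flowchart translates into a small transition function. This is a sketch under stated assumptions: the step names, event shapes, and three-attempt limit are hypothetical. Exhausted retries fall back to OIDC here, although the flowchart also allows ending the attempt instead:</p>
<pre><code class="lang-javascript">// Hypothetical transition function for the login decision tree.
const MAX_WEBAUTHN_ATTEMPTS = 3;

function nextStep(state, event) {
  switch (state.step) {
    case "start":
      // Has passwordless been enabled? The backend answers; the client only follows.
      return event.hasPasskey ? { step: "webauthn", attempts: 0 } : { step: "oidc" };
    case "webauthn": {
      if (event.type === "success") return { step: "signed-in" };
      if (event.type !== "failure") return state;
      const attempts = state.attempts + 1;
      // Retries exhausted: fall back to Feide OIDC instead of looping.
      return attempts >= MAX_WEBAUTHN_ATTEMPTS ? { step: "oidc" } : { step: "webauthn", attempts };
    }
    case "oidc":
      // After a successful OIDC login, offer passwordless registration.
      return event.type === "success" ? { step: "offer-registration" } : state;
    default:
      return state;
  }
}
</code></pre>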
<p>Let’s walk through it step by step with real code.</p>
<hr />
<h1 id="heading-step-1-vuejs-pwa-begin-authentication">Step 1 — VueJS PWA: Begin Authentication</h1>
<p>The PWA does not guess the strategy.</p>
<p>It asks the backend.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// VueJS (Composition API)</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">beginLogin</span>(<span class="hljs-params">email</span>) </span>{
  <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"/api/auth/begin"</span>, {
    <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>,
    <span class="hljs-attr">headers</span>: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
    <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify({ email })
  });

  <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> response.json();

  <span class="hljs-keyword">if</span> (result.strategy === <span class="hljs-string">"webauthn"</span>) {
    <span class="hljs-keyword">await</span> startWebAuthn(result.options);
  } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (result.strategy === <span class="hljs-string">"oidc"</span>) {
    <span class="hljs-built_in">window</span>.location.href = result.redirectUrl;
  }
}
</code></pre>
<p>The browser is a mediator.<br />It does not decide trust.</p>
<hr />
<h1 id="heading-step-2-aspnet-core-decide-the-strategy">Step 2 — ASP.NET Core: Decide the Strategy</h1>
<p>My backend controls the trust graph.</p>
<pre><code class="lang-csharp">[<span class="hljs-meta">HttpPost(<span class="hljs-meta-string">"begin"</span>)</span>]
<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task&lt;IActionResult&gt; <span class="hljs-title">Begin</span>(<span class="hljs-params">[FromBody] LoginRequest request</span>)</span>
{
    <span class="hljs-keyword">var</span> user = <span class="hljs-keyword">await</span> _db.Users
        .Include(u =&gt; u.Credentials)
        .FirstOrDefaultAsync(u =&gt; u.Email == request.Email);

    <span class="hljs-keyword">if</span> (user == <span class="hljs-literal">null</span> || !user.Credentials.Any())
    {
        <span class="hljs-keyword">return</span> Ok(<span class="hljs-keyword">new</span> {
            strategy = <span class="hljs-string">"oidc"</span>,
            redirectUrl = BuildFeideRedirect()
        });
    }

    <span class="hljs-keyword">var</span> fido2 = <span class="hljs-keyword">new</span> Fido2(<span class="hljs-keyword">new</span> Fido2Configuration
    {
        ServerDomain = <span class="hljs-string">"yourdomain.com"</span>,
        ServerName = <span class="hljs-string">"Your App"</span>,
        Origin = <span class="hljs-string">"https://yourdomain.com"</span>
    });

    <span class="hljs-keyword">var</span> options = fido2.GetAssertionOptions(
        user.Credentials.Select(c =&gt; <span class="hljs-keyword">new</span> PublicKeyCredentialDescriptor(c.CredentialId)).ToList(),
        UserVerificationRequirement.Preferred
    );

    HttpContext.Session.SetString(<span class="hljs-string">"fido2.challenge"</span>, options.Challenge);

    <span class="hljs-keyword">return</span> Ok(<span class="hljs-keyword">new</span> {
        strategy = <span class="hljs-string">"webauthn"</span>,
        options
    });
}
</code></pre>
<p>Why this branch exists:</p>
<ul>
<li><p>WebAuthn only works if credentials exist.</p>
</li>
<li><p>Backend must know account state.</p>
</li>
<li><p>Trust decisions cannot be client-side.</p>
</li>
</ul>
<hr />
<h1 id="heading-step-3-webauthn-authentication-vuejs-browser-api">Step 3 — WebAuthn Authentication (VueJS + Browser API)</h1>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">startWebAuthn</span>(<span class="hljs-params">options</span>) </span>{
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> assertion = <span class="hljs-keyword">await</span> navigator.credentials.get({
      <span class="hljs-attr">publicKey</span>: options
    });

    <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"/api/auth/verify-webauthn"</span>, {
      <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>,
      <span class="hljs-attr">headers</span>: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
      <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify(assertion)
    });

    <span class="hljs-keyword">if</span> (res.ok) {
      <span class="hljs-built_in">window</span>.location.href = <span class="hljs-string">"/dashboard"</span>;
    } <span class="hljs-keyword">else</span> {
      showRetryOption();
    }
  } <span class="hljs-keyword">catch</span> (err) {
    showRetryOption();
  }
}
</code></pre>
<p>The browser enforces:</p>
<ul>
<li><p>Origin binding</p>
</li>
<li><p>Authenticator interaction</p>
</li>
<li><p>User presence / verification</p>
</li>
</ul>
<p>It does not verify the signature.</p>
<p>That’s backend responsibility.</p>
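<p>Before the signature itself is checked, the backend also validates the <code>clientDataJSON</code> that the browser produced. Below is a sketch of those non-cryptographic checks (ceremony type, challenge, origin), roughly what a library like fido2-net-lib does internally. The function name and arguments are hypothetical, and the sketch does not replace signature verification:</p>
<pre><code class="lang-javascript">// Hypothetical sketch of the clientDataJSON checks a WebAuthn server performs
// before verifying the signature. Inputs are Base64URL strings.
function checkClientData(clientDataB64Url, expectedChallenge, expectedOrigin) {
  const base64 = clientDataB64Url.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  const clientData = JSON.parse(new TextDecoder().decode(bytes));

  // Authentication responses carry "webauthn.get"; registration uses "webauthn.create".
  if (clientData.type !== "webauthn.get") {
    return { ok: false, reason: "wrong ceremony type" };
  }
  // The challenge is echoed back Base64URL-encoded; compare it exactly.
  if (clientData.challenge !== expectedChallenge) {
    return { ok: false, reason: "challenge mismatch" };
  }
  if (clientData.origin !== expectedOrigin) {
    return { ok: false, reason: "origin mismatch" };
  }
  return { ok: true };
}
</code></pre>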
<hr />
<h1 id="heading-step-4-backend-verification-with-fido2-net-lib">Step 4 — Backend Verification with fido2-net-lib</h1>
<pre><code class="lang-csharp">[<span class="hljs-meta">HttpPost(<span class="hljs-meta-string">"verify-webauthn"</span>)</span>]
<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task&lt;IActionResult&gt; <span class="hljs-title">Verify</span>(<span class="hljs-params">[FromBody] AuthenticatorAssertionRawResponse clientResponse</span>)</span>
{
    <span class="hljs-keyword">var</span> challenge = HttpContext.Session.GetString(<span class="hljs-string">"fido2.challenge"</span>);

    <span class="hljs-keyword">var</span> storedCredential = <span class="hljs-keyword">await</span> _db.Credentials
        .FirstOrDefaultAsync(c =&gt; c.CredentialId == clientResponse.Id);

    <span class="hljs-keyword">if</span> (storedCredential == <span class="hljs-literal">null</span>)
        <span class="hljs-keyword">return</span> Unauthorized();

    <span class="hljs-keyword">var</span> fido2 = <span class="hljs-keyword">new</span> Fido2(_config);

    <span class="hljs-keyword">var</span> result = <span class="hljs-keyword">await</span> fido2.MakeAssertionAsync(
        clientResponse,
        storedCredential.PublicKey,
        storedCredential.SignatureCounter,
        args =&gt; args.Challenge == challenge
    );

    storedCredential.SignatureCounter = result.Counter;
    <span class="hljs-keyword">await</span> _db.SaveChangesAsync();

    <span class="hljs-keyword">await</span> SignInUserAsync(storedCredential.UserId);

    <span class="hljs-keyword">return</span> Ok();
}
</code></pre>
<p>This branch exists because:</p>
<ul>
<li><p>Only the server verifies cryptographic proof.</p>
</li>
<li><p>Counters detect cloned authenticators.</p>
</li>
<li><p>Session issuance must be server-controlled.</p>
</li>
</ul>
<hr />
<h1 id="heading-session-handling-http-only-cookie">Session Handling — HTTP-Only Cookie</h1>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">async</span> Task <span class="hljs-title">SignInUserAsync</span>(<span class="hljs-params">Guid userId</span>)</span>
{
    <span class="hljs-keyword">var</span> claims = <span class="hljs-keyword">new</span> List&lt;Claim&gt;
    {
        <span class="hljs-keyword">new</span> Claim(ClaimTypes.NameIdentifier, userId.ToString())
    };

    <span class="hljs-keyword">var</span> identity = <span class="hljs-keyword">new</span> ClaimsIdentity(claims, <span class="hljs-string">"Cookies"</span>);
    <span class="hljs-keyword">var</span> principal = <span class="hljs-keyword">new</span> ClaimsPrincipal(identity);

    <span class="hljs-comment">// Await the sign-in so the cookie is issued before the action returns and failures are not dropped.</span>
    <span class="hljs-keyword">await</span> HttpContext.SignInAsync(<span class="hljs-string">"Cookies"</span>, principal, <span class="hljs-keyword">new</span> AuthenticationProperties
    {
        IsPersistent = <span class="hljs-literal">true</span>,
        ExpiresUtc = DateTimeOffset.UtcNow.AddHours(<span class="hljs-number">8</span>)
    });
}
</code></pre>
<p>Configured in <code>Startup.cs</code>:</p>
<pre><code class="lang-csharp">services.AddAuthentication(<span class="hljs-string">"Cookies"</span>)
    .AddCookie(<span class="hljs-string">"Cookies"</span>, options =&gt;
    {
        options.Cookie.HttpOnly = <span class="hljs-literal">true</span>;
        options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
        options.Cookie.SameSite = SameSiteMode.Lax;
    });
</code></pre>
<p>Why HTTP-only cookie?</p>
<ul>
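<li><p>JavaScript cannot read it, so an XSS payload cannot steal the session token (although it can still act within the page).</p>
</li>
<li><p>Avoids storing a JWT in localStorage, where any script on the page can read it.</p>
</li>
<li><p>Keeps session issuance and expiry server-controlled.</p>
</li>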
</ul>
<hr />
<h1 id="heading-oidc-fallback-feide-integration">OIDC Fallback — Feide Integration</h1>
<p>If passwordless is not enabled, redirect to Feide.</p>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">string</span> <span class="hljs-title">BuildFeideRedirect</span>(<span class="hljs-params"></span>)</span>
{
    <span class="hljs-comment">// URL-encode each value: redirect_uri and scope contain reserved characters.</span>
    <span class="hljs-comment">// In production, take the authorize URL from the provider's discovery document.</span>
    <span class="hljs-comment">// GeneratePKCEChallenge() must also keep the matching code_verifier for the token request.</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">$"<span class="hljs-subst">{_config[<span class="hljs-string">"Feide:Authority"</span>]}</span>/authorize"</span> +
           <span class="hljs-string">"?response_type=code"</span> +
           <span class="hljs-string">$"&amp;client_id=<span class="hljs-subst">{Uri.EscapeDataString(_config[<span class="hljs-string">"Feide:ClientId"</span>])}</span>"</span> +
           <span class="hljs-string">$"&amp;redirect_uri=<span class="hljs-subst">{Uri.EscapeDataString(_config[<span class="hljs-string">"Feide:RedirectUri"</span>])}</span>"</span> +
           <span class="hljs-string">$"&amp;scope=<span class="hljs-subst">{Uri.EscapeDataString(<span class="hljs-string">"openid profile email"</span>)}</span>"</span> +
           <span class="hljs-string">$"&amp;code_challenge=<span class="hljs-subst">{GeneratePKCEChallenge()}</span>"</span> +
           <span class="hljs-string">"&amp;code_challenge_method=S256"</span>;
}
</code></pre>
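<p><code>GeneratePKCEChallenge()</code> is not shown. With PKCE’s S256 method (RFC 7636), the challenge is the base64url-encoded SHA-256 hash of a random <code>code_verifier</code>, and the verifier itself has to be kept server-side so it can be sent with the token request. A minimal sketch of such a helper, assuming .NET 6 or later for <code>RandomNumberGenerator.GetBytes</code> and <code>SHA256.HashData</code>; the <code>Pkce</code> class is a hypothetical name:</p>
<pre><code class="lang-csharp">using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical PKCE helper (RFC 7636, S256 method).
public static class Pkce
{
    // 32 random bytes encode to a 43-character verifier, the minimum length RFC 7636 allows.
    public static string CreateVerifier() =&gt;
        Base64Url(RandomNumberGenerator.GetBytes(32));

    // code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))
    public static string CreateChallenge(string codeVerifier) =&gt;
        Base64Url(SHA256.HashData(Encoding.ASCII.GetBytes(codeVerifier)));

    // Base64url without padding, as PKCE requires.
    private static string Base64Url(byte[] bytes) =&gt;
        Convert.ToBase64String(bytes)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
}
</code></pre>
<p>Before redirecting, the verifier would be stored server-side (for example with <code>HttpContext.Session.SetString</code>), and <code>Pkce.CreateChallenge(verifier)</code> would supply the <code>code_challenge</code> value.</p>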
<p>Callback endpoint:</p>
<pre><code class="lang-csharp">[<span class="hljs-meta">HttpGet(<span class="hljs-meta-string">"callback"</span>)</span>]
<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">async</span> Task&lt;IActionResult&gt; <span class="hljs-title">Callback</span>(<span class="hljs-params"><span class="hljs-keyword">string</span> code</span>)</span>
{
    <span class="hljs-keyword">var</span> token = <span class="hljs-keyword">await</span> ExchangeCodeForToken(code);

    <span class="hljs-keyword">var</span> idToken = ValidateIdToken(token.IdToken);

    <span class="hljs-keyword">var</span> user = <span class="hljs-keyword">await</span> FindOrCreateUser(idToken.Sub);

    SignInUser(user.Id);

    <span class="hljs-keyword">if</span> (!user.Credentials.Any())
        <span class="hljs-keyword">return</span> Redirect(<span class="hljs-string">"/enable-passwordless"</span>);

    <span class="hljs-keyword">return</span> Redirect(<span class="hljs-string">"/dashboard"</span>);
}
</code></pre>
<p>This branch exists because:</p>
<ul>
<li><p>Devices are lost.</p>
</li>
<li><p>Users switch devices.</p>
</li>
<li><p>Federation provides lifecycle continuity.</p>
</li>
<li><p>OIDC provides bootstrap trust.</p>
</li>
</ul>
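<p>Two details of the callback are easy to miss. If a <code>state</code> value was sent with the authorization request, the value returned by the provider should be compared with the stored one. And <code>ExchangeCodeForToken</code> (not shown) must send the stored <code>code_verifier</code>; otherwise the token endpoint rejects a code that was issued against a PKCE challenge. A sketch of both, assuming the caller reads the stored values from the session; <code>OidcCallback</code> and its methods are hypothetical, while the form field names come from OAuth 2.0 and RFC 7636.</p>
<pre><code class="lang-csharp">using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helpers for the OIDC authorization-code callback.
public static class OidcCallback
{
    // Compares the issued state with the returned one without exiting early on the first mismatch.
    public static bool StateMatches(string expected, string received)
    {
        if (string.IsNullOrEmpty(expected) || received == null)
            return false;

        return CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(expected),
            Encoding.UTF8.GetBytes(received));
    }

    // Form fields for the authorization_code grant with PKCE.
    // A confidential client also authenticates itself (e.g. client_secret_basic); omitted here.
    public static Dictionary&lt;string, string&gt; BuildTokenRequest(
        string code, string codeVerifier, string redirectUri, string clientId) =&gt;
        new Dictionary&lt;string, string&gt;
        {
            ["grant_type"] = "authorization_code",
            ["code"] = code,
            ["redirect_uri"] = redirectUri,    // must equal the value sent to /authorize
            ["client_id"] = clientId,
            ["code_verifier"] = codeVerifier   // the secret behind code_challenge
        };
}
</code></pre>
<p>The resulting dictionary can be posted to the token endpoint with <code>new FormUrlEncodedContent(fields)</code>.</p>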
<hr />
<h1 id="heading-webauthn-registration-after-oidc">WebAuthn Registration After OIDC</h1>
<p>When enabling passwordless:</p>
<pre><code class="lang-csharp">[<span class="hljs-meta">HttpPost(<span class="hljs-meta-string">"register-options"</span>)</span>]
<span class="hljs-function"><span class="hljs-keyword">public</span> IActionResult <span class="hljs-title">RegisterOptions</span>(<span class="hljs-params"></span>)</span>
{
    <span class="hljs-keyword">var</span> user = GetCurrentUser();

    <span class="hljs-keyword">var</span> options = _fido2.RequestNewCredential(
        <span class="hljs-keyword">new</span> Fido2User
        {
            DisplayName = user.Email,
            Id = Encoding.UTF8.GetBytes(user.Id.ToString()),
            Name = user.Email
        },
        <span class="hljs-keyword">new</span> List&lt;PublicKeyCredentialDescriptor&gt;(),
        AuthenticatorSelection.Default,
        AttestationConveyancePreference.None
    );

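    <span class="hljs-comment">// options.Challenge is a byte[]; store the full options as JSON, because</span>
    <span class="hljs-comment">// attestation verification needs the original CredentialCreateOptions.</span>
    HttpContext.Session.SetString(<span class="hljs-string">"fido2.attestationOptions"</span>, options.ToJson());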

    <span class="hljs-keyword">return</span> Ok(options);
}
</code></pre>
<p>Client registers via <code>navigator.credentials.create</code>.</p>
<p>Server verifies and stores:</p>
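<pre><code class="lang-csharp"><span class="hljs-comment">// result: the attestation verification result from fido2-net-lib (MakeNewCredentialAsync).</span>
_db.Credentials.Add(<span class="hljs-keyword">new</span> Credential {
    UserId = user.Id,
    CredentialId = result.Result.CredentialId,
    PublicKey = result.Result.PublicKey,
    SignatureCounter = result.Result.Counter
});
<span class="hljs-keyword">await</span> _db.SaveChangesAsync();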
</code></pre>
<p>Why this branch exists:</p>
<ul>
<li><p>Passwordless-first upgrades users.</p>
</li>
<li><p>Registration is explicit.</p>
</li>
<li><p>Device lifecycle is managed.</p>
</li>
</ul>
<hr />
<h1 id="heading-role-breakdown">Role Breakdown</h1>
<h3 id="heading-browser-vuejs-pwa">Browser (VueJS PWA)</h3>
<ul>
<li><p>Initiates flows</p>
</li>
<li><p>Calls WebAuthn API</p>
</li>
<li><p>Handles redirects</p>
</li>
<li><p>Does not store session tokens</p>
</li>
</ul>
<h3 id="heading-authenticator">Authenticator</h3>
<ul>
<li><p>Stores private key</p>
</li>
<li><p>Verifies biometric locally</p>
</li>
<li><p>Signs challenges</p>
</li>
<li><p>Never exposes key</p>
</li>
</ul>
<h3 id="heading-backend-net-core">Backend (.NET Core)</h3>
<ul>
<li><p>Controls strategy</p>
</li>
<li><p>Generates challenges</p>
</li>
<li><p>Verifies assertions</p>
</li>
<li><p>Tracks counters</p>
</li>
<li><p>Integrates OIDC</p>
</li>
<li><p>Issues session cookie</p>
</li>
<li><p>Persists credentials in SQL Server</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771239572210/e9813a80-1065-4d01-af47-fe0abd3be62b.png" alt class="image--center mx-auto" /></p>
<p>Trust is centralized.<br />Proof is decentralized.</p>
<hr />
<h1 id="heading-why-each-branch-exists">Why Each Branch Exists</h1>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Branch</strong></td><td><strong>Real-World Reason</strong></td></tr>
</thead>
<tbody>
<tr>
<td>WebAuthn first</td><td>Phishing-resistant primary auth</td></tr>
<tr>
<td>OIDC fallback</td><td>Recovery + cross-device bootstrap</td></tr>
<tr>
<td>Retry WebAuthn</td><td>Biometric glitches happen</td></tr>
<tr>
<td>Registration after OIDC</td><td>Upgrade path to passwordless</td></tr>
<tr>
<td>HTTP-only session cookie</td><td>Protect against XSS token theft</td></tr>
<tr>
<td>Counter tracking</td><td>Detect cloned authenticators</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771239807725/61c028d2-2704-4583-beab-c5dd8f636999.png" alt class="image--center mx-auto" /></p>
<p>None of these branches are decorative.</p>
<p>Each corresponds to a failure mode in reality.</p>
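<p>Read together, the rows of the table describe one decision function. A sketch of that function under illustrative assumptions: the <code>LoginSignal</code> and <code>NextStep</code> names are hypothetical, the cap of three WebAuthn attempts is an assumed value rather than one taken from this implementation, and a counter regression is routed to OIDC re-verification.</p>
<pre><code class="lang-csharp">// Hypothetical encoding of the branch table; names and the attempt cap are illustrative.
public enum LoginSignal
{
    NoCredentialRegistered,
    WebAuthnSucceeded,
    WebAuthnFailed,
    CounterRegressed,
    OidcSucceeded
}

public enum NextStep
{
    PromptWebAuthn,
    RedirectToOidc,
    IssueSessionCookie,
    IssueSessionAndOfferRegistration
}

public static class LoginDecision
{
    public const int MaxWebAuthnAttempts = 3; // assumption; tune per product

    // failedAttempts counts WebAuthn failures in this login, including the current one.
    public static NextStep Decide(LoginSignal signal, int failedAttempts, bool hasCredential) =&gt;
        signal switch
        {
            LoginSignal.NoCredentialRegistered =&gt; NextStep.RedirectToOidc,   // OIDC fallback
            LoginSignal.WebAuthnSucceeded =&gt; NextStep.IssueSessionCookie,    // HTTP-only cookie
            LoginSignal.WebAuthnFailed =&gt; failedAttempts &lt; MaxWebAuthnAttempts
                ? NextStep.PromptWebAuthn                                    // retry WebAuthn
                : NextStep.RedirectToOidc,                                   // then escalate
            LoginSignal.CounterRegressed =&gt; NextStep.RedirectToOidc,         // re-verify identity
            LoginSignal.OidcSucceeded =&gt; hasCredential
                ? NextStep.IssueSessionCookie
                : NextStep.IssueSessionAndOfferRegistration,                 // upgrade path
            _ =&gt; NextStep.RedirectToOidc
        };
}
</code></pre>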
<hr />
<h1 id="heading-final-architectural-insight">Final Architectural Insight</h1>
<p>This system is not:</p>
<p>“Biometric login.”</p>
<p>It is:</p>
<ul>
<li><p>Identity bootstrap via federation.</p>
</li>
<li><p>Device-bound authentication via FIDO2.</p>
</li>
<li><p>Session integrity via secure cookies.</p>
</li>
<li><p>Lifecycle management via SQL persistence.</p>
</li>
<li><p>Explicit failure handling.</p>
</li>
<li><p>Clear decision tree.</p>
</li>
</ul>
<p>Passwordless-first is not about removing complexity.</p>
<p>It is about relocating trust:</p>
<ul>
<li><p>Away from shared secrets.</p>
</li>
<li><p>Toward cryptographic proof.</p>
</li>
<li><p>While preserving federated continuity.</p>
</li>
</ul>
<p>And when drawn as a flowchart, the system looks clean.</p>
<p>When implemented in VueJS + ASP.NET Core + SQL Server + Feide + fido2-net-lib, it becomes real.</p>
<p>And real systems are where architecture proves itself.</p>
<p>In the rest of this series, we’ll explore what broke, what surprised us, and what we learned when this passwordless-first architecture moved from diagram to production.</p>
<hr />
<h2 id="heading-series-navigation">☰ Series Navigation</h2>
<h3 id="heading-core-series">Core Series</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/introduction-to-passwordless-modern-authentication-patterns-for-pwas">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/authentication-is-not-login"><strong>Article 1 — Authentication is Not Login</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/what-passwordless-actually-means"><strong>Article 2 — What “Passwordless” Actually Means</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/webauthn-and-fido2-explained-without-the-spec"><strong>Article 3 — WebAuthn &amp; FIDO2, Explained Without the Spec</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/openid-connect-as-the-glue"><strong>Article 4 — OpenID Connect as the Glue</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/designing-a-passwordless-first-pwa-architecture"><strong>Article 5 — Designing a Passwordless-First PWA Architecture</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model"><strong>Article 6 — UX and Failure Are Part of the Security Model</strong></a></p>
</li>
<li><p>→ <strong>Article 7 — A Real Passwordless PWA Flow (Architecture Walkthrough)</strong></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice"><strong>Article 8 — Implementing WebAuthn in Practice</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery"><strong>Article 9 — Integrating OIDC (Feide) as Fallback and Recovery</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change"><strong>Article 10 — What Worked, What Didn’t, What I’d Change</strong></a></p>
</li>
</ul>
<h3 id="heading-optional-extras">Optional Extras</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy"><strong>Why Passwordless Alone Is Not an Identity Strategy</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography"><strong>How Browser UX Shapes Security More Than Cryptography</strong></a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[UX and Failure Are Part of the Security Model]]></title><description><![CDATA[Security engineers love cryptography because it is clean.
Humans are not.
The strongest authentication protocol in the world can be undone by:

a confusing error message,

an unclear retry flow,

a missing recovery path,

or a user who simply wants t...]]></description><link>https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model</link><guid isPermaLink="true">https://devpath-traveler.nguyenviettung.id.vn/ux-and-failure-are-part-of-the-security-model</guid><category><![CDATA[Authentication UX]]></category><category><![CDATA[Identity Recovery]]></category><category><![CDATA[Multi-Device Authentication]]></category><category><![CDATA[Security Design]]></category><category><![CDATA[passwordless authentication ]]></category><category><![CDATA[#webauthn]]></category><category><![CDATA[OpenID Connect]]></category><category><![CDATA[progressive web apps]]></category><category><![CDATA[Application Security]]></category><category><![CDATA[user experience]]></category><dc:creator><![CDATA[Nguyễn Việt Tùng]]></dc:creator><pubDate>Sun, 15 Feb 2026 03:09:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771122764029/eb2b4cfc-7db5-47c7-89d4-a6ddddc5d1d7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Security engineers love cryptography because it is clean.</p>
<p>Humans are not.</p>
<p>The strongest authentication protocol in the world can be undone by:</p>
<ul>
<li><p>a confusing error message,</p>
</li>
<li><p>an unclear retry flow,</p>
</li>
<li><p>a missing recovery path,</p>
</li>
<li><p>or a user who simply wants to get their work done.</p>
</li>
</ul>
<p>If your authentication design does not account for failure as a first-class scenario, it is not secure — it is brittle.</p>
<p>Passwordless systems amplify this truth.</p>
<p>When WebAuthn works, it feels effortless.<br />When it fails, it reveals whether your architecture was designed for real life or for a demo.</p>
<p>UX is not decoration layered on top of security.<br />UX is how security expresses itself.</p>
<hr />
<h2 id="heading-retry-flows-are-part-of-the-threat-model">Retry flows are part of the threat model</h2>
<p>Consider a simple scenario:</p>
<p>A user attempts WebAuthn authentication.<br />They cancel the prompt.</p>
<p>What does that mean?</p>
<ul>
<li><p>They changed their mind?</p>
</li>
<li><p>The biometric failed?</p>
</li>
<li><p>The authenticator was unavailable?</p>
</li>
<li><p>A malicious script attempted background authentication?</p>
</li>
<li><p>They clicked too quickly?</p>
</li>
</ul>
<p>Your system must interpret failure deliberately.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771123104756/638acb36-c76c-4640-adb7-fdc3846265dd.png" alt class="image--center mx-auto" /></p>
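<p>In the browser, a failed <code>navigator.credentials.get()</code> rejects with a <code>DOMException</code>, and its <code>name</code> is the main signal available. A cancelled prompt and a timeout both surface as <code>NotAllowedError</code>, deliberately, for privacy, so several of the cases above cannot be told apart. A sketch of a deliberate mapping, written in C# to match the backend and assuming the PWA reports the exception name to it; <code>WebAuthnFailureKind</code> and <code>ClassifyFailure</code> are hypothetical names:</p>
<pre><code class="lang-csharp">// Hypothetical classification of DOMException names that the PWA reports after
// navigator.credentials.get() rejects.
public enum WebAuthnFailureKind
{
    NotAllowed,        // cancelled, timed out, or refused: offer an explicit retry button
    Aborted,           // the app aborted the ceremony itself: usually no message needed
    Misconfiguration,  // RP ID or origin problem: log it and offer the OIDC fallback
    Unsupported,       // no usable authenticator or algorithm: offer the OIDC fallback
    Unknown            // anything else: generic message plus the OIDC fallback
}

public static class WebAuthnErrors
{
    public static WebAuthnFailureKind ClassifyFailure(string domExceptionName) =&gt;
        domExceptionName switch
        {
            "NotAllowedError" =&gt; WebAuthnFailureKind.NotAllowed,
            "AbortError" =&gt; WebAuthnFailureKind.Aborted,
            "SecurityError" =&gt; WebAuthnFailureKind.Misconfiguration,
            "NotSupportedError" =&gt; WebAuthnFailureKind.Unsupported,
            _ =&gt; WebAuthnFailureKind.Unknown
        };
}
</code></pre>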
<h3 id="heading-immediate-retry">Immediate retry?</h3>
<p>Too many automatic retries can:</p>
<ul>
<li><p>confuse users,</p>
</li>
<li><p>create loops,</p>
</li>
<li><p>mask real issues.</p>
</li>
</ul>
<h3 id="heading-manual-retry">Manual retry?</h3>
<p>Clear, explicit retry buttons give users control — and reduce panic.</p>
<h3 id="heading-escalation-to-fallback">Escalation to fallback?</h3>
<p>At what point does the system say:<br />“Let’s use your identity provider instead”?</p>
<p>Retry logic is not UX polish.<br />It is part of the attack surface.</p>
<p>An attacker probing authentication flows will:</p>
<ul>
<li><p>trigger errors,</p>
</li>
<li><p>observe timing differences,</p>
</li>
<li><p>test fallback conditions.</p>
</li>
</ul>
<p>Your retry model must:</p>
<ul>
<li><p>avoid leaking information,</p>
</li>
<li><p>avoid enabling brute force,</p>
</li>
<li><p>avoid trapping legitimate users.</p>
</li>
</ul>
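<p>Those constraints can be made concrete in a small policy object. A minimal sketch, assuming attempts are tracked per login session; the <code>RetryPolicy</code> type, the default of three attempts, and the message text are hypothetical. Every failure produces the same user-facing message, so the response does not reveal the cause, and reaching the cap escalates to the fallback instead of looping or locking the user out. Brute-force protection still needs a separate, server-side per-account limit.</p>
<pre><code class="lang-csharp">// Hypothetical retry policy for WebAuthn attempts within one login session.
public enum RetryAction
{
    ShowManualRetry,  // explicit "Try again" button, no automatic re-prompt
    OfferFallback     // escalate to the identity provider (OIDC)
}

public sealed class RetryPolicy
{
    // One message for every failure cause, so responses do not reveal why an attempt failed.
    public const string GenericMessage =
        "We couldn't verify this device. You can try again or sign in another way.";

    private readonly int _maxAttempts;
    private int _failures;

    public RetryPolicy(int maxAttempts = 3) =&gt; _maxAttempts = maxAttempts;

    public RetryAction RegisterFailure()
    {
        _failures++;
        // Never retry automatically; once the cap is reached, stop looping and offer fallback.
        return _failures &lt; _maxAttempts ? RetryAction.ShowManualRetry : RetryAction.OfferFallback;
    }
}
</code></pre>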
<hr />
<h2 id="heading-lockouts-protection-or-punishment">Lockouts: protection or punishment?</h2>
<p>Lockouts are traditionally used to prevent brute-force attacks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771123509538/d6960af8-f555-473b-b8a3-9146b7c44c65.png" alt class="image--center mx-auto" /></p>
<p>But in passwordless systems:</p>
<ul>
<li><p>there is no password to brute force,</p>
</li>
<li><p>biometric verification happens locally,</p>
</li>
<li><p>challenge–response is resistant to replay.</p>
</li>
</ul>
<p>So what are we locking out?</p>
<p>If:</p>
<ul>
<li><p>a signature counter mismatch occurs,</p>
</li>
<li><p>an authenticator appears cloned,</p>
</li>
<li><p>repeated failures happen,</p>
</li>
</ul>
<p>a lockout might be justified.</p>
<p>But lockouts must be:</p>
<ul>
<li><p>transparent,</p>
</li>
<li><p>recoverable,</p>
</li>
<li><p>tied to real risk signals.</p>
</li>
</ul>
<p>Otherwise, they punish legitimate users for:</p>
<ul>
<li><p>device glitches,</p>
</li>
<li><p>browser inconsistencies,</p>
</li>
<li><p>OS updates,</p>
</li>
<li><p>or simply aging hardware.</p>
</li>
</ul>
<p>A mature system distinguishes between:</p>
<ul>
<li><p>suspicious activity,</p>
</li>
<li><p>normal friction.</p>
</li>
</ul>
<p>Graceful degradation is more secure than aggressive rejection.</p>
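<p>One way to encode the distinction between suspicious activity and normal friction is to let only concrete risk signals change the account’s state. A sketch, assuming the caller supplies a counter-regression flag and a count of recent failures; <code>LockoutPolicy</code>, <code>AccountResponse</code>, and the threshold of ten failures in fifteen minutes are hypothetical.</p>
<pre><code class="lang-csharp">using System;

// Hypothetical response levels, from least to most disruptive.
public enum AccountResponse
{
    None,                  // normal friction: let the user retry
    Throttle,              // slow attempts down for a while; no hard lockout
    RequireReverification  // suspicious: force identity re-verification via OIDC
}

public static class LockoutPolicy
{
    public const int FailureThreshold = 10;

    // Callers count failures inside this window.
    public static readonly TimeSpan Window = TimeSpan.FromMinutes(15);

    public static AccountResponse Evaluate(bool counterRegressed, int failuresInWindow)
    {
        // A regressed signature counter is a real risk signal (possible cloned authenticator).
        if (counterRegressed)
            return AccountResponse.RequireReverification;

        // Many failures in a short window: throttle rather than lock out.
        if (failuresInWindow &gt;= FailureThreshold)
            return AccountResponse.Throttle;

        // Device glitches, OS updates, cancelled prompts: no penalty.
        return AccountResponse.None;
    }
}
</code></pre>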
<hr />
<h2 id="heading-multi-device-reality">Multi-device reality</h2>
<p>Real users do not live on a single device.</p>
<p>They:</p>
<ul>
<li><p>switch between phone and laptop,</p>
</li>
<li><p>replace hardware every few years,</p>
</li>
<li><p>clear browsers,</p>
</li>
<li><p>use shared or managed devices.</p>
</li>
</ul>
<p>A passwordless-first system must assume:</p>
<ul>
<li><p>multiple credentials per account,</p>
</li>
<li><p>multiple authenticators per user,</p>
</li>
<li><p>credentials that appear and disappear over time.</p>
</li>
</ul>
<p>This changes UX expectations.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771123889379/f8167111-cb72-4d84-85a4-9ff7a95f98af.png" alt class="image--center mx-auto" /></p>
<p>When a user logs in from a new device, the system should:</p>
<ul>
<li><p>not imply something is wrong,</p>
</li>
<li><p>guide them through identity verification,</p>
</li>
<li><p>allow secure credential registration.</p>
</li>
</ul>
<p>Multi-device support is not optional.<br />It is the default human condition.</p>
<hr />
<h2 id="heading-lost-device-scenarios-are-inevitable">Lost device scenarios are inevitable</h2>
<p>The most dangerous authentication system is one that assumes users will never lose access to their authenticators.</p>
<p>Phones are lost.<br />Laptops are stolen.<br />Security keys are misplaced.</p>
<p>If your system has no structured recovery path, users will demand one — and you will implement it under pressure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771124119965/33e5d747-ba46-4e76-9781-2b5413af97e1.png" alt class="image--center mx-auto" /></p>
<p>Good recovery design includes:</p>
<ol>
<li><p>A trusted bootstrap identity method (e.g., OIDC).</p>
</li>
<li><p>Clear verification steps.</p>
</li>
<li><p>Revocation of lost credentials.</p>
</li>
<li><p>Controlled registration of new credentials.</p>
</li>
<li><p>Audit visibility for the user.</p>
</li>
</ol>
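<p>Steps 3 and 5 are the ones most often left informal. A minimal in-memory sketch, assuming the user has already re-verified through OIDC (step 2); the <code>StoredCredential</code> and <code>AuditEvent</code> records and <code>CredentialRecovery.Revoke</code> are hypothetical, and a real system would persist both collections in the database.</p>
<pre><code class="lang-csharp">using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model; a real system would persist these records.
public sealed record StoredCredential(Guid Id, Guid UserId, string DeviceLabel)
{
    public DateTimeOffset? RevokedAt { get; set; }
    public bool IsActive =&gt; RevokedAt is null;
}

public sealed record AuditEvent(Guid UserId, string Action, string Detail, DateTimeOffset At);

public static class CredentialRecovery
{
    // Revokes one lost credential and records the change so the user can see it.
    public static bool Revoke(
        List&lt;StoredCredential&gt; credentials, List&lt;AuditEvent&gt; audit,
        Guid userId, Guid credentialId, DateTimeOffset now)
    {
        var credential = credentials.FirstOrDefault(c =&gt;
            c.Id == credentialId &amp;&amp; c.UserId == userId &amp;&amp; c.IsActive);

        if (credential is null)
            return false; // unknown, owned by another user, or already revoked

        credential.RevokedAt = now;
        audit.Add(new AuditEvent(userId, "credential.revoked", credential.DeviceLabel, now));
        return true;
    }
}
</code></pre>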
<p>Recovery must be:</p>
<ul>
<li><p>secure,</p>
</li>
<li><p>observable,</p>
</li>
<li><p>friction-aware.</p>
</li>
</ul>
<p>Security questions are not recovery.<br />Email-only resets are not recovery.<br />Administrative override is not recovery.</p>
<p>Federated identity exists partly to solve this lifecycle problem.</p>
<hr />
<h2 id="heading-why-fallback-is-not-a-weakness">Why fallback is not a weakness</h2>
<p>There is a persistent misconception:</p>
<p>“If the system falls back to another method, it weakens security.”</p>
<p>This is only true if fallback is poorly designed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771124312723/51066a65-5fd3-4546-a9d8-55bb19a5f5f5.png" alt class="image--center mx-auto" /></p>
<p>Fallback becomes dangerous when it:</p>
<ul>
<li><p>bypasses primary controls,</p>
</li>
<li><p>uses weaker authentication without policy,</p>
</li>
<li><p>exists only as an emergency hack.</p>
</li>
</ul>
<p>Fallback becomes strong when it:</p>
<ul>
<li><p>is part of the architecture,</p>
</li>
<li><p>requires equivalent assurance,</p>
</li>
<li><p>is auditable,</p>
</li>
<li><p>is rate-limited,</p>
</li>
<li><p>and does not undermine the trust model.</p>
</li>
</ul>
<p>In passwordless-first systems:</p>
<p>WebAuthn provides:</p>
<ul>
<li>phishing-resistant, device-bound authentication.</li>
</ul>
<p>OIDC provides:</p>
<ul>
<li><p>identity portability,</p>
</li>
<li><p>lifecycle continuity,</p>
</li>
<li><p>bootstrap trust.</p>
</li>
</ul>
<p>They are not substitutes.<br />They are complementary trust anchors.</p>
<p>The presence of fallback does not weaken security.<br />Unplanned fallback does.</p>
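<p>A fallback that “is rate-limited” can start as a per-account fixed window on fallback attempts. A sketch assuming an in-memory store and a default of five OIDC fallback starts per account per hour; <code>FallbackRateLimiter</code> and both numbers are hypothetical, and a deployment with several server instances would need a shared store.</p>
<pre><code class="lang-csharp">using System;
using System.Collections.Generic;

// Hypothetical fixed-window limiter for OIDC fallback attempts, keyed by account.
// Not thread-safe; for illustration only.
public sealed class FallbackRateLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Dictionary&lt;string, (DateTimeOffset Start, int Count)&gt; _windows =
        new Dictionary&lt;string, (DateTimeOffset Start, int Count)&gt;();

    public FallbackRateLimiter(int limit = 5, TimeSpan? window = null)
    {
        _limit = limit;
        _window = window ?? TimeSpan.FromHours(1);
    }

    // Returns true if this fallback attempt is allowed; callers should also audit-log it.
    public bool TryStart(string accountId, DateTimeOffset now)
    {
        if (!_windows.TryGetValue(accountId, out var entry) || now - entry.Start &gt;= _window)
        {
            _windows[accountId] = (now, 1); // start a new window
            return true;
        }

        if (entry.Count &gt;= _limit)
            return false;

        _windows[accountId] = (entry.Start, entry.Count + 1);
        return true;
    }
}
</code></pre>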
<hr />
<h2 id="heading-graceful-degradation-is-a-security-feature">Graceful degradation is a security feature</h2>
<p>Graceful degradation means:</p>
<p>If the optimal path fails,<br />the system degrades to a slightly less optimal but still secure path —<br />without chaos.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771124592093/21fb5c5f-c290-4812-9181-85a21691ba48.png" alt class="image--center mx-auto" /></p>
<p>For example:</p>
<ul>
<li><p>WebAuthn unavailable → redirect to OIDC.</p>
</li>
<li><p>OIDC temporarily down → delay login with clear messaging.</p>
</li>
<li><p>Authenticator counter mismatch → require identity re-verification.</p>
</li>
</ul>
<p>The goal is not uninterrupted access at any cost.<br />The goal is continuity of trust.</p>
<p>Users interpret friction differently depending on clarity.</p>
<p>An unexplained failure feels insecure.<br />A clearly communicated alternative feels safe.</p>
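<p>Keeping each degraded state next to both its next step and its message makes that clarity reviewable in one place. A sketch covering the three examples above; the <code>DegradedState</code> values, the <code>DegradationPlan</code> record, and the message wording are hypothetical.</p>
<pre><code class="lang-csharp">// Hypothetical degraded states and their user-facing handling.
public enum DegradedState
{
    WebAuthnUnavailable,
    OidcProviderDown,
    CounterMismatch
}

public sealed record DegradationPlan(string NextStep, string UserMessage);

public static class Degradation
{
    public static DegradationPlan PlanFor(DegradedState state) =&gt; state switch
    {
        DegradedState.WebAuthnUnavailable =&gt; new DegradationPlan(
            "redirect-to-oidc",
            "This device can't use passwordless sign-in right now. Continue with your organization login."),
        DegradedState.OidcProviderDown =&gt; new DegradationPlan(
            "retry-later",
            "Your organization's login service is temporarily unavailable. Please try again in a few minutes."),
        DegradedState.CounterMismatch =&gt; new DegradationPlan(
            "reverify-identity",
            "For your security, please confirm your identity with your organization login."),
        _ =&gt; throw new System.ArgumentOutOfRangeException(nameof(state))
    };
}
</code></pre>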
<hr />
<h2 id="heading-ux-decisions-shape-security-outcomes">UX decisions shape security outcomes</h2>
<p>A confusing biometric prompt can cause:</p>
<ul>
<li><p>users to disable security features,</p>
</li>
<li><p>users to choose weaker alternatives,</p>
</li>
<li><p>users to distrust the system.</p>
</li>
</ul>
<p>An unclear fallback path can cause:</p>
<ul>
<li><p>support overload,</p>
</li>
<li><p>ad hoc account resets,</p>
</li>
<li><p>insecure manual overrides.</p>
</li>
</ul>
<p>Every prompt, error message, and redirect is part of the security boundary.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771124896501/ae624f28-4e74-4459-ba71-6890648b14ad.png" alt class="image--center mx-auto" /></p>
<p>When designing authentication UX, ask:</p>
<ul>
<li><p>Does this flow reduce ambiguity?</p>
</li>
<li><p>Does this error explain next steps?</p>
</li>
<li><p>Does this retry loop prevent confusion?</p>
</li>
<li><p>Does this fallback preserve assurance?</p>
</li>
</ul>
<p>Security is not just cryptographic strength.<br />It is user confidence combined with protocol integrity.</p>
<hr />
<h2 id="heading-designing-for-failure-makes-systems-stronger">Designing for failure makes systems stronger</h2>
<p>Authentication is not about proving success.<br />It is about handling failure safely.</p>
<p>Passwordless-first systems that ignore failure scenarios:</p>
<ul>
<li><p>look elegant in diagrams,</p>
</li>
<li><p>collapse under edge cases,</p>
</li>
<li><p>generate emergency workarounds.</p>
</li>
</ul>
<p>Passwordless-first systems that embrace failure:</p>
<ul>
<li><p>define fallback clearly,</p>
</li>
<li><p>support multi-device reality,</p>
</li>
<li><p>structure recovery intentionally,</p>
</li>
<li><p>treat UX as part of the threat model.</p>
</li>
</ul>
<p>That is the difference between a feature and an architecture.</p>
<p>In the next phase of this series, we move from theory to a real implementation — walking through a complete PWA authentication flow that combines WebAuthn and OpenID Connect in production.</p>
<p>Because architecture only proves itself when it survives the unpredictable behavior of actual users.</p>
<hr />
<h2 id="heading-series-navigation">☰ Series Navigation</h2>
<h3 id="heading-core-series">Core Series</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/introduction-to-passwordless-modern-authentication-patterns-for-pwas">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/authentication-is-not-login"><strong>Article 1 — Authentication is Not Login</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/what-passwordless-actually-means"><strong>Article 2 — What “Passwordless” Actually Means</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/webauthn-and-fido2-explained-without-the-spec"><strong>Article 3 — WebAuthn &amp; FIDO2, Explained Without the Spec</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/openid-connect-as-the-glue"><strong>Article 4 — OpenID Connect as the Glue</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/designing-a-passwordless-first-pwa-architecture"><strong>Article 5 — Designing a Passwordless-First PWA Architecture</strong></a></p>
</li>
<li><p>→ <strong>Article 6 — UX and Failure Are Part of the Security Model</strong></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-pwa-flow-architecture-walkthrough"><strong>Article 7 — A Real Passwordless PWA Flow (Architecture Walkthrough)</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/implementing-webauthn-in-practice"><strong>Article 8 — Implementing WebAuthn in Practice</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/integrating-oidc-feide-as-fallback-and-recovery"><strong>Article 9 — Integrating OIDC (Feide) as Fallback and Recovery</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/passwordless-what-worked-what-didnt-what-id-change"><strong>Article 10 — What Worked, What Didn’t, What I’d Change</strong></a></p>
</li>
</ul>
<h3 id="heading-optional-extras">Optional Extras</h3>
<ul>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/why-passwordless-alone-is-not-an-identity-strategy"><strong>Why Passwordless Alone Is Not an Identity Strategy</strong></a></p>
</li>
<li><p><a target="_blank" href="https://devpath-traveler.nguyenviettung.id.vn/how-browser-ux-shapes-security-more-than-cryptography"><strong>How Browser UX Shapes Security More Than Cryptography</strong></a></p>
</li>
</ul>
]]></content:encoded></item></channel></rss>