Why `/episerver/health` Failed Behind CMS Security Layers and How We Fixed It

Health check failures are often diagnosed as an infrastructure problem first. The app must be down. The load balancer must be wrong. The probe configuration must be off. But sometimes the endpoint is perfectly fine and the real issue is much closer to the request pipeline.

That was the shape of this fix. The failing route was /episerver/health, and the root problem was not the health check implementation itself. The problem was that the route lived under the same URL space as Optimizely CMS, which meant several layers of custom security logic were treating it like a protected admin endpoint.

Once that happens, a health probe stops looking like an anonymous liveness request and starts looking like an unauthorized visit to /episerver. That is how you end up with redirects, 404 responses, or IP-based rejection on an endpoint that infrastructure expects to stay boring and predictable.

The Failure Mode Was Layered

What made this issue interesting is that there was not one single blocker. The route could be intercepted in multiple places:

cookie-auth redirect logic that challenged anonymous /episerver requests
access-denied handling that returned 404 for protected CMS routes
a custom authorization gate later in the pipeline for CMS and /util traffic
an admin IP safelist middleware that restricted /episerver access

That is the key lesson. When a health endpoint shares a route prefix with protected application surfaces, fixing only one interception point is rarely enough.
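Because several layers can answer on behalf of the endpoint, it helps to look at the raw response before changing anything. The following is a minimal diagnostic sketch, not part of the original fix: the `ProbeDiagnostics` class, its `Classify` helper, and the default base address are hypothetical. It requests the health route without following redirects, so a login challenge stays visible as a 3xx instead of being silently followed.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical diagnostic helper; the names and default URL are assumptions.
public static class ProbeDiagnostics
{
    // Maps a raw probe response to the layer that most plausibly produced it.
    public static string Classify(int statusCode, string? location) => statusCode switch
    {
        >= 200 and < 300 => "healthy: the request reached the health endpoint",
        >= 300 and < 400 => $"redirected to '{location}': likely the cookie-auth login challenge",
        404 => "404: likely access-denied handling, the CMS role gate, or the IP safelist",
        _ => $"unexpected status {statusCode}"
    };

    public static async Task Main(string[] args)
    {
        // Assumed base address; pass the real site URL as the first argument.
        var baseUrl = args.Length > 0 ? args[0] : "https://localhost:5001";

        // Disable auto-redirect so a login challenge shows up as a 3xx response.
        using var handler = new HttpClientHandler { AllowAutoRedirect = false };
        using var client = new HttpClient(handler) { BaseAddress = new Uri(baseUrl) };

        using var response = await client.GetAsync("/episerver/health");
        Console.WriteLine(Classify((int)response.StatusCode, response.Headers.Location?.ToString()));
    }
}
```

A 404 alone does not identify the layer, since three of the four interception points return it. Running the probe both from an approved IP and from an unapproved one is one way to narrow it down.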

Why the Route Was Misclassified

The application had custom logic that intentionally treated requests under /episerver and /util differently from public site traffic. That makes sense for an Optimizely solution with Azure B2C sign-in, CMS roles, and safelisted admin access.

The problem was that /episerver/health also matched the broad /episerver prefix checks. So even though the endpoint existed to answer platform probes, it was accidentally pulled into admin-style authentication and authorization handling.

In other words, the route was not broken because the app could not answer a health request. It was broken because the app never let the request arrive at the health endpoint cleanly.
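The misclassification is easy to reproduce in isolation. This small sketch uses the same `PathString.StartsWithSegments` call as the application code; the `PrefixMatchDemo` class and the sample paths are illustrative only, and the project is assumed to reference the ASP.NET Core shared framework for `Microsoft.AspNetCore.Http`.

```csharp
using System;
using Microsoft.AspNetCore.Http;

// Illustrative only: shows which paths a broad "/episerver" prefix check captures.
public static class PrefixMatchDemo
{
    // Same style of check the security layers used for CMS traffic.
    public static bool IsCmsPath(PathString path) =>
        path.StartsWithSegments("/episerver", StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        var samples = new[]
        {
            "/episerver",
            "/episerver/cms",
            "/episerver/health",
            "/EPiServer/Health",
            "/episerver-docs"
        };

        foreach (var raw in samples)
        {
            Console.WriteLine($"{raw} -> {IsCmsPath(new PathString(raw))}");
        }
    }
}
```

Every sample except /episerver-docs matches. The check compares whole segments, so it does not catch look-alike prefixes, but it does catch the health route in any casing.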

Fix 1: Exclude the Health Route From Cookie Redirect Logic

The first place to carve out the endpoint was the application cookie events. Anonymous access to CMS routes was configured to trigger a B2C challenge, which is good for the admin UI and terrible for a health probe.

options.Events = new CookieAuthenticationEvents
{
    OnRedirectToLogin = ctx =>
    {
        // Identify the health probe before applying the broad CMS prefix checks.
        var isHealthCheck = ctx.Request.Path
            .StartsWithSegments("/episerver/health", StringComparison.OrdinalIgnoreCase);

        if (!isHealthCheck &&
            (ctx.Request.Path.StartsWithSegments("/episerver", StringComparison.OrdinalIgnoreCase) ||
             ctx.Request.Path.StartsWithSegments("/util", StringComparison.OrdinalIgnoreCase)))
        {
            // Preserve the originally requested URL so the user returns to it after sign-in.
            var returnUrl = (ctx.Request.PathBase + ctx.Request.Path + ctx.Request.QueryString).ToString();

            return ctx.HttpContext.ChallengeAsync(
                SignInPolicyId,
                new AuthenticationProperties { RedirectUri = returnUrl });
        }

        // The health route and all non-CMS paths fall through to the default login redirect.
        ctx.Response.Redirect(ctx.RedirectUri);
        return Task.CompletedTask;
    },

The same treatment was applied to the access-denied branch:

OnRedirectToAccessDenied = ctx =>
{
    // Identify the health probe before applying the broad CMS prefix checks.
    var isHealthCheck = ctx.Request.Path
        .StartsWithSegments("/episerver/health", StringComparison.OrdinalIgnoreCase);

    if (!isHealthCheck &&
        (ctx.Request.Path.StartsWithSegments("/episerver", StringComparison.OrdinalIgnoreCase) ||
         ctx.Request.Path.StartsWithSegments("/util", StringComparison.OrdinalIgnoreCase)))
    {
        // Hide protected CMS routes behind a 404 instead of revealing that they exist.
        ctx.Response.StatusCode = StatusCodes.Status404NotFound;
        return Task.CompletedTask;
    }

    // The health route and all non-CMS paths fall through to the default access-denied redirect.
    ctx.Response.Redirect(ctx.RedirectUri);
    return Task.CompletedTask;
};

This is an easy place to miss health endpoints because the logic is technically correct for almost every CMS request. The bug is in the scope of the prefix check, not in the redirect behavior itself.

Fix 2: Exclude the Health Route From the Custom CMS Authorization Gate

Later in the pipeline, the app had another guard for CMS and utility routes. Its job was to return 404 for authenticated users who did not belong to one of the allowed CMS roles. Again, that behavior was perfectly valid for admin traffic and completely wrong for an infrastructure probe.

app.Use(async (ctx, next) =>
{
    // Identify the health probe first so the broad prefix checks below cannot capture it.
    bool isHealthCheck = ctx.Request.Path
        .StartsWithSegments("/episerver/health", StringComparison.OrdinalIgnoreCase);
    bool isCms = ctx.Request.Path.StartsWithSegments("/episerver", StringComparison.OrdinalIgnoreCase);
    bool isUtil = ctx.Request.Path.StartsWithSegments("/util", StringComparison.OrdinalIgnoreCase);

    if (!isHealthCheck && (isCms || isUtil))
    {
        // Only authenticated users are checked here; anonymous requests continue to next().
        if (ctx.User?.Identity?.IsAuthenticated == true)
        {
            bool isAllowed =
                ctx.User.IsInRole("CmsAdmins") ||
                ctx.User.IsInRole("CmsEditors") ||
                ctx.User.IsInRole("WebAdmins") ||
                ctx.User.IsInRole("Administrators");

            if (!isAllowed)
            {
                // Return 404 and clear the selected endpoint so it never executes.
                ctx.Response.StatusCode = StatusCodes.Status404NotFound;
                var epf = ctx.Features.Get<IEndpointFeature>();
                if (epf != null) epf.Endpoint = null;
                return;
            }
        }
    }

    await next();
});

This part matters because even if you fix the cookie-auth redirects, a later authorization layer can still short-circuit the request before the endpoint runs. Health routes need a clean lane through the entire pipeline, not just the first checkpoint.

Fix 3: Exclude the Health Route From Admin IP Safelisting

The final piece was the admin safelist middleware. In production-like environments, requests under /episerver were restricted to approved IPs. That is sensible for the CMS, but it is a common source of failures for health probes, which arrive from infrastructure addresses you do not want to manage like editorial traffic.

public async Task Invoke(HttpContext context)
{
    var remoteIp = GetRemoteIpAddress(context);
    var path = context.Request.Path.Value ?? string.Empty;

    // Exact match: only the health route itself is exempt from the IP check.
    var isHealthCheckPath = path.Equals("/episerver/health", StringComparison.OrdinalIgnoreCase);

    // Match "/episerver" itself or anything beneath it, but not look-alike prefixes.
    var isEpiserverPath =
        path.Equals("/episerver", StringComparison.OrdinalIgnoreCase) ||
        path.StartsWith("/episerver/", StringComparison.OrdinalIgnoreCase);

    if (isEpiserverPath && !isHealthCheckPath)
    {
        if (!IsIpAuthorized(remoteIp))
        {
            // Unapproved IPs see a 404 rather than a hint that the CMS exists.
            context.Response.StatusCode = StatusCodes.Status404NotFound;
            return;
        }
    }

    await _next(context);
}

This was probably the most important operational change of the set. A health probe should not depend on the same IP safelist assumptions as the CMS shell unless you deliberately design it that way.
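One detail worth checking when the same exemption is written in several places: the safelist middleware uses an exact string comparison, while the cookie events and the role gate use `StartsWithSegments`. The two do not agree on every input. The sketch below compares them on a few sample paths; the `HealthPathMatchDemo` class and the samples are hypothetical, and whether the difference matters depends on the exact URL your probe is configured to call.

```csharp
using System;
using Microsoft.AspNetCore.Http;

// Hypothetical comparison of the two exemption styles used across the fixes.
public static class HealthPathMatchDemo
{
    private const string HealthPath = "/episerver/health";

    // Exact string comparison, as in the safelist middleware.
    public static bool ExactMatch(string path) =>
        path.Equals(HealthPath, StringComparison.OrdinalIgnoreCase);

    // Segment-aware comparison, as in the cookie events and the role gate.
    public static bool SegmentMatch(string path) =>
        new PathString(path).StartsWithSegments(HealthPath, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        var samples = new[]
        {
            "/episerver/health",
            "/episerver/health/",
            "/episerver/health/ready",
            "/episerver/healthz"
        };

        foreach (var path in samples)
        {
            Console.WriteLine($"{path}: exact={ExactMatch(path)}, segment={SegmentMatch(path)}");
        }
    }
}
```

Only /episerver/health satisfies both checks. A trailing slash or a sub-path passes the segment check but not the exact one, so such a request would be exempt from the first two layers and still subject to the IP safelist.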

The Interesting Part Was the Iteration

One detail I like in the commit history is that the first health-route carve-out was not the final shape of the solution. There was an initial change, a follow-up adjustment, and then a broader pass that fixed the remaining interception points. That is a realistic example of how route-security bugs behave in mature applications.

They rarely live in one method. They live in the composition of several well-meaning layers.

The Design Lesson

If a health endpoint lives under a route prefix that also hosts protected application features, you should assume broad prefix checks will catch it unless you explicitly exempt it. That exemption usually has to be repeated at every layer that can short-circuit the request:

authentication redirects
authorization failure handling
role-based middleware
IP restrictions
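Since the exemption has to be repeated at each of these layers, one option is to define the classification once and call it everywhere. This is a sketch of that idea rather than what the original commits did: `CmsRouteClassifier` and its members are hypothetical names, and the prefixes are the ones used in the snippets above.

```csharp
using System;
using Microsoft.AspNetCore.Http;

// Hypothetical helper, not from the original codebase: one definition of
// "health route" and "protected CMS route" shared by every security layer.
public static class CmsRouteClassifier
{
    private static readonly PathString HealthPath = new PathString("/episerver/health");
    private static readonly PathString CmsPrefix = new PathString("/episerver");
    private static readonly PathString UtilPrefix = new PathString("/util");

    // True for the health route, including a trailing slash or sub-paths.
    public static bool IsHealthCheck(PathString path) =>
        path.StartsWithSegments(HealthPath, StringComparison.OrdinalIgnoreCase);

    // True for CMS and utility routes that should get admin-style handling.
    public static bool IsProtectedCmsPath(PathString path) =>
        !IsHealthCheck(path) &&
        (path.StartsWithSegments(CmsPrefix, StringComparison.OrdinalIgnoreCase) ||
         path.StartsWithSegments(UtilPrefix, StringComparison.OrdinalIgnoreCase));
}
```

With something like this in place, the cookie events and the role gate could each test `CmsRouteClassifier.IsProtectedCmsPath(ctx.Request.Path)` instead of rebuilding the condition. The IP safelist in this article only restricts /episerver, not /util, so it would more likely combine its own prefix check with `IsHealthCheck`. The trade-off is a shared dependency between layers in exchange for a single place to update when the route changes.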

The mistake is to think of health checks as “just another endpoint.” Operationally, they are not. They are infrastructure control signals, and they need a simpler path through the app than a CMS shell request.

That is what this fix accomplished. It did not make the health check smarter. It made the surrounding security layers precise enough to stop treating the health check like an admin page.