How We Stopped Deleted CMS Pages from Showing Up in Personalized Recommendations

Personalized recommendation widgets are easy to love when they work and surprisingly damaging when they do not. One broken recommendation does more than create a 404. It tells users that the site does not really know what is live, what is relevant, or what is safe to click.

We recently worked through that exact problem in an ASP.NET Core CMS implementation. A third-party recommendation engine was still returning items whose URLs pointed to deleted or unpublished CMS content. The recommendation API was doing what recommendation APIs often do: returning content based on historical signals. The CMS, meanwhile, had moved on. Editors had unpublished some pages, deleted others, and the front end was still happily rendering those links.

The fix was not to ask the recommendation engine to become a source of truth for content validity. The fix was to make the CMS validate what it was about to render.

The real problem was stale recommendations, not bad rendering

At first glance, this kind of bug looks like a front-end issue. A card shows up, the user clicks it, and the destination is gone. But the real issue lives at the boundary between two systems with different responsibilities:

  • The recommendation service decides what content might be relevant.
  • The CMS decides what content is currently routable and published.

If we skip the second check, we are effectively assuming that recommendation freshness and publication state are always aligned. In practice, they rarely are.

The pattern that fixed it

Instead of trusting every recommended URL, we added a validation step before the recommendation block rendered any item.

The block now does four important things:

  • Deduplicates recommendations by URL.
  • Strips URL fragments so in-page anchors do not create false distinctions.
  • Validates same-site URLs against CMS routing and publication state.
  • Trims to the configured display count only after validation succeeds.

That last point matters more than it sounds. If you request exactly six items from the recommendation service and two of them are stale, your widget suddenly has holes. The better pattern is to overfetch, validate, and then trim.

In our case, the recommendation helper now requests at least 30 items or the configured display count, whichever is greater. That gives the block enough inventory to survive dead links without looking empty.
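The overfetch-validate-trim ordering can be sketched in isolation. The class and method names below are illustrative, not part of the production block, and the 30-item floor mirrors the helper described above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A minimal, standalone sketch of overfetch-validate-trim.
public static class RecommendationTrimmer
{
    // Illustrative floor matching the "at least 30 items" rule above.
    public const int MinimumFetchSize = 30;

    // How many items to request from the recommendation service.
    public static int FetchCount(int displayCount) =>
        Math.Max(MinimumFetchSize, displayCount);

    // Deduplicate, validate, then trim to the display count, so stale
    // links consume spare inventory instead of leaving holes in the widget.
    public static List<string> Trim(
        IEnumerable<string> candidateUrls,
        Func<string, bool> isStillRenderable,
        int displayCount) =>
        candidateUrls
            .Distinct()
            .Where(isStillRenderable)
            .Take(displayCount)
            .ToList();
}
```

The important detail is the order of operations: `Take` runs last, after the validation predicate, so dead links are absorbed by the overfetched surplus.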

How validation worked in practice

For internal links, we treated the CMS as the authority. If a recommendation pointed back to the same site, we routed the path through the CMS, loaded the content, and made sure the content was still published before rendering the card.

For external URLs, we left them alone. The CMS cannot reliably validate an outside domain, so trying to treat external destinations the same way would create false negatives.

At a high level, the logic looked like this:

[TemplateDescriptor]
public class RecommendedContentBlockComponent : AsyncBlockComponent<IdioRecommendedContentBlock>
{
    private readonly IUrlResolver _urlResolver;
    private readonly IContentLoader _contentLoader;
    private readonly IHttpContextAccessor _httpContextAccessor;

    public RecommendedContentBlockComponent(
        IUrlResolver urlResolver,
        IContentLoader contentLoader,
        IHttpContextAccessor httpContextAccessor)
    {
        _urlResolver = urlResolver;
        _contentLoader = contentLoader;
        _httpContextAccessor = httpContextAccessor;
    }

    protected override async Task<IViewComponentResult> InvokeComponentAsync(IdioRecommendedContentBlock currentContent)
    {
        var recommendations = await IdioApiHelper.GetRecommendationsAsync(currentContent);
        var model = new IdioRecommendedContentBlockViewModel(currentContent)
        {
            Recommendations = recommendations
                ?.content.Select(x => new IdioRecommendation()
                {
                    OgTitle = x?.title ?? x?.metadata?.tags?.og?.title ?? string.Empty,
                    OgDescription = x?.metadata?.tags?.og?.description ?? string.Empty,
                    PageTypeIcon = IdioApiHelper.NormalizeGlobalAssetsUrl(x?.metadata?.tags?.idio?.page_type_icon ?? string.Empty),
                    Url = x?.link_url ?? x?.metadata?.tags?.og?.url ?? string.Empty
                })?.DistinctBy(x => x.Url)
                .Where(r => IsRenderableCmsRecommendation(r.Url))
                .Take(currentContent.NumberOfRecommendations)
                .ToList() ?? new List<IdioRecommendation>()
        };

        return View("~/Features/Common/Blocks/IdioRecommendedContent/IdioRecommendedContentBlock.cshtml", model);
    }

    /// <summary>
    /// Idio can return URLs that no longer exist in the CMS. For same-site links, require a routable,
    /// published content item. External URLs are left unchanged (cannot validate against EPiServer).
    /// </summary>
    private bool IsRenderableCmsRecommendation(string rawUrl)
    {
        if (string.IsNullOrWhiteSpace(rawUrl))
        {
            return false;
        }

        var withoutFragment = rawUrl.Split('#')[0];

        if (!Uri.TryCreate(withoutFragment, UriKind.Absolute, out var absolute))
        {
            return InternalPathResolvesToPublishedContent(withoutFragment);
        }

        var siteHost = GetPrimarySiteHost();
        var requestHost = _httpContextAccessor.HttpContext?.Request.Host.Host;
        if (siteHost != null)
        {
            if (!HostsMatch(absolute.Host, siteHost))
            {
                return true;
            }
        }
        else if (!string.IsNullOrEmpty(requestHost) && !HostsMatch(absolute.Host, requestHost))
        {
            return true;
        }

        var path = absolute.AbsolutePath;
        if (string.IsNullOrEmpty(path))
        {
            path = "/";
        }

        return InternalPathResolvesToPublishedContent(path);
    }

    private static string GetPrimarySiteHost()
    {
        var siteUrl = SiteDefinition.Current?.SiteUrl;
        if (siteUrl != null && Uri.TryCreate(siteUrl.ToString(), UriKind.Absolute, out var siteUri))
        {
            return siteUri.Host;
        }

        return null;
    }

    private static bool HostsMatch(string a, string b)
    {
        return string.Equals(NormalizeHost(a), NormalizeHost(b), StringComparison.OrdinalIgnoreCase);
    }

    private static string NormalizeHost(string host)
    {
        if (host.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
        {
            return host[4..];
        }

        return host;
    }

    private bool InternalPathResolvesToPublishedContent(string pathForRouting)
    {
        var routeData = _urlResolver.Route(new UrlBuilder(pathForRouting));
        if (routeData == null || ContentReference.IsNullOrEmpty(routeData.ContentLink))
        {
            return false;
        }

        if (!_contentLoader.TryGet<IContent>(routeData.ContentLink, out var content))
        {
            return false;
        }

        return ContentHelper.IsContentPublished(content);
    }
}

If you don’t already have a helper for fetching recommendations from Optimizely Content Recommendations (formerly Idio), here is a sample.

public static async Task<IdioResponse> GetRecommendationsAsync(IdioRecommendedContentBlock currentBlock)
{
    try
    {
        // Note: in production, reuse a single HttpClient (or use IHttpClientFactory)
        // instead of constructing one per call.
        var httpClient = new HttpClient();
        var ivCookie = _cookieService.Service.Get("iv");

        // Overfetch: request at least the configured fetch size so validation
        // losses do not leave the widget short of items.
        var fetchCount = Math.Max(RecommendedContentIdioFetchSize, currentBlock.NumberOfRecommendations);
        var result = await httpClient.GetAsync(
            $"https://api.{_url}/1.0/users/idio_visitor_id:{ivCookie}/content?include_topics&callback=idio.r0&key={currentBlock.DeliveryAPIKey}&session[]={HttpUtility.UrlEncode(_httpContext.Service.HttpContext.Request.PathBase)}%2F&session[]={HttpUtility.UrlEncode(_httpContext.Service.HttpContext.Request.PathBase)}%2F&rpp={fetchCount}");

        if (result.IsSuccessStatusCode)
        {
            string resultStr;
            using (var sr = new StreamReader(await result.Content.ReadAsStreamAsync(), Encoding.GetEncoding("iso-8859-1")))
            {
                resultStr = await sr.ReadToEndAsync();
            }

            // Strip the JSONP envelope (callback name and trailing status code)
            // before deserializing.
            resultStr = resultStr.Replace("idio.r0(", string.Empty)
                .Replace(", 200)", string.Empty)
                .Replace(", 403)", string.Empty);

            return JsonConvert.DeserializeObject<IdioResponse>(resultStr);
        }

        return null;
    }
    catch (Exception e)
    {
        _logService.Service.Error(e.Message, e);
        return null;
    }
}
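The string replacements in that helper assume the envelope is always `idio.r0(<json>, <status>)` with one of two known status codes. A slightly more defensive unwrap, as a sketch with a hypothetical helper name and the same assumption about the payload shape, handles any status code:

```csharp
using System;

public static class JsonpEnvelope
{
    // Extract the JSON payload from "callback(<json>, <status>)".
    // Returns the input unchanged if it does not look like JSONP.
    public static string Unwrap(string payload)
    {
        var start = payload.IndexOf('(');        // opening paren of the callback
        var end = payload.LastIndexOf(',');      // comma before the trailing status
        if (start < 0 || end <= start)
        {
            return payload;
        }
        return payload.Substring(start + 1, end - start - 1).Trim();
    }
}
```

Because the status code is always the last element before the closing parenthesis, taking everything between the first `(` and the last `,` works for 200, 403, or any other status, without hardcoding each one.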

There were a couple of small details that made this more reliable:

  • We normalized hosts before comparing them, including stripping a leading "www." prefix.
  • We supported both absolute URLs and relative paths.
  • We treated unpublished content the same as missing content for rendering purposes.
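To make the first of those details concrete, here is a standalone sketch mirroring the host-comparison helpers (not the production class), showing how www/non-www and case differences compare equal while genuinely different subdomains do not:

```csharp
using System;

public static class HostRules
{
    // Strip a leading "www." so www and non-www variants compare equal.
    public static string Normalize(string host) =>
        host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;

    // Case-insensitive comparison of normalized hosts.
    public static bool Match(string a, string b) =>
        string.Equals(Normalize(a), Normalize(b), StringComparison.OrdinalIgnoreCase);
}
```

Note that only the "www." prefix is stripped: a recommendation pointing at a different subdomain (for example a CDN host) is still treated as external and left unvalidated.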

Why this approach scales better than one-off patches

The tempting fix would have been to special-case a few known bad URLs or add more filtering rules in the recommendation layer. That might stop the visible symptom for a week, but it does not solve the architectural mismatch.

What scales is validating every internal recommendation against the system that actually owns publication state.

That gives you a few important benefits:

  • You can survive deleted content, expired content, and unpublished content with the same safeguard.
  • You reduce the chance of sending users into broken journeys from high-visibility homepage or article-page modules.
  • You avoid coupling CMS behavior too tightly to the quirks of a recommendation vendor.

A quiet UX win: fewer empty states

One thing I especially like about this fix is that it solves two user-facing problems at once. It prevents broken links, but it also prevents sparse recommendation modules.

That is the value of overfetching before filtering. Instead of asking for exactly what you plan to show, ask for enough to tolerate validation loss. In recommendation systems, some percentage of results will always be unusable because of content drift, entitlement rules, or missing metadata. Designing for that reality makes the component more resilient.

The broader lesson

If your site combines a CMS with personalization, search, or recommendation services, do not assume those systems share the same definition of “valid content.”

They usually do not.

The safest pattern is simple:

  • Let upstream systems propose content.
  • Let the CMS verify whether that content is still renderable.
  • Only show what passes both tests.

It is a small architectural boundary, but it protects user trust in a big way.