When Sitecore 10.4.1 Sneaked Into a Sitecore AKS Cluster and Broke It: A Troubleshooting Tale

Sitecore 10.4.1 is finally here! Everyone's excited about the new release, but updates sometimes bring surprises along with them. Here is one such story.

This happened on a Sitecore 10.4.0 instance running in an Azure Kubernetes Service (AKS) cluster. One day, a fellow Sitecorian reached out and told me that things just... broke.

The Problem

He said it was supposed to be a normal release. A developer pushed changes, the pipeline ran, and then, boom, errors everywhere. So we started looking into it. For context, we got the error below:



The Analysis

Our First Guess: Maybe It’s the Code?

We thought maybe the issue was in the new Pull Request (PR). We checked the changes, but everything looked okay.

Just to be sure, we reverted the PR and tried the release again. Same error.

Then we tried releasing an older build (before the error started). That one worked just fine.

So now we knew: this wasn't caused by the code changes.

The Solution

Soon, we suspected that something had changed behind the scenes. I remembered seeing the announcement that Sitecore 10.4.1 was being released (my LinkedIn feed was already full of the buzz).

From my past experience with CI/CD and Sitecore on AKS, I knew that the build and release process for Sitecore on AKS differs from on-premises and PaaS releases. The key difference is that the build for Sitecore on AKS is pushed to a container registry, and the build artifacts of the custom solution are layered over the base Sitecore image (which is not the case for on-premises and PaaS).

During the build stage in CI/CD, the base Sitecore image is pulled each time from Sitecore's public container registry, so the image behind a given tag can change between builds. That gave me a strong feeling that the issue was related to the Sitecore image.

We reached out to Sitecore Support and explained everything to them.

They replied quickly, almost as if they were already expecting this problem.

And guess what? They confirmed the issue: our build had started using the new 10.4.1 image, not 10.4.0 as we thought. The root cause was that the build was using the wrong Sitecore base image.

How Did It Happen?

If you are new to Sitecore on Kubernetes and Docker, it is useful to know that the base Sitecore image versions are specified in the env.template and kustomization files.

In our Kubernetes config files (env.template and kustomization.yaml), we were using this image tag: 10.4-ltsc2022

This tag means: “Give me the latest image in the 10.4 series.”

So when 10.4.1 was released, the build automatically started using the latest version in the 10.4 series, even though we were expecting 10.4.0.
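To make the difference between the two kinds of tag concrete, here is a minimal sketch in Python. It assumes tags have the shape major.minor[.patch]-osvariant, as in the examples in this post; the function name is_pinned is just an illustration and not part of any Sitecore tooling.

```python
import re

# Assumed tag shape: <major>.<minor>[.<patch>]-<os variant>, e.g. 10.4-ltsc2022
TAG_PATTERN = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?-(\w+)$")


def is_pinned(tag: str) -> bool:
    """Return True when the tag names an exact patch version."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    # Group 3 is the optional patch number; without it the tag floats.
    return match.group(3) is not None


for tag in ("10.4-ltsc2022", "10.4.0-ltsc2022"):
    print(tag, "pinned" if is_pinned(tag) else "floating")
```

A floating tag is resolved again on every pull, so two builds from the same commit can end up on different base images.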

You can check the image reference page to see the latest images released: https://raw.githubusercontent.com/Sitecore/docker-images/refs/heads/master/tags/sitecore-tags.md

To fix the issue, we changed the tag to: 10.4.0-ltsc2022

This locked the version to exactly what we wanted, and the error was gone. Happy ending? Not quite yet.

Important Tip for Sitecore in AKS/Docker

Most people prefer to reuse the files from the Sitecore MVP-site repo. The snapshot below shows how the Sitecore image version is referenced in these files:



In the snapshot, you can see that the default env.template file (and also the k8s files) comes with a two-part image tag, e.g. 10.4-ltsc2022. This needs to change to a three-part tag of the form 10.4.x-ltsc2022, where x is the patch version you actually run (10.4.0 in our case).

We passed this information to Sitecore Support so that they could notify existing users to make the same change, as they may be impacted too.
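If you have several such files, the change can be scripted. The sketch below is only an illustration: the SITECORE_VERSION and newTag lines in the sample text are assumed examples of how a tag might appear in env.template and kustomization.yaml, and pin_tags is a hypothetical helper. Check what your own files contain before running anything like this.

```python
import re

# Matches a two-part tag such as 10.4-ltsc2022 but not 10.4.0-ltsc2022.
# The lookbehind stops the pattern from matching inside a three-part version.
FLOATING_TAG = re.compile(r"(?<![\d.])(\d+\.\d+)-(ltsc\d{4})\b")


def pin_tags(text: str, patch: int) -> str:
    """Rewrite every floating tag in text to an explicit patch version."""
    return FLOATING_TAG.sub(rf"\g<1>.{patch}-\g<2>", text)


# Hypothetical sample lines; the key names are assumptions for illustration.
sample = (
    "SITECORE_VERSION=10.4-ltsc2022\n"
    "newTag: 10.4-ltsc2022\n"
    "newTag: 10.4.0-ltsc2022\n"
)

print(pin_tags(sample, patch=0))
```

Tags that are already pinned are left untouched, so the script can be run again safely.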

Need more details? See the following reference page:

https://doc.sitecore.com/xp/en/developers/latest/developer-tools/sitecore-image-reference.html#platform-images

Final Thoughts

This issue took us by surprise, but it taught us something valuable:

Always pin your Sitecore image versions in Kubernetes.
Don't assume a floating tag will keep pointing to the same image forever.

We also told Sitecore Support so they could help others who might face the same problem.

Hope this helps you avoid the same headache!

Until next time, happy Sitecoring! 👋
