Limit on the number of SXA sites: Sitecore site in IIS keeps restarting due to a large number of SXA sites
Let me start this blog with a question: is there a limit on the number of sites Sitecore SXA can support on a single Sitecore instance? If you search for an answer, the one you will usually find is that there is no limit.
Bear with me: in the current state of Sitecore and the SXA module, the answer to this question is not a simple 'No'. We will discuss this in more detail in this post.
THE PROBLEM
We upgraded from Sitecore 8.1 to Sitecore 10.1 and migrated our vanilla Sitecore sites to SXA. We migrated some important sites first and planned to move the rest afterwards. Things were fine while the number of SXA sites hosted on our Sitecore instance was small. As we moved the rest of our sites to the new Sitecore instance, we observed that the Sitecore site in IIS kept restarting without giving much indication of why.
THE SOLUTION
We thought this could be a machine-specific issue (memory, etc.) and that it might not reproduce on other developer machines. To our surprise, we could reproduce the issue on different machines as well.
Naturally, we opened a Sitecore support ticket to understand the issue, and one of our primary questions each time was whether there is a limit on the number of sites Sitecore or SXA supports. The answer we received was the expected one: no limits. But we were not convinced, because we could reproduce the issue the moment the number of Sitecore sites reached approximately 540. Yes, that is a large number of sites on a single Sitecore instance, and the next section explains how we ended up with so many.
Why such a large number of sites in our Sitecore instance?
The actual number of sites in our Sitecore instance is lower (let's assume it is close to 100, which is still a high number). But in SXA, a site is equivalent to a site definition defined in the Site Grouping folder under the site's Settings item.
If a site has n languages, we had to create n site definitions for the same site to support them. In that case, the only difference between these site definitions is the value of the Language field. Even though it is a single site, it appears as multiple sites because of the number of site definitions it now needs to support multiple languages.
We also wanted to provide preview functionality for our content authors. The preview shows the site before it is actually published, using the same page URLs that appear on the live site (I can talk about this separately in a different blog). Such URLs are user friendly and easier to share with business teams and content approval teams. Sitecore's built-in Preview feature doesn't generate user-friendly URLs, so we couldn't use it.
To achieve this, we created a preview version of each site definition. The preview version differs from the main site definition in that it points to the master database instead of the web database (which is used for published, live sites only). This means that a site with n languages now has 2n site definitions: n site definitions to support n languages, plus a preview version of each of them.
For example, let's assume we have 100 sites and each site supports 2 languages on average. That gives us 2 x 100 = 200 site definitions. Each of these site definitions also needs a preview version, that is, a duplicate that uses the master database. Hence, the total number of site definitions = 2 x 200 = 400.
I hope the above example helps to explain how 100 sites in Sitecore turn into 400 site definitions in Sitecore SXA.
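The arithmetic above can be written as a small sketch. The function names and the idea of estimating how many sites fit under a given number of site definitions are my own illustration, not anything provided by Sitecore, and the 540 figure is simply the approximate count at which we saw the problem.

```python
def total_site_definitions(sites: int, languages_per_site: int, with_preview: bool = True) -> int:
    """Count site definitions: one per site per language, doubled if previews exist."""
    live_definitions = sites * languages_per_site
    return live_definitions * 2 if with_preview else live_definitions


def max_sites_below_threshold(threshold: int, languages_per_site: int, with_preview: bool = True) -> int:
    """Largest number of sites whose definitions stay at or below the threshold."""
    definitions_per_site = languages_per_site * (2 if with_preview else 1)
    return threshold // definitions_per_site


if __name__ == "__main__":
    print(total_site_definitions(100, 2, with_preview=False))  # 200
    print(total_site_definitions(100, 2))                      # 400
    print(max_sites_below_threshold(540, 2))                   # 135
```

With 2 languages per site and a preview definition for each, roughly 135 sites are enough to reach the count at which our instance started failing.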
So far so good! But our actual count of sites was higher, pushing the total number of site definitions past 800. Our application worked well as long as the total number of site definitions stayed below roughly 540. Once it crossed this limit, the application pool would recycle on its own and the site would never come up.
We checked memory and CPU utilization and nothing appeared alarming. Memory and CPU usage were always well below our server's capacity.
We looked into Event Viewer and analyzed the memory dumps and logs. We found application crashes that appeared to be caused by Sitecore waiting to acquire a lock on certain resources. Since the Sitecore instance couldn't get the resources in time, it errored out. The stack traces looked like this:
ntdll!NtWaitForMultipleObjects+14
KERNELBASE!WaitForMultipleObjectsEx+fe
clr!WaitForMultipleObjectsEx_SO_TOLERANT+62
clr!Thread::DoAppropriateWaitWorker+205
clr!Thread::DoAppropriateWait+7d
clr!CLREventBase::WaitEx+b6
clr!Thread::Block+2c
clr!SyncBlock::Wait+1c8
clr!ObjectNative::WaitTimeout+e1
Sitecore.Threading.Semaphore.P()+61
Sitecore.Threading.CustomThreadPool.ProcessQueuedItems()+c6
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+172
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+15
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+55
mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()+55
clr!CallDescrWorkerInternal+83
clr!CallDescrWorkerWithHandler+4e
clr!MethodDescCallSite::CallTargetWorker+102
clr!ThreadNative::KickOffThread_Worker+fffff02f
clr!ManagedThreadBase_DispatchInner+40
clr!ManagedThreadBase_DispatchMiddle+6c
clr!ManagedThreadBase_DispatchOuter+4c
clr!ManagedThreadBase_DispatchInCorrectAD+15
clr!Thread::DoADCallBack+26b
clr!ManagedThreadBase_DispatchInner+2e2f
clr!ManagedThreadBase_DispatchMiddle+6c
clr!ManagedThreadBase_DispatchOuter+4c
clr!ManagedThreadBase_FullTransitionWithAD+2f
clr!ThreadNative::KickOffThread+e6
clr!Thread::intermediateThreadProc+8b
kernel32!BaseThreadInitThunk+14
ntdll!RtlUserThreadStart+21
We then collected memory dumps of our application while this was happening.
According to the memory dump, the StackOverflowException was caused by bug #500740: a StackOverflowException occurs when using OWIN and the number of sites exceeds several hundred. The issue occurs because the OWIN "MapMiddleware" is executed recursively for every site instance. The OWIN middleware recursion level increases with the number of Sitecore Experience Accelerator (SXA) sites and may exceed the thread stack size, causing a stack overflow.
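To make the recursion argument concrete, here is a toy model of a chain of "map" style middleware. It is not Sitecore or OWIN code: the path prefixes, function names and the depth counter are all hypothetical, and the only point is to show that a request which matches none of the mappings passes through one nested call per registered mapping.

```python
from typing import Callable, List

# A handler takes (path, depth so far) and returns the call depth reached.
Handler = Callable[[str, int], int]


def terminal(path: str, depth: int) -> int:
    # End of the pipeline: report how deep the call chain got.
    return depth


def map_middleware(prefix: str, next_handler: Handler) -> Handler:
    def invoke(path: str, depth: int) -> int:
        if path.startswith(prefix):
            return depth + 1  # this mapping handles the request
        return next_handler(path, depth + 1)  # fall through, one call deeper
    return invoke


def build_pipeline(prefixes: List[str]) -> Handler:
    # Wrap from the last mapping to the first so the first one runs first.
    handler: Handler = terminal
    for prefix in reversed(prefixes):
        handler = map_middleware(prefix, handler)
    return handler


def depth_for_unmatched_request(site_count: int) -> int:
    # Hypothetical per-site prefixes; one mapping per site definition.
    prefixes = [f"/site{i}/logout" for i in range(site_count)]
    return build_pipeline(prefixes)("/some/other/page", 0)


if __name__ == "__main__":
    print(depth_for_unmatched_request(5))    # 5
    print(depth_for_unmatched_request(300))  # 300
```

In this model the nesting depth grows linearly with the number of mappings, which is consistent with the explanation that enough site definitions can exhaust a fixed-size thread stack.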
To work around this issue, Sitecore support suggested disabling the LogoutEndpoint processor if you do not use the following SSO functionality: https://doc.sitecore.com/xp/en/developers/101/sitecore-experience-manager/single-sign-out.html
Alternatively, if you do use SSO and this approach isn't suitable, Sitecore support suggested creating a custom LogoutEndpoint processor based on the sample they provided.
The sample marks the "Old Code" and the "New Code" that resolves the issue. To apply the code change, build an assembly with the sample class and override the "Sitecore.Owin.Authentication.IdentityServer.Pipelines.Initialize.LogoutEndpoint" processor with the new "Sitecore.CS0294067.Owin.Authentication.IdentityServer.Pipelines.Initialize" in "\App_Config\Sitecore\Owin.Authentication.IdentityServer\Sitecore.Owin.Authentication.IdentityServer.config".
THE VERDICT
In its current state, SXA has a known issue that limits the number of SXA sites a Sitecore instance can host to a couple of hundred. Senior developers and architects should take this into account when planning to add new sites to an existing Sitecore instance.
Hope it helps you guys!