
Solr Search for Sitecore - Notes --Draft

SOLR SEARCH QUICK NOTES

http://www.solrtutorial.com/solr-query-syntax.html


BASIC CONCEPTS

Solr builds indexes. An index contains documents, and a document contains fields.

Documents can be thought of as rows in a table, and fields as columns.


Before adding documents to an index, we need to define the schema. (Changing the schema after documents have been added is not advisable, since existing documents typically have to be reindexed.)

The schema declares -

  • what kinds of fields there are
  • which field should be used as primary key
  • which fields are required
  • how to index and search each field
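
As a rough sketch of the points above, a schema.xml fragment might look like the following. The field names title and body are hypothetical, and the text_general field type is assumed to be defined elsewhere in the same schema.

```xml
<!-- Sketch only: title and body are made-up field names. -->
<fields>
  <!-- what kinds of fields there are, and which are required -->
  <field name="id"    type="string"       indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="body"  type="text_general" indexed="true" stored="false"/>
</fields>

<!-- which field is used as the primary key -->
<uniqueKey>id</uniqueKey>
```

How each field is indexed and searched is decided by the field type named in its type attribute.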


When we search for a word, the word is analyzed (transformed) into one or more tokens. SOLR then looks these tokens up in its index.

Each token in the index holds references to the documents in which it was found at indexing time, and this is how search results are returned.
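
The transformation into tokens is configured per field type through an analyzer. A minimal sketch, assuming a hypothetical field type name text_example, could be:

```xml
<!-- Sketch only: text_example is a made-up name. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split the text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lower-case each token so that "Solr" and "solr" match -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single analyzer element like this, the same chain is applied both when indexing documents and when parsing the search query, which is why the tokens line up.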

Not all fields are searchable. Many are kept in the index only so that their values can be displayed in the search results when a match is found. This is done with the following setting:

stored = true and indexed = false
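
In schema.xml terms this could look like the sketch below. Both field names are hypothetical, and text_general is assumed to be defined in the schema.

```xml
<!-- searchable and returned in results -->
<field name="title"         type="text_general" indexed="true"  stored="true"/>
<!-- display only: returned with results, but cannot be searched on -->
<field name="thumbnail_url" type="string"       indexed="false" stored="true"/>
```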


We don't store every field in the index because stored fields increase the index size.

The larger the index, the slower searches tend to be.

SOLR is powered by Lucene. SOLR is like a car and Lucene is like its engine.


SOLR QUERY SYNTAX

SOLR uses the Lucene query syntax for querying its indexes.

The standard Lucene query parser is used by default (DisMax or eDisMax can be used instead, depending on the use case).

Some query examples - http://www.solrtutorial.com/solr-query-syntax.html

SOLR CONFIGURATIONS

The 2 most important files in a SOLR configuration -

  • schema.xml
  • solrconfig.xml


SCHEMA.XML

First file to configure while setting up SOLR. It contains -

  • Field Types
  • Fields 
  • Misc info


Field Types - predefined field types. We can create new field types as well

e.g. int

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>


Fields

<field name="id" type="string" indexed="true" stored="true" required="true"/>


SOLRCONFIG.XML

Second file you configure when setting up SOLR

Data directory location
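
For reference, the stock solrconfig.xml declares the data directory roughly as below. An empty default after the colon means Solr falls back to a data folder under the core's instance directory.

```xml
<!-- solr.data.dir can be supplied as a system property;
     if it is not set, the default location is used. -->
<dataDir>${solr.data.dir:}</dataDir>
```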

cache parameters - to cache documents etc.

2 implementations of cache are available for Solr -

    <!-- LRUCache, based on a synchronized LinkedHashMap, and
         FastLRUCache, based on a ConcurrentHashMap.  FastLRUCache has faster gets
         and slower puts in single threaded operation and thus is generally faster
         than LRUCache when the hit ratio of the cache is high (> 75%), and may be
         faster under other scenarios on multi-cpu systems. -->

    <!-- Cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.
         When a new searcher is opened, its caches may be prepopulated
         or "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For LRUCache,
         the autowarmed items will be the most recently accessed items.

       Parameters:
         class - the SolrCache implementation LRUCache or FastLRUCache
         size - the maximum number of entries in the cache
         initialSize - the initial capacity (number of entries) of
           the cache.  (see java.util.HashMap)
         autowarmCount - the number of entries to prepopulate from
           an old cache.
         -->
    <filterCache
      class="solr.FastLRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>


    <!-- Cache used to hold field values that are quickly accessible
         by document id.  The fieldValueCache is created by default
         even if not configured here.
      <fieldValueCache
        class="solr.FastLRUCache"
        size="512"
        autowarmCount="128"
        showItems="32"
      />
    -->

    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested.  -->
    <queryResultCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>

    <!-- documentCache caches Lucene Document objects (the stored fields for each document).
         Since Lucene internal document ids are transient, this cache will not be autowarmed.  -->
    <documentCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>

request handlers - a request handler is responsible for accepting an HTTP request, performing the search and returning the result. The default request handler, known as the standard request handler, looks like -

<requestHandler name="standard" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <!--
     <int name="rows">10</int>
     <str name="fl">*</str>
     <str name="version">2.1</str>
      -->
   </lst>
</requestHandler>
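
A handler's defaults are also one place where a different query parser, such as the eDisMax parser mentioned earlier, can be selected. The sketch below is illustrative only: the handler name and the title and body fields are hypothetical.

```xml
<!-- Sketch only: handler name and field names are made up. -->
<requestHandler name="/example-search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- use eDisMax instead of the standard Lucene query parser -->
    <str name="defType">edismax</str>
    <!-- fields to search, with title matches weighted twice as much -->
    <str name="qf">title^2 body</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
```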


Every request handler has a configurable list of search components that perform the actual search.

<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
</arr>




Search components

 - these actually perform the search. By default the following components are available -

<searchComponent name="query"     class="org.apache.solr.handler.component.QueryComponent" />
<searchComponent name="facet"     class="org.apache.solr.handler.component.FacetComponent" />
<searchComponent name="mlt"       class="org.apache.solr.handler.component.MoreLikeThisComponent" />
<searchComponent name="highlight" class="org.apache.solr.handler.component.HighlightComponent" />
<searchComponent name="stats"     class="org.apache.solr.handler.component.StatsComponent" />
<searchComponent name="debug"     class="org.apache.solr.handler.component.DebugComponent" />


===================================

SOLR IN SITECORE


Solr needs a defined XML schema when working with documents.


You can modify an existing schema with the Generate the Solr Schema.xml file tool. This tool automatically generates a basic schema and ensures all the fields that Sitecore needs are present. You can add your own fields to this schema, as long as you do not change the system index fields.

If you have any other field definitions, copy fields, or dynamic fields configured in your schema, they are overwritten by the schema generator. To preserve these fields, copy your original schema and merge it with the newly generated schema afterwards.


The Sitecore Solr provider uses an IOC (Inversion of Control) container so that all the elements inside it can be swapped without re-compilation. The default Sitecore installation includes a default implementation of the Solr connector (SolrNet.dll).


Your website Include folder contains several configuration files. Lucene search is enabled by default. If you want to use Solr, you normally disable the Lucene search config files and enable the Solr config file (although it is technically possible to use Lucene for some indexes and Solr for other indexes). This enables Solr integration and gives you access to all the Solr specific configuration settings.




The following Solr specific settings can be found in the Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config file.


Specifying a SOLR Service Address

This setting tells Sitecore where the SOLR server is located. Sitecore appends the core name so only the base address needs to be supplied.
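
In the Sitecore versions these notes are based on, the setting looks roughly like the line below. The URL is only an assumed local default, so check the value and setting name against your own config file.

```xml
<!-- Base address only; Sitecore appends the core name. The URL is an assumption. -->
<setting name="ContentSearch.Solr.ServiceBaseAddress" value="http://localhost:8983/solr" />
```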


Enabling a Search Provider

This setting tells Sitecore that SOLR is enabled and so attempts to connect to the SOLR server the next time the index is accessed. If it cannot connect you get an error. To disable, set this back to Lucene, which was the default setting.

<setting name="ContentSearch.Provider" value="Solr" /> (Default: “Lucene”)


Maximum Number of Search Results

This is a global setting found in the Sitecore.ContentSearch.config file.

This setting specifies the maximum number of documents to retrieve in a single request when the query does not specify a limit itself (for example with Take(10)). For performance reasons, keep in mind how many results a query can return and handle them appropriately, for example by using paging.

<setting name="ContentSearch.SearchMaxResults" value="500" />


Enabling Batch Mode

When an item is indexed, the composed document is saved to the search index. With the default Lucene provider, each write is flushed to a file on the local disk. With the SOLR provider, each update has to travel over the network.


When an index is rebuilt, a large number of document updates are created, which could result in a lot of inefficient network traffic. Batching can therefore help to optimize the update process as your indexes grow in size.

<setting name="ContentSearch.Update.BatchModeEnabled" value="true" />

<setting name="ContentSearch.Update.BatchSize" value="500" />

Batch mode (enabled by default) takes these document updates and only flushes to the Solr server when the batch has reached a certain size.



As your index grows you may want to increase this batch size to gain the most out of this process.


You must update the global.asax file so that the Solr provider is loaded when the application starts. Do this by specifying that your application inherits from one of the application classes provided; the specific configuration depends on your choice of IOC container.


For example, to update the global.asax file to use Castle Windsor:

In your website root folder, locate the global.asax file: wwwroot\<sitename>\Website

Open global.asax and in the first line, replace:

Inherits="Sitecore.Web.Application"

With

Inherits="Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.WindsorApplication"

This registers the IOC (inversion of control) components for Castle Windsor enabling Solr integration to work correctly.



