Solr Search for Sitecore

Solr Search for Sitecore - Notes (Draft)

SOLR SEARCH QUICK NOTES

http://www.solrtutorial.com/solr-query-syntax.html

BASIC CONCEPTS

Solr builds indexes. An index contains documents, and a document contains fields.

A document can be thought of as a row in a table, and a field as a column in that table.

Before adding documents to an index, we need to specify the schema. Changing the schema after documents have been added is not advisable, since existing documents generally have to be reindexed.

The schema declares -

what kinds of fields there are
which field should be used as primary key
which fields are required
how to index and search each field
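
As a rough illustration of the four points above, a minimal schema.xml could look like the sketch below. The schema name and the field names (id, title, body) are made up for this example, and the layout assumes the classic Solr 4.x-style schema.xml, so treat it as a sketch rather than a drop-in file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example schema; all names are illustrative only -->
<schema name="example" version="1.5">
  <types>
    <!-- what kinds of fields there are -->
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text_general" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <!-- required="true": a document without this field is rejected -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- indexed/stored control how each field is searched and returned -->
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="false"/>
  </fields>
  <!-- which field is used as the primary key -->
  <uniqueKey>id</uniqueKey>
</schema>
```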

When we search for a word, the word is first transformed (analyzed) into a token. Solr then looks for this token in its index.

Each token in the index references the documents in which it was found at indexing time, and this is how search results are returned.

Not all fields are searchable. Some are kept in the index only so that their values can be displayed in the search results when a match is found. This is done with the following setting:

stored="true" and indexed="false"
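
To make the stored/indexed combinations concrete, here is a small sketch of three field definitions. The field names are hypothetical, and it assumes field types called string and text_general are already defined in the schema.

```xml
<!-- Searchable and shown in results -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- Display-only: returned with results, but cannot be searched -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>

<!-- Search-only: matched by queries, but its value is not returned -->
<field name="body" type="text_general" indexed="true" stored="false"/>
```

The third variant is the usual way to keep large text searchable without storing its original value in the index.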

We don't store all fields in the index because it increases the index size.

The larger the index, the slower the search tends to be.

Solr is powered by Lucene: Solr is like the car and Lucene is like the engine.

SOLR QUERY SYNTAX

Solr uses the Lucene query syntax for querying its indexes.

The standard Lucene query parser is used by default (DisMax or eDisMax can be used instead, depending on the use case).

Some query examples - http://www.solrtutorial.com/solr-query-syntax.html
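
As a sketch of how the parser choice can be configured, the request handler below sets eDisMax as its default parser in solrconfig.xml. The handler name /search and the field names title and body are assumptions for this example; defType, qf and rows are standard Solr request parameters. The parser can also be selected per request by passing defType in the query string.

```xml
<!-- Hypothetical handler and field names -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- use eDisMax instead of the standard Lucene parser -->
    <str name="defType">edismax</str>
    <!-- fields to search; title matches are weighted twice as much -->
    <str name="qf">title^2 body</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
```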

SOLR CONFIGURATIONS

The two most important Solr configuration files are -

schema.xml
solrconfig.xml

SCHEMA.XML

First file to configure while setting up SOLR. It contains -

Field Types
Fields
Misc info

Field Types - predefined field types. We can create new field types as well

e.g. text
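
A custom text field type might be declared roughly as below. This is only a sketch: the type name text_simple is made up, and it assumes a stopwords.txt file exists in the core's conf directory. The analyzer chain is what turns a word into the token that Solr stores and later looks up.

```xml
<!-- Hypothetical field type name -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercase tokens so "Solr" and "solr" match -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop common words listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```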

Fields

SOLRCONFIG.XML

Second file you configure when setting up SOLR

Data directory location
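
In the stock solrconfig.xml the data directory is typically declared like this (a sketch; solr.data.dir is a system property you may or may not set in your environment):

```xml
<!-- When solr.data.dir is not set, Solr falls back to its default data directory for the core -->
<dataDir>${solr.data.dir:}</dataDir>
```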

cache parameters - to cache documents etc.

Two cache implementations are available in Solr: LRUCache, based on a synchronized LinkedHashMap, and FastLRUCache, based on a ConcurrentHashMap. FastLRUCache has faster gets and slower puts in single-threaded operation, so it is generally faster than LRUCache when the cache hit ratio is high (> 75%), and it may also be faster in other scenarios on multi-CPU systems.

<!-- Cache used by SolrIndexSearcher for filters (DocSets),
     unordered sets of *all* documents that match a query.
     When a new searcher is opened, its caches may be prepopulated
     or "autowarmed" using data from caches in the old searcher.
     autowarmCount is the number of items to prepopulate. For LRUCache,
     the autowarmed items will be the most recently accessed items.
     Parameters:
       class - the SolrCache implementation LRUCache or FastLRUCache
       size - the maximum number of entries in the cache
       initialSize - the initial capacity (number of entries) of
         the cache. (see java.util.HashMap)
       autowarmCount - the number of entries to prepopulate from
         an old cache.
-->
<filterCache
  class="solr.FastLRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<!-- Cache used to hold field values that are quickly accessible
     by document id. The fieldValueCache is created by default
     even if not configured here.
  <fieldValueCache
    class="solr.FastLRUCache"
    size="512"
    autowarmCount="128"
    showItems="32"
  />
-->

<!-- queryResultCache caches results of searches - ordered lists of
     document ids (DocList) based on a query, a sort, and the range
     of documents requested. -->
<queryResultCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document).
     Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

request handlers - a request handler is responsible for accepting an HTTP request, performing the search, and returning the result. The default request handler, known as the standard request handler, looks like -

<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
</lst>
</requestHandler>

Every request handler has a configurable list of search components.

Search components

- perform the actual search. By default the following components are available -

<arr name="components">
<str>query</str>
<str>facet</str>
<str>highlight</str>
<str>stats</str>
<str>debug</str>
</arr>

===================================

SOLR IN SITECORE

Solr needs a defined XML schema when working with documents.

You can modify an existing schema with the "Generate the Solr Schema.xml file" tool. This tool automatically generates a basic schema and ensures that all the fields Sitecore needs are present. You can add your own fields to this schema, as long as you do not change the system index fields.

If you have any other field definitions, copy fields, or dynamic fields configured in your schema, they are overwritten by the schema generator. To preserve these fields, copy your original schema and merge it with the newly generated schema afterwards.

The Sitecore Solr provider uses an IOC (Inversion of Control) container so that all the elements inside it can be swapped without re-compilation. The default Sitecore installation includes a default implementation of the Solr connector (SolrNet.dll).

Your website Include folder contains several configuration files. Lucene search is enabled by default. If you want to use Solr, you normally disable the Lucene search config files and enable the Solr config file (although it is technically possible to use Lucene for some indexes and Solr for other indexes). This enables Solr integration and gives you access to all the Solr specific configuration settings.

The following Solr specific settings can be found in the Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config file.

Specifying a SOLR Service Address

This setting tells Sitecore where the SOLR server is located. Sitecore appends the core name so only the base address needs to be supplied.
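
For reference, the setting usually looks like the line below. The host and port are assumptions for a local Solr instance; check the setting name against the config file of your Sitecore version.

```xml
<!-- Assumed local Solr instance; replace host and port with your own -->
<setting name="ContentSearch.Solr.ServiceBaseAddress" value="http://localhost:8983/solr" />
```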

Enabling a Search Provider

This setting tells Sitecore that Solr is enabled, so Sitecore attempts to connect to the Solr server the next time the index is accessed. If it cannot connect, you get an error. To disable Solr, set this back to Lucene, which is the default setting.

<setting name="ContentSearch.Provider" value="Solr" /> (Default: “Lucene”)

Maximum Number of Search Results

This is a global setting found in the Sitecore.ContentSearch.config file.

This setting specifies the maximum number of documents to retrieve in a single request when the query itself does not set a limit, for example with Take(10). For performance reasons, keep in mind how many results a query can return and handle them appropriately, for example by using paging.
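
A sketch of how this cap might be set. The value 500 is purely illustrative, not a recommendation, and the setting name should be checked against your Sitecore.ContentSearch.config.

```xml
<!-- 500 is an illustrative cap, not a recommended value -->
<setting name="ContentSearch.SearchMaxResults" value="500" />
```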

Enabling Batch Mode

When an item is indexed, the composed document is saved to the search index. With the default Lucene provider, each write is flushed to a file on the local disk. With the Solr provider, each update has to travel over the network.

Rebuilding an index creates a large number of document updates, which can generate a lot of network traffic and is not very efficient. Batch mode helps to optimize the update process as your indexes grow in size.

Batch mode (enabled by default) takes these document updates and only flushes to the Solr server when the batch has reached a certain size.


As your index grows you may want to increase this batch size to gain the most out of this process.

You must update the global.asax file so that the Solr provider is loaded when the application starts. Do this by specifying that your application inherits from one of the provided application classes; the specific configuration depends on your choice of IOC container.

For example, to update the global.asax file to use Castle Windsor:

In your website root folder, locate the global.asax file: wwwroot\<sitename>\Website

Open global.asax and in the first line, replace:

Inherits="Sitecore.Web.Application"

With

Inherits="Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.WindsorApplication"

This registers the IOC (inversion of control) components for Castle Windsor enabling Solr integration to work correctly.

Create a default and a persistent search query

Sitecore Docs - Tech Tips by GS
