SOLR SEARCH QUICK NOTES
http://www.solrtutorial.com/solr-query-syntax.html
BASIC CONCEPTS
Solr builds indexes. Indexes contain documents, and documents contain fields.
Documents can be thought of as rows in a table, and fields as the columns of that table.
Before adding documents to an index, we need to specify the schema. (It is not advisable to change the schema after documents have been added.)
The schema declares -
- what kinds of fields there are
- which field should be used as primary key
- which fields are required
- how to index and search each field
When we search for a word, the word is transformed (analyzed) into a token. SOLR then looks for these tokens in its indexes.
Each token in the index references the documents in which it was found when they were indexed, and this is how search results are returned.
Not all fields are searchable. Some are kept in the index only so that their values can be displayed in the search results when a match is found. This is done with the following setting:
stored = true and indexed = false
We don't store all fields in the index because it increases the index size.
The larger the index size the slower the search.
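As a minimal sketch of how these attributes look in schema.xml, the fragment below declares one searchable field and one display-only field. The field names (title, thumbnail_url) and the text_general type are assumptions for illustration, not taken from any particular schema.

```xml
<!-- Searchable and returned in results (hypothetical field) -->
<field name="title" type="text_general" indexed="true" stored="true"/>
<!-- Not searchable; kept only so its value can be shown in results (hypothetical field) -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>
```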
SOLR is powered by Lucene. SOLR is like a car and Lucene is its engine.
SOLR QUERY SYNTAX
SOLR uses the Lucene query syntax for querying its indexes.
The standard Lucene query parser is used by default (DisMax or eDisMax can be used as well, depending on the use case).
Some query examples - http://www.solrtutorial.com/solr-query-syntax.html
SOLR CONFIGURATIONS
The two most important SOLR configuration files are -
- schema.xml
- solrconfig.xml
SCHEMA.XML
First file to configure while setting up SOLR. It contains -
- Field Types
- Fields
- Misc info
Field Types - predefined field types. We can create new field types as well.
e.g. int
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
Fields
<field name = "id" type="string" indexed="true" stored="true" required="true"/>
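Putting these pieces together, a schema.xml skeleton might look like the sketch below. The schema name and the title field are hypothetical, and the types/fields wrapper elements follow the layout of older Solr example schemas. The uniqueKey element is what marks the field used as the primary key.

```xml
<schema name="example" version="1.5">
  <types>
    <!-- Field type for untokenized string values -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- Hypothetical extra field -->
    <field name="title" type="string" indexed="true" stored="true"/>
  </fields>
  <!-- Field used as the primary key -->
  <uniqueKey>id</uniqueKey>
</schema>
```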
SOLRCONFIG.XML
Second file you configure when setting up SOLR
Data directory location
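For illustration, the stock solrconfig.xml shipped with Solr declares the data directory roughly as follows. The solr.data.dir system property is the one used in the example configuration; your installation may use a fixed path instead.

```xml
<!-- Directory holding the index data. With an empty default, Solr falls back
     to ./data under the core's instance directory. -->
<dataDir>${solr.data.dir:}</dataDir>
```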
cache parameters - to cache documents etc.
Two cache implementations are available for Solr:
- LRUCache, based on a synchronized LinkedHashMap
- FastLRUCache, based on a ConcurrentHashMap
FastLRUCache has faster gets and slower puts in single-threaded operation, so it is generally faster than LRUCache when the hit ratio of the cache is high (> 75%), and may be faster under other scenarios on multi-CPU systems.
<!-- Cache used by SolrIndexSearcher for filters (DocSets),
unordered sets of *all* documents that match a query.
When a new searcher is opened, its caches may be prepopulated
or "autowarmed" using data from caches in the old searcher.
autowarmCount is the number of items to prepopulate. For LRUCache,
the autowarmed items will be the most recently accessed items.
Parameters:
class - the SolrCache implementation LRUCache or FastLRUCache
size - the maximum number of entries in the cache
initialSize - the initial capacity (number of entries) of
the cache. (see java.util.HashMap)
autowarmCount - the number of entries to prepopulate from
an old cache.
-->
<filterCache
class="solr.FastLRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<!-- Cache used to hold field values that are quickly accessible
by document id. The fieldValueCache is created by default
even if not configured here.
<fieldValueCache
class="solr.FastLRUCache"
size="512"
autowarmCount="128"
showItems="32"
/>
-->
<!-- queryResultCache caches results of searches - ordered lists of
document ids (DocList) based on a query, a sort, and the range
of documents requested. -->
<queryResultCache
class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<!-- documentCache caches Lucene Document objects (the stored fields for each document).
Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache
class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
Request handlers - a request handler is responsible for accepting an HTTP request, performing the search and returning the result. The default request handler, known as the standard request handler, looks like -
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!--
<int name="rows">10</int>
<str name="fl">*</str>
<str name="version">2.1</str>
-->
</lst>
</requestHandler>
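The defaults list is also where a handler can select a different query parser, such as the eDisMax parser mentioned earlier. The following is only a sketch: the handler name /search and the field names title and body are hypothetical, while defType, qf and rows are standard Solr request parameters.

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the eDisMax parser instead of the standard Lucene parser -->
    <str name="defType">edismax</str>
    <!-- Fields to search, with a boost on title (hypothetical field names) -->
    <str name="qf">title^2 body</str>
    <!-- Number of results returned when the request does not specify rows -->
    <int name="rows">10</int>
  </lst>
</requestHandler>
```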
Every request handler has a configurable list of search components that perform the actual search.
<arr name="components">
<str>query</str>
<str>facet</str>
<str>mlt</str>
<str>highlight</str>
<str>stats</str>
<str>debug</str>
</arr>
Search components
- actually perform the search. By default the following components are available -
<searchComponent name="query" class="org.apache.solr.handler.component.QueryComponent" />
<searchComponent name="facet" class="org.apache.solr.handler.component.FacetComponent" />
<searchComponent name="mlt" class="org.apache.solr.handler.component.MoreLikeThisComponent" />
<searchComponent name="highlight" class="org.apache.solr.handler.component.HighlightComponent" />
<searchComponent name="stats" class="org.apache.solr.handler.component.StatsComponent" />
<searchComponent name="debug" class="org.apache.solr.handler.component.DebugComponent" />
===================================
SOLR IN SITECORE
Solr needs a defined XML schema when working with documents.
You can modify an existing schema with the Generate the Solr Schema.xml file tool. This tool automatically generates a basic schema and ensures all the fields that Sitecore needs are present. You can add your own fields to this schema, as long as you do not change the system index fields.
If you have any other field definitions, copy fields, or dynamic fields configured in your schema, they are overwritten by the schema generator. To preserve these fields, copy your original schema and merge it with the newly generated schema afterwards.
The Sitecore Solr provider uses an IOC (Inversion of Control) container so that all the elements inside it can be swapped without re-compilation. The default Sitecore installation includes a default implementation of the Solr connector (SolrNet.dll).
Your website Include folder contains several configuration files. Lucene search is enabled by default. If you want to use Solr, you normally disable the Lucene search config files and enable the Solr config file (although it is technically possible to use Lucene for some indexes and Solr for other indexes). This enables Solr integration and gives you access to all the Solr specific configuration settings.
The following Solr specific settings can be found in the Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config file.
Specifying a SOLR Service Address
This setting tells Sitecore where the SOLR server is located. Sitecore appends the core name so only the base address needs to be supplied.
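As a sketch, in Sitecore 7 and 8 configurations this is usually a setting along the following lines. The setting name ContentSearch.Solr.ServiceBaseAddress is assumed from those versions and should be checked against your own config file; the host and port are placeholders.

```xml
<!-- Base address of the Solr server; Sitecore appends the core name.
     Setting name assumed from Sitecore 7/8; host and port are placeholders. -->
<setting name="ContentSearch.Solr.ServiceBaseAddress" value="http://localhost:8983/solr" />
```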
Enabling a Search Provider
This setting tells Sitecore that SOLR is enabled, so Sitecore attempts to connect to the SOLR server the next time the index is accessed. If it cannot connect, you get an error. To disable SOLR, set this back to Lucene, which is the default setting.
<setting name="ContentSearch.Provider" value="Solr" /> (Default: “Lucene”)
Maximum Number of Search Results
This is a global setting found in the Sitecore.ContentSearch.config file.
This setting contains the maximum number of documents to retrieve on a single request if a limit has not been specified in the query, for example with Take(10). For performance reasons, keep in mind how many results a query will return and handle them appropriately, for example by using paging.
<setting name="ContentSearch.SearchMaxResults" value="500" />
Enabling Batch Mode
When an item is indexed, the composed document is saved to the search index. With the default Lucene provider, each write is flushed to a file on the local disk. With the SOLR provider, each update has to travel over a network.
When an index is rebuilt, a large number of document updates are created. This can generate a lot of network traffic, which is not very efficient. Using batch mode can therefore help to optimize the update process as your indexes grow in size.
<setting name="ContentSearch.Update.BatchModeEnabled" value="true" />
<setting name="ContentSearch.Update.BatchSize" value="500" />
Batch mode (enabled by default) takes these document updates and only flushes to the Solr server when the batch has reached a certain size.
As your index grows you may want to increase this batch size to gain the most out of this process.
You must update the global.asax file so that the Solr provider is loaded when the application starts. Do this by specifying that your application inherits from one of the application classes provided; the specific configuration depends on your choice of IOC container.
For example, to update the global.asax file to use Castle Windsor:
In your website root folder, locate the global.asax file: wwwroot\<sitename>\Website
Open global.asax and in the first line, replace:
Inherits="Sitecore.Web.Application"
With
Inherits="Sitecore.ContentSearch.SolrProvider.CastleWindsorIntegration.WindsorApplication"
This registers the IOC (inversion of control) components for Castle Windsor enabling Solr integration to work correctly.