Apache Solr Analyzer

Understanding Apache Solr Analyzers

After defining a field type in schema.xml and naming the analysis steps you want to apply to it, you should test it to confirm that it behaves the way you expect. The Solr Admin UI provides this capability: you can invoke the analyzer for any text field, enter sample input, and view the resulting token stream.

For example, suppose you add the following field type to schema.xml:

<fieldType name="mytermsField" class="solr.TextField">
  <analyzer type="index">
    <!-- WhitespaceTokenizer keeps the trailing hyphen that HyphenatedWordsFilter needs -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenatedWordsFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

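The purpose is to reconstruct hyphenated words at index time, so that a word split across a line break, such as "hyphen- ated", is indexed as the single token "hyphenated". To test this, refer to the figure below.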
[Figure: Testing mytermsField in the Solr Admin UI]
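A field type only takes effect once a field references it. The fragment below is a minimal sketch that declares a hypothetical field named content using mytermsField, followed by a comment tracing how the index analyzer would process a sample input. The trace assumes the index analyzer uses a tokenizer that keeps the trailing hyphen on the first token, such as solr.WhitespaceTokenizerFactory; a tokenizer that strips hyphens leaves HyphenatedWordsFilterFactory nothing to join.

```xml
<!-- Hypothetical field that uses the mytermsField type (the field name is illustrative) -->
<field name="content" type="mytermsField" indexed="true" stored="true"/>

<!--
  Index-time trace for the input "Hyphen- ated Words",
  assuming a whitespace tokenizer that keeps the trailing hyphen:
    tokenizer:              "Hyphen-"     "ated"  "Words"
    HyphenatedWordsFilter:  "Hyphenated"  "Words"
    LowerCaseFilter:        "hyphenated"  "words"
-->
```

Entering the same input for this field in the Admin UI is a quick way to check whether the actual token stream matches the trace.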

Simple Post Tool:

There is a command-line tool for POSTing raw XML to a Solr server. It reads the XML data from files specified as command-line arguments, from raw command-line argument strings, or from STDIN.

The tool, post.jar, is located in the exampledocs directory: $SOLR/example/exampledocs/post.jar is a cross-platform Java tool for POSTing XML documents.
To run it, open a terminal window and enter the following:

java -jar post.jar <file1.xml> <file2.xml> ...
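As an illustration, the file below is a minimal sketch of the kind of XML post.jar sends. It assumes a schema whose uniqueKey field is id and that defines a text field named content; both field names, the file name docs.xml, and all values are hypothetical.

```xml
<!-- docs.xml: hypothetical input file for post.jar -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="content">Apache Solr analyzers turn text into tokens</field>
  </doc>
  <doc>
    <field name="id">doc2</field>
    <field name="content">The post tool sends XML to the update handler</field>
  </doc>
</add>
```

It could then be sent with java -jar post.jar docs.xml. The options for choosing the target URL or core differ between Solr versions, so running java -jar post.jar -h is a safer way to see what your copy supports.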

Uploading Data with Index Handlers:

Index handlers are request handlers designed to add, delete, and update documents in the index. In addition to importing rich documents using Tika, or importing from structured data sources using the Data Import Handler, Solr natively supports indexing structured documents in XML, JSON, and CSV.
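In the XML format, these operations map to short update messages. The sketch below shows each one as a separate request body; the document id doc1, the field names, and the query are hypothetical. Re-adding a document whose uniqueKey already exists replaces the previous version, which is how a plain update works.

```xml
<!-- Add (or replace) a document; "id" is assumed to be the schema's uniqueKey -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="content">Updated text for the first document</field>
  </doc>
</add>

<!-- Delete a single document by its unique key -->
<delete><id>doc1</id></delete>

<!-- Delete every document matching a query (the field name is hypothetical) -->
<delete><query>category:obsolete</query></delete>
```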

Commit operation:

  • The <commit> operation writes all documents loaded since the last commit to one or more segment files on disk.
  • Newly indexed content is not visible to searches until a commit is issued.
  • A commit opens a new searcher and triggers any event listeners that have been configured.
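Besides sending an explicit commit, an add message can carry a commitWithin attribute, given in milliseconds, asking Solr to commit the new documents within that time. A minimal sketch, with hypothetical field names and values:

```xml
<!-- Ask Solr to make these documents visible within 10 seconds (10000 ms) -->
<add commitWithin="10000">
  <doc>
    <field name="id">doc3</field>
    <field name="content">Searchable once the requested commit has happened</field>
  </doc>
</add>
```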

Optimize operation:

  • The optimize operation requests that Solr merge its internal data structures to improve search performance.
  • For a large index, an optimize can take a considerable amount of time to complete.
  • Merging many small segment files into a larger one can improve search performance.
  • If you use Solr's replication mechanism to distribute searches across many systems, be aware that after an optimize, the complete index will need to be transferred.

The following optional attributes are accepted by the commit and optimize operations:

Optional Attribute    Description
waitSearcher          Default is true. Blocks until a new searcher is opened and registered as the main query searcher, making the changes visible.
expungeDeletes        (commit only) Default is false. Merges segments that have more than 10% deleted documents, expunging the deleted documents in the process.
maxSegments           (optimize only) Default is 1. Merges the segments down to no more than this number of segments.

Examples:

<commit waitSearcher="true"/>
<commit waitSearcher="true" expungeDeletes="false"/>
<optimize waitSearcher="true"/>
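The examples above keep the default values. The sketch below sets the remaining attributes from the table to non-default values; the segment count of 4 is an arbitrary illustration, not a recommendation.

```xml
<!-- Commit, and also merge away segments with more than 10% deleted documents -->
<commit expungeDeletes="true"/>

<!-- Merge the index down to at most 4 segments, returning without waiting for the new searcher -->
<optimize maxSegments="4" waitSearcher="false"/>
```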

About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.