Understanding Solr Architecture
When high availability and fault tolerance are combined with the Solr server, the result is called SolrCloud. It provides distributed indexing and searching capabilities.
The most important features of SolrCloud are:
- Central configuration for the entire cluster
- Automatic load balancing and failover for queries
- ZooKeeper integration for cluster coordination and configuration
Apache Solr Cloud Architecture
SolrCloud enables a set of optional features that simplify horizontally scaling a search index through sharding and replication. What it mainly adds is on the distributed indexing side. A single Solr server is really fast and has the most features, but its capacity is limited by the number of query requests and updates it has to handle. Once we reach the maximum size of a single server, we have to add another server, and this is the case SolrCloud is designed for. In outline, adding a server works as follows: some of the documents are placed on the new server, and the data can be replicated to it.
Leader: A node that can accept writes without consulting another node. In a design where any node is effectively a leader, every node can accept update requests directly, without the latency of checking with another node first.
Is everyone a Leader?
- Favors write availability
- Challenges optimistic locking
- Challenges consistency
Favors write availability: Only one node needs to be up for a write to be accepted. This comes with the downsides described next.
Challenges optimistic locking: Optimistic locking becomes more difficult. When every node is a leader and the nodes get partitioned, both partitions keep accepting writes. With optimistic locking, we send an update together with the version we are trying to update, and the server confirms whether the update can be applied. With separate partitions, that answer is not easy to get immediately, because the other partition may hold the copy of the document we are trying to update, and a conflict only surfaces when the partitions come back together.
Challenges consistency: When every node is a leader, what we get is an eventual consistency model. A primary reason to offer optimistic locking is that it acts as a form of transaction: it allows atomic updates on a per-document basis, rather than a transaction that involves multiple documents.
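The version check behind optimistic locking can be sketched in a few lines. This is a minimal in-memory illustration, not Solr's implementation: the `VersionedStore` class, the `VersionConflict` exception and the convention that a missing document has version 0 are all assumptions made for the example.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy single-node document store with a version per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, fields)

    def get(self, doc_id):
        return self._docs[doc_id]

    def update(self, doc_id, fields, expected_version):
        # Assumption of this sketch: a missing document has version 0.
        current_version, _ = self._docs.get(doc_id, (0, None))
        if expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(fields))
        return new_version


store = VersionedStore()
v1 = store.update("doc-1", {"title": "first"}, expected_version=0)
v2 = store.update("doc-1", {"title": "second"}, expected_version=v1)

try:
    # A second writer still holding v1 is rejected instead of overwriting v2.
    store.update("doc-1", {"title": "stale"}, expected_version=v1)
    conflict = False
except VersionConflict:
    conflict = True
```

If two partitions each ran their own copy of such a store, both would accept an update based on v1, and the disagreement would only be visible once the partitions rejoin. That is the difficulty described above.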
Collections: A collection is made up of one or more Solr cores, and each core is hosted by a single Solr instance. For example, a collection may consist of shard1 and shard2, with each shard placed on two Solr instances.
ZooKeeper: It is a vital part of SolrCloud. If ZooKeeper fails, the whole cluster becomes useless. Leader election, cluster state management and centralized configuration are all provided by this distributed coordination service. We can use the embedded ZooKeeper for testing.
Collection: A collection is a search index distributed across multiple nodes. Each collection has a name, a shard count and a replication factor.
Replication factor: Number of copies of a document in a collection.
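As a rough illustration of how these three properties fit together, the sketch below builds a Collections API `CREATE` request URL and computes how many cores the collection would need. The host `localhost:8983`, the collection name `articles` and both helper functions are hypothetical. No request is sent, and the parameter names should be checked against the documentation for your Solr version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request URL (nothing is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def total_cores(num_shards, replication_factor):
    """Each shard is stored replication_factor times, one core per copy."""
    return num_shards * replication_factor


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "articles", 2, 2)
print(url)
print(total_cores(2, 2))
```

With two shards and a replication factor of two, the collection is backed by four cores, which matches the shard1/shard2 layout described earlier.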
Shard: A shard is a logical slice of a collection. The ability to shard a logical Solr index is an excellent feature.
Every shard has a name, a leader, a hash range and a replication factor. Each shard contains one Leader Core and zero or more Replica Cores. Each document is assigned to exactly one shard per collection using a hash-based document routing strategy.
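Hash-based routing over per-shard hash ranges can be sketched as follows. This is only an illustration of the idea: the 32-bit hash space, the equal-sized ranges and the use of `zlib.crc32` as the hash are assumptions of the example, and Solr's own hash function and range layout may differ.

```python
import zlib

HASH_SPACE = 2 ** 32  # size of the hash space assumed in this sketch


def shard_ranges(num_shards):
    """Split the hash space into contiguous, near-equal ranges, one per shard."""
    step = HASH_SPACE // num_shards
    ranges = {}
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges[f"shard{i + 1}"] = (start, end)
    return ranges


def route(doc_id, ranges):
    """Return the name of the shard whose hash range contains the document id."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, an assumption
    for name, (start, end) in ranges.items():
        if start <= h <= end:
            return name
    raise ValueError("hash outside all ranges")


ranges = shard_ranges(2)
print(ranges)
print(route("doc-1", ranges))
```

Because the shard is derived only from the document id, the same document always lands on the same shard, and any node that knows the ranges can work out where a document belongs.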
Replica: A replica is a core that stores a copy of a Leader Core's index. Each replica is implemented as a Solr core. Replica Cores and other Leader Cores are responsible for forwarding a Solr document to the appropriate Leader Core.
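The forwarding behaviour can be pictured with a small in-memory model. The `Core` and `Cluster` classes and the `toy_router` function are hypothetical names made up for this sketch. The leader copying each document straight into its replicas is a simplification, not a description of Solr's actual replication mechanism.

```python
class Core:
    """A toy core holding an in-memory index."""

    def __init__(self, name):
        self.name = name
        self.index = {}  # doc_id -> document


class Cluster:
    def __init__(self, router):
        self.router = router  # callable: doc_id -> shard name
        self.leaders = {}     # shard name -> leader Core
        self.replicas = {}    # shard name -> list of replica Cores

    def add_shard(self, shard, leader, replicas):
        self.leaders[shard] = leader
        self.replicas[shard] = list(replicas)

    def receive(self, receiving_core, doc):
        """Handle a document that arrived at any core; return the hops taken."""
        leader = self.leaders[self.router(doc["id"])]
        hops = [receiving_core.name]
        if receiving_core is not leader:
            hops.append(leader.name)  # forwarded to the appropriate leader
        leader.index[doc["id"]] = doc
        # Simplification: the leader copies the document to its replicas.
        for replica in self.replicas[self.router(doc["id"])]:
            replica.index[doc["id"]] = doc
        return hops


def toy_router(doc_id):
    # Hypothetical routing rule: ids starting with a-m go to shard1.
    return "shard1" if doc_id[0] < "n" else "shard2"


s1_leader, s1_replica = Core("s1-leader"), Core("s1-replica")
s2_leader, s2_replica = Core("s2-leader"), Core("s2-replica")
cluster = Cluster(toy_router)
cluster.add_shard("shard1", s1_leader, [s1_replica])
cluster.add_shard("shard2", s2_leader, [s2_replica])

# Sent to a shard1 replica, but "zebra" belongs to shard2.
hops = cluster.receive(s1_replica, {"id": "zebra"})
print(hops)
```

The client can therefore send a document to any core, and it still ends up indexed on the leader of the shard it belongs to.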
This blog will help you get a better understanding of Solr + Hadoop = Big Data Love!