Is using a load balancer with ElasticSearch unnecessary?

Question

asked Jul 4, 2019 in AWS by yuvraj (19.1k points)
edited Jul 4, 2019 by yuvraj

I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic (nodes can come in and out when needed).

From everything I've read about ElasticSearch, it seems like no one recommends putting a load balancer in front of the cluster; instead, it seems like the recommendation is to do one of two things:

Point your client at the URL/IP of one node, let ES do the load balancing for you and hope that node never goes down.
Hard-code the URLs/IPs of ALL your nodes into your client app and have the app handle the failover logic.

My background is mostly in web farms where it's just common sense to create a huge pool of autonomous web servers, throw an ELB in front of them and let the load balancer decide what nodes are alive or dead. Why does ES not seem to support this same architecture?
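The second option in the question (listing every node and handling failover in the application) is less work than it sounds. Below is a minimal sketch, assuming a hypothetical `send(url, path)` function that stands in for a real HTTP call; `FailoverClient`, `NodeUnavailable`, and the `node0` to `node2` host names are illustrative only. The official Elasticsearch clients accept a list of hosts and implement a fuller version of this idea, including marking dead nodes and retrying them later.

```python
import itertools
from typing import Callable, List, Optional


class NodeUnavailable(Exception):
    """Raised by the hypothetical transport when a node cannot be reached."""


class FailoverClient:
    """Rotates requests over a fixed list of node URLs, skipping dead nodes.

    `send(url, path)` is a hypothetical transport function; a real client
    would issue an HTTP request there.
    """

    def __init__(self, nodes: List[str], send: Callable[[str, str], str]):
        if not nodes:
            raise ValueError("at least one node URL is required")
        self.nodes = list(nodes)
        self.send = send
        self._start = itertools.cycle(range(len(self.nodes)))

    def request(self, path: str) -> str:
        first = next(self._start)  # rotate the starting node on every request
        last_error: Optional[Exception] = None
        for offset in range(len(self.nodes)):
            url = self.nodes[(first + offset) % len(self.nodes)]
            try:
                return self.send(url, path)
            except NodeUnavailable as exc:
                last_error = exc  # fall through to the next node in the list
        raise RuntimeError("all nodes failed") from last_error


# Usage with a fake transport in which node1 is down.
down = {"http://node1:9200"}


def fake_send(url: str, path: str) -> str:
    if url in down:
        raise NodeUnavailable(url)
    return url + path


client = FailoverClient(
    ["http://node0:9200", "http://node1:9200", "http://node2:9200"], fake_send
)
results = [client.request("/_cluster/health") for _ in range(3)]
print(results)
```

Rotating the starting node spreads requests across the cluster, and the inner loop provides the failover; the request that would have started on the dead node1 is served by node2 instead.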

1 Answer

kodee · Answer 1 · 2019-07-04T13:40:57+0000

You don't want a load balancer: Elasticsearch already provides that functionality. You'd just be adding another component that might misbehave and that might add an extra network hop.

ES splits each index into shards (5 primary shards by default before version 7.0; 1 from 7.0 onward) and tries to distribute them evenly among your instances. With 5 shards on your 3 instances, 2 instances would hold 2 primary shards and 1 would hold just one, so you might want to set the number of shards to 6 for an equal distribution.
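To see why 6 divides more evenly than 5, here is a minimal counting sketch. It assumes a plain round-robin placement of primaries, which is a simplification: Elasticsearch's real allocator also balances replicas and considers disk usage and allocation rules. The function name `primaries_per_node` is hypothetical.

```python
from collections import Counter
from typing import Dict


def primaries_per_node(num_shards: int, num_nodes: int) -> Dict[int, int]:
    """Simplified model: deal primary shards out to nodes round-robin.

    Elasticsearch's real allocator weighs more factors; this only shows
    the counting argument for 5 versus 6 shards on 3 nodes.
    """
    counts = Counter(shard % num_nodes for shard in range(num_shards))
    return {node: counts[node] for node in range(num_nodes)}


print(primaries_per_node(5, 3))  # {0: 2, 1: 2, 2: 1}
print(primaries_per_node(6, 3))  # {0: 2, 1: 2, 2: 2}
```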

By default, replication is set to "number_of_replicas": 1, meaning one replica of each shard. Assuming you're using six shards, the layout could look something like this (R marks a replica shard):
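Both settings are chosen when the index is created. A minimal sketch using only Python's standard library that builds (but does not send) the create-index request; the host `localhost:9200` and the index name `my-index` are placeholders. `number_of_replicas` can be changed later, while `number_of_shards` is fixed once the index exists (changing it means reindexing or using the split/shrink APIs).

```python
import json
import urllib.request

# Placeholder cluster address and index name; substitute your own.
ES_URL = "http://localhost:9200"
INDEX = "my-index"

# 6 primary shards (divides evenly across 3 nodes) and 1 replica of each,
# for 12 shard copies in total.
body = {"settings": {"number_of_shards": 6, "number_of_replicas": 1}}

req = urllib.request.Request(
    f"{ES_URL}/{INDEX}",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Built only; uncomment to send it to a running cluster.
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/my-index
```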

node0: 1, 4, R3, R6
node1: 2, 6, R1, R5
node2: 3, 5, R2, R4

Assuming node1 dies, the cluster would change to the following setup:

node0: 1, 4, 6, R3 + new replicas R5, R2
node2: 3, 5, 2, R4 + new replicas R1, R6
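The failover above can be replayed with a small simulation. This is a simplified model, assuming one replica per shard and that each new replica simply goes to a live node not holding that shard's primary; real Elasticsearch also balances shard counts and, by default, waits a short delay (`index.unassigned.node_left.delayed_timeout`, 1 minute) before re-allocating the replicas of a departed node. The `fail_node` helper is hypothetical.

```python
from typing import Dict, List, Tuple

# Layout from this answer: shard number -> node holding its primary / replica.
primaries = {1: "node0", 4: "node0", 2: "node1", 6: "node1", 3: "node2", 5: "node2"}
replicas = {3: "node0", 6: "node0", 1: "node1", 5: "node1", 2: "node2", 4: "node2"}


def fail_node(
    dead: str, primary: Dict[int, str], replica: Dict[int, str], live: List[str]
) -> Tuple[Dict[int, str], Dict[int, str], Dict[str, List[int]]]:
    """Simplified model of the cluster reacting to `dead` leaving.

    Assumes exactly one replica per shard, so a lost primary always has a
    surviving replica on another node.
    """
    new_primary = dict(primary)
    new_replica = {s: n for s, n in replica.items() if n != dead}
    created: Dict[str, List[int]] = {n: [] for n in live}
    for shard, node in primary.items():
        if node == dead:
            # Promote the surviving replica to primary.
            new_primary[shard] = new_replica.pop(shard)
    for shard, node in new_primary.items():
        if shard not in new_replica:
            # Allocate a fresh replica on a live node that lacks this primary.
            target = next(n for n in live if n != node)
            new_replica[shard] = target
            created[target].append(shard)
    return new_primary, new_replica, created


p, r, created = fail_node("node1", primaries, replicas, ["node0", "node2"])
for node in ["node0", "node2"]:
    prim = sorted(s for s, n in p.items() if n == node)
    reps = sorted(s for s, n in r.items() if n == node)
    print(f"{node}: primaries={prim} replicas={reps} new={sorted(created[node])}")
# node0: primaries=[1, 4, 6] replicas=[2, 3, 5] new=[2, 5]
# node2: primaries=[2, 3, 5] replicas=[1, 4, 6] new=[1, 6]
```

The result matches the layout above: replicas R2 and R6 are promoted to primaries, and new replicas R5, R2 land on node0 and R1, R6 on node2.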

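Depending on your connection setup, you can either connect to one instance (transport client) or join the cluster yourself (node client). With the node client you avoid double hops, because it knows the cluster state and always sends requests directly to a node holding the right shard. With the transport client, the node you connect to routes your requests to the correct instance. (Note that the node client was removed in Elasticsearch 5.0, and the transport client was deprecated in 7.0 in favor of the HTTP-based REST clients.)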
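The "double hop" comes from how a document is mapped to a shard: Elasticsearch picks the shard from a hash of the routing value (the document `_id` by default), essentially modulo the number of primary shards. The sketch below assumes the six-shard layout from this answer and substitutes `zlib.crc32` for Elasticsearch's Murmur3 hash, so its shard numbers will not match a real cluster. `shard_for` and `hops_for_get` are hypothetical helpers that count hops for a read, which any copy (primary or replica) can serve.

```python
import zlib
from typing import Dict, List

NUM_PRIMARY_SHARDS = 6

# Nodes holding a copy (primary or replica) of each shard, per this answer.
COPIES: Dict[int, List[str]] = {
    1: ["node0", "node1"], 2: ["node1", "node2"], 3: ["node2", "node0"],
    4: ["node0", "node2"], 5: ["node2", "node1"], 6: ["node1", "node0"],
}


def shard_for(doc_id: str) -> int:
    """Map a routing value to a shard number.

    Elasticsearch uses Murmur3; crc32 is only a stand-in. The +1 matches
    the 1-based shard numbers used in this answer.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PRIMARY_SHARDS + 1


def hops_for_get(doc_id: str, contacted: str) -> int:
    """Hops for a read: client -> contacted node, plus one forward if the
    contacted (coordinating) node holds no copy of the shard."""
    return 1 if contacted in COPIES[shard_for(doc_id)] else 2


for node in ["node0", "node1", "node2"]:
    print(node, hops_for_get("user-42", node))
```

Because every shard has copies on two of the three nodes, exactly one node costs an extra hop for any given document; a client that knows the routing (like the old node client) can always pick one of the other two.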

So there is nothing for you to load-balance yourself; a load balancer would simply add overhead. The automatic clustering is probably Elasticsearch's greatest strength.
