You don't want a load balancer: Elasticsearch already provides that functionality. You'd just add another component that might misbehave and that might add an extra network hop.
ES will shard your data (by default into 5 shards) and try to distribute the shards evenly among your instances. In your case, two instances would hold 2 shards each and the third just 1, so you might want to change the number of shards to 6 for an equal distribution.
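The arithmetic can be checked with a small sketch. The helper `shards_per_node` is a hypothetical name used only for illustration, and it only models an ideal even spread, not ES's real allocator. The `number_of_shards` key is the index setting you would change when creating the index.

```python
import json
from typing import List


def shards_per_node(num_shards: int, num_nodes: int) -> List[int]:
    """Ideal even spread: the first `extra` nodes hold one shard more."""
    base, extra = divmod(num_shards, num_nodes)
    return [base + 1 if i < extra else base for i in range(num_nodes)]


print(shards_per_node(5, 3))  # [2, 2, 1] -> uneven with the default of 5
print(shards_per_node(6, 3))  # [2, 2, 2] -> even with 6 shards

# Settings body you could send when creating the index
# (the index name and how you send it are up to you).
index_body = {"settings": {"number_of_shards": 6}}
print(json.dumps(index_body))
```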
By default, replication is set to "number_of_replicas":1, i.e. one replica of each shard. Assuming you're using six shards, it could look something like this (Rn is the replica of shard n):
- node0: 1, 4, R3, R6
- node1: 2, 6, R1, R5
- node2: 3, 5, R2, R4
Assuming node1 dies, the cluster would change to the following setup:
- node0: 1, 4, 6, R3 + new replicas R5, R2
- node2: 3, 5, 2, R4 + new replicas R1, R6
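To make the failover above concrete, here is a toy simulation. It is only a sketch of the idea (promote a surviving replica, then re-create missing replicas on a node that doesn't hold the primary), not how ES's allocator is actually implemented. The function `fail_node` and the dict layout are made up for this example.

```python
def fail_node(cluster, dead):
    """Return the layout after `dead` leaves: promote replicas, then re-replicate."""
    # Copy the surviving nodes so the input layout is left untouched.
    survivors = {
        name: {"primaries": set(n["primaries"]), "replicas": set(n["replicas"])}
        for name, n in cluster.items()
        if name != dead
    }
    # 1. For every primary lost with the dead node, promote a surviving replica.
    for shard in cluster[dead]["primaries"]:
        for node in survivors.values():
            if shard in node["replicas"]:
                node["replicas"].remove(shard)
                node["primaries"].add(shard)
                break
    # 2. Re-create missing replicas on a node that does not hold the primary,
    #    preferring the node that currently holds the fewest shards.
    all_shards = set().union(*(n["primaries"] for n in survivors.values()))
    for shard in sorted(all_shards):
        if any(shard in n["replicas"] for n in survivors.values()):
            continue  # this shard still has a replica
        candidates = [n for n in survivors.values() if shard not in n["primaries"]]
        if candidates:
            target = min(candidates, key=lambda n: len(n["primaries"]) + len(n["replicas"]))
            target["replicas"].add(shard)
    return survivors


cluster = {
    "node0": {"primaries": {1, 4}, "replicas": {3, 6}},
    "node1": {"primaries": {2, 6}, "replicas": {1, 5}},
    "node2": {"primaries": {3, 5}, "replicas": {2, 4}},
}

after = fail_node(cluster, "node1")
for name in sorted(after):
    print(name, sorted(after[name]["primaries"]), sorted(after[name]["replicas"]))
```

With the layout from the example, this ends with node0 holding primaries 1, 4, 6 and replicas 2, 3, 5, and node2 holding primaries 2, 3, 5 and replicas 1, 4, 6, which matches the lists above.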
Depending on your connection settings, you can either connect to one instance (transport client) or join the cluster (node client). With the node client, you may avoid double hops, since it knows where each shard lives and always talks to the correct node directly. With the transport client, the instance you connect to will route your requests to the correct instance.
So there is nothing to load balance yourself; you'd simply add overhead. The auto-clustering is probably ES's greatest strength.