What is Storm?
Storm is a distributed, real-time computation system.
On a Storm cluster, you execute topologies, which process streams of tuples (data).
Each topology is a graph consisting of spouts (which produce tuples) and bolts (which transform tuples).
Storm takes care of cluster communication, fail-over and distributing topologies across cluster nodes.
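To make the spout and bolt vocabulary concrete, here is a minimal sketch in plain Python. It does not use the Storm API: the names `sentence_spout`, `split_bolt` and `count_bolt` are hypothetical, and everything runs in a single process, whereas Storm would distribute the same graph across cluster nodes.

```python
from collections import Counter


def sentence_spout():
    """Spout: the source of the stream; yields one tuple per sentence."""
    for sentence in ["the cow jumped", "the moon"]:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: consumes word tuples and keeps a running count per word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


# Wire the graph: spout -> split bolt -> count bolt.
word_counts = count_bolt(split_bolt(sentence_spout()))
```

Each stage only knows about the tuples it receives and emits, which is what lets a real cluster run many copies of a bolt in parallel.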
Storm Use Cases
Stream processing.
Distributed RPC (remote procedure call).
Difference between Hadoop and Storm
1. Simple to program
If you’ve ever tried doing real-time processing from scratch, you’ll understand how painful it can become. With Storm, complexity is dramatically reduced.
2. Support for multiple programming languages
It’s easier to develop in a JVM-based language, but Storm supports any language as long as you use or implement a small intermediary library.
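The intermediary library is essentially a small message codec between Storm and the non-JVM process. The sketch below assumes a line-oriented framing in which each JSON message is followed by a line containing only `end`, and it assumes the `command` and `tuple` field names; treat both as assumptions to verify against the Storm multi-language documentation. The helpers `write_message` and `read_message` are hypothetical names, and an in-memory buffer stands in for the real pipe.

```python
import io
import json


def write_message(out, message):
    """Serialize one message as JSON, followed by an 'end' line."""
    out.write(json.dumps(message) + "\n")
    out.write("end\n")


def read_message(inp):
    """Read lines until the 'end' marker and decode them as one JSON message."""
    lines = []
    for line in inp:
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


# Simulate the pipe between Storm and a non-JVM bolt with an in-memory buffer.
pipe = io.StringIO()
write_message(pipe, {"command": "emit", "tuple": ["hello", 1]})
pipe.seek(0)
received = read_message(pipe)
```

Because the exchange is plain text over standard streams, any language that can read lines and parse JSON can take part.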
3. Fault-tolerant
The Storm cluster takes care of workers going down, reassigning tasks when necessary.
4. Scalable
All you need to do in order to scale is add more machines to the cluster. Storm will reassign tasks to new machines as they become available.
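Both behaviours, recovering from a lost worker and using newly added machines, come down to recomputing which worker runs which task. The following is a toy illustration only: it assumes a simple round-robin policy, which is not how Storm's scheduler is implemented, and the function `assign`, the task names and the node names are all hypothetical.

```python
def assign(tasks, workers):
    """Spread tasks over the available workers in round-robin order."""
    if not workers:
        raise ValueError("no workers available")
    assignment = {worker: [] for worker in workers}
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment


tasks = ["spout-0", "split-0", "split-1", "count-0", "count-1", "count-2"]

before = assign(tasks, ["node-a", "node-b", "node-c"])
# node-b goes down: its tasks are redistributed to the survivors.
after_failure = assign(tasks, ["node-a", "node-c"])
# Two machines join: the same tasks are spread over more workers.
after_scale_out = assign(tasks, ["node-a", "node-c", "node-d", "node-e"])
```

In every case the full set of tasks stays assigned; only the placement changes, which is why the topology code itself does not need to change when the cluster does.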
5. Reliable
All messages are guaranteed to be processed at least once. If there are errors, messages might be processed more than once, but you’ll never lose any message.
6. Fast
Speed was one of the key factors driving Storm’s design.
7. Transactional
You can get exactly-once messaging semantics for pretty much any computation.
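The gap between at-least-once and exactly-once is easiest to see with a replayed message. The sketch below is an illustration under stated assumptions, not a description of Storm's internals: `deliver` is a hypothetical stand-in for a system that replays a message when its acknowledgement is lost, and the exactly-once effect is obtained here by a swapped-in technique, deduplicating on message ID.

```python
def deliver(messages, handler, fail_first_ack_for=()):
    """Deliver each message until it is acknowledged (at-least-once).

    Messages whose id is in fail_first_ack_for are processed, but their
    first acknowledgement is treated as lost, so they are replayed once.
    """
    pending_failures = set(fail_first_ack_for)
    for msg_id, payload in messages:
        while True:
            handler(msg_id, payload)
            if msg_id in pending_failures:
                pending_failures.discard(msg_id)  # ack lost: replay
                continue
            break  # acknowledged, move on


messages = [(1, 10), (2, 20), (3, 30)]

# Naive handler: a replayed message is applied twice.
naive = {"total": 0}

def naive_handler(msg_id, payload):
    naive["total"] += payload

# Idempotent handler: remembers message ids and ignores duplicates.
exact = {"total": 0, "seen": set()}

def idempotent_handler(msg_id, payload):
    if msg_id in exact["seen"]:
        return  # duplicate delivery, already applied
    exact["seen"].add(msg_id)
    exact["total"] += payload

deliver(messages, naive_handler, fail_first_ack_for={2})
deliver(messages, idempotent_handler, fail_first_ack_for={2})
```

No message is lost in either run, but only the idempotent handler ends with the correct total of 60; the naive one counts message 2 twice and reaches 80.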