Accumulators are variables that are used for aggregating information across the executors.
They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types. If accumulators are created with a name, they will be displayed in Spark’s UI. This can be useful for understanding the progress of running stages (NOTE − this is not yet supported in Python).
An accumulator is created from an initial value v by calling SparkContext.accumulator(v). Tasks running on the cluster can then add to it using the add method or the += operator (in Scala and Python). However, they cannot read its value. Only the driver program can read the accumulator’s value, using its value method.
The code given below shows an accumulator being used to add up the elements of an array −
scala> val accum = sc.accumulator(0)
scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
res2: Int = 10
Answering your question "When are accumulators truly reliable ?"
Accumulators re truel reliable when they are present in an Action operation.
In an Action Task, even if any restarted tasks are present it will update Accumulator only once.
For accumulator updates are performed inside actions only, Spark guarantees that each task’s update will be applied to the accumulator only once.
Accumulator updates are sent back to the driver when a task is successfully completed. So your accumulator results are guaranteed to be correct when you are certain that each task will have been executed exactly once and each task did as you expected.