I had a very similar problem: many executors were being lost no matter how much memory I allocated to them.
The best solution, if you are using YARN, is to set --conf spark.yarn.executor.memoryOverhead=600. Alternatively, if your cluster uses Mesos, try --conf spark.mesos.executor.memoryOverhead=600 instead.
The configuration option for spark 2.3.1+ is
In my case, there was not enough memory left for YARN itself, so the containers were being killed. After raising the memory overhead with the option above, the lost executor problem should go away.
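To see why raising the overhead helps, here is a rough sketch of how the executor memory and the overhead add up to the size of the container that gets requested. It assumes Spark's documented default overhead of 10% of executor memory with a 384 MiB minimum, and it ignores any rounding the resource manager applies. The function names are made up for illustration and are not part of Spark.

```python
# Sketch only: helper names are hypothetical, not Spark APIs.
DEFAULT_OVERHEAD_FACTOR = 0.10  # documented Spark default fraction (assumed here)
MIN_OVERHEAD_MIB = 384          # documented Spark minimum overhead (assumed here)


def default_overhead_mib(executor_memory_mib):
    """Overhead Spark would pick when memoryOverhead is not set explicitly."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * DEFAULT_OVERHEAD_FACTOR))


def container_request_mib(executor_memory_mib, overhead_mib=None):
    """Executor heap plus off-heap overhead, before any resource manager rounding."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib


def submit_args(executor_memory_mib, overhead_mib, cluster="yarn"):
    """Build the spark-submit flags from the answer; cluster is 'yarn' or 'mesos'."""
    key = "spark.%s.executor.memoryOverhead" % cluster
    return [
        "--executor-memory", "%dm" % executor_memory_mib,
        "--conf", "%s=%d" % (key, overhead_mib),
    ]


if __name__ == "__main__":
    print(container_request_mib(2048))       # default overhead
    print(container_request_mib(2048, 600))  # overhead raised to 600
    print(" ".join(submit_args(2048, 600)))
```

With a 2 GiB executor, the default leaves only 384 MiB of headroom outside the heap, while setting the overhead to 600 makes the container request correspondingly larger, which is what keeps it from being killed.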