in Big Data Hadoop & Spark by (11.4k points)
I have a Spark Streaming job which has been running continuously. How do I stop the job gracefully? I have read the usual recommendation of attaching a shutdown hook in the job and having the monitoring side send a SIGTERM to the job.

// ssc is the application's StreamingContext, logger its logging facade.
// Register a JVM shutdown hook so a SIGTERM triggers a graceful stop.
sys.ShutdownHookThread {
  logger.info("Gracefully stopping Application...")
  ssc.stop(stopSparkContext = true, stopGracefully = true)
  logger.info("Application stopped gracefully")
}


It seems to work but does not look like the cleanest way to stop the job. Am I missing something here?

From a code perspective it may make sense, but how do you use this in a cluster environment? If we start a Spark Streaming job (we distribute the jobs across all the nodes in the cluster), we have to keep track of the PID of the job and the node it is running on, and when it is time to stop the process we have to look that node and PID up again. I was just hoping there would be a simpler way of job control for streaming jobs.
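For illustration, what we effectively have to do today looks something like the sketch below (the node name and main class are just placeholders for our setup, and it assumes jps is available on the node):

# Find the driver JVM on whichever node it landed on and send SIGTERM
# so the shutdown hook above gets a chance to run.
ssh worker-node-3 "jps -lm | grep com.example.StreamingApp | awk '{print \$1}' | xargs kill -TERM"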

1 Answer

by (32.3k points)

To stop your streaming context in cluster mode without sending a SIGTERM, execute the command given below. This will stop the streaming context without you needing to stop it explicitly with a shutdown hook:

$SPARK_HOME_DIR/bin/spark-submit --master $MASTER_REST_URL --kill $DRIVER_ID

- $MASTER_REST_URL is the REST URL of the Spark standalone master (its REST submission server), i.e. something like spark://localhost:6066

- $DRIVER_ID is something like driver-20150915145601-0000
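
Put together, a rough sketch of the workflow (the class and jar names are placeholders; the driver ID, something like the one above, is reported when you submit in cluster mode and also shows up on the master's web UI):

# Submit in cluster mode via the standalone master's REST port and note the driver ID.
$SPARK_HOME_DIR/bin/spark-submit \
  --master spark://localhost:6066 \
  --deploy-mode cluster \
  --class com.example.StreamingApp \
  my-streaming-app.jar

# Later, stop that driver from any machine that can reach the master,
# without having to know which worker node it landed on or its PID.
$SPARK_HOME_DIR/bin/spark-submit \
  --master spark://localhost:6066 \
  --kill driver-20150915145601-0000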

Also, in order to stop your app gracefully, you can try setting the following Spark configuration property when your Spark app is initially submitted (see http://spark.apache.org/docs/latest/submitting-applications.html for how to set Spark configuration properties).

Set the spark.streaming.stopGracefullyOnShutdown parameter to true (the default is false).
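
One way to pass it, sketched below (class and jar names are placeholders), is with --conf at submit time:

# Ask Spark Streaming to stop gracefully when the driver JVM is shut down.
$SPARK_HOME_DIR/bin/spark-submit \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --class com.example.StreamingApp \
  my-streaming-app.jar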

