I'm trying to use spark-submit to execute my Python code on a Spark cluster.

Generally, we run spark-submit with Python code like this:

# Run a Python application on a cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  my_python_code.py \
  1000


But I want to run my_python_code.py by passing several arguments. Is there a smart way to pass arguments?

1 Answer


Given below is a proper way to handle command-line arguments in PySpark jobs:

import argparse

# Define and parse the arguments passed after the script name on the command line.
parser = argparse.ArgumentParser()
parser.add_argument("--ngrams", help="some useful description.")
args = parser.parse_args()

if args.ngrams:
    ngrams = args.ngrams

Now, you can easily launch your job as follows:

spark-submit job.py --ngrams 3
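
For completeness, here is a minimal self-contained sketch of what job.py could look like; the SparkSession setup and the toy n-gram logic are illustrative assumptions, not part of the original answer:

# job.py - a minimal sketch; the SparkSession setup and the sample data
# below are illustrative assumptions, not taken from the original answer.
import argparse
from pyspark.sql import SparkSession

parser = argparse.ArgumentParser()
parser.add_argument("--ngrams", type=int, default=2,
                    help="n-gram size (illustrative parameter)")
args = parser.parse_args()

spark = SparkSession.builder.appName("ngrams-example").getOrCreate()

# Use the parsed value inside the job, e.g. to build n-grams from a toy list of words.
words = ["spark", "submit", "passes", "application", "arguments"]
ngrams = [tuple(words[i:i + args.ngrams]) for i in range(len(words) - args.ngrams + 1)]
rdd = spark.sparkContext.parallelize(ngrams)
print(rdd.collect())

spark.stop()

Note that spark-submit treats everything after the script path as application arguments, so any spark-submit options such as --master must come before job.py:

spark-submit --master spark://207.184.161.138:7077 job.py --ngrams 3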
