
I'm trying to use spark-submit to execute my Python code on a Spark cluster.

Generally, we run spark-submit with Python code like this:

# Run a Python application on a cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  my_python_code.py \
  1000


But I want to run my_python_code.py by passing several arguments to it. Is there a smart way to pass those arguments?

1 Answer


Given below is a proper way to handle command-line arguments in PySpark jobs using argparse:

import argparse

# Define and parse the command-line arguments for the job
parser = argparse.ArgumentParser()
parser.add_argument("--ngrams", help="some useful description.")
args = parser.parse_args()

if args.ngrams:
    ngrams = args.ngrams

Now, you can easily launch your job as follows:

spark-submit job.py --ngrams 3
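
For completeness, here is a minimal sketch of what job.py could look like when the parsed argument is actually used inside a Spark job. The file name job.py and the --ngrams option follow the example above; the sample input and the n-gram logic are only illustrative assumptions.

# job.py -- minimal sketch: parse --ngrams, then use it inside a Spark job
# (the sample data and the n-gram splitting below are illustrative assumptions)
import argparse

from pyspark.sql import SparkSession

parser = argparse.ArgumentParser()
parser.add_argument("--ngrams", type=int, default=2,
                    help="size of the n-grams to generate")
args = parser.parse_args()

spark = SparkSession.builder.appName("ngram-example").getOrCreate()
sc = spark.sparkContext

# Capture the parsed value in a plain local variable before using it in a closure
n = args.ngrams

lines = sc.parallelize(["spark submit passes arguments to python jobs"])

def to_ngrams(line, size):
    words = line.split()
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]

ngrams = lines.flatMap(lambda line: to_ngrams(line, n))
print(ngrams.collect())

spark.stop()

Note that everything placed after the script name on the spark-submit command line is forwarded to the script itself, so options meant for spark-submit (such as --master) must come before job.py:

spark-submit --master spark://207.184.161.138:7077 job.py --ngrams 3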
