spark.sql.shuffle.partitions sets the number of partitions used when data is shuffled for joins or aggregations on data frames. Its default value is 200.
spark.default.parallelism is the default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when the user does not set it explicitly. However, spark.default.parallelism appears to apply only to raw RDDs and is ignored when working with data frames.
If your task doesn't need a join or aggregation and you are working with data frames, then setting either property will not have any effect. In that case you can set the number of partitions yourself by calling df.repartition(numOfPartitions).
Both properties can be passed as --conf options when submitting the job:
./bin/spark-submit --conf spark.sql.shuffle.partitions=300 --conf spark.default.parallelism=300