PySpark is the marriage of Python and Apache Spark. It combines the simplicity of Python with Apache Spark's powerful data processing engine. So, when should you use PySpark? It is a good fit when you need to process huge datasets, run machine learning algorithms at scale, or work with an API for structured data processing. Among the alternatives, such as Apache Storm, Apache Flink, Hadoop, and Dask, PySpark is used extensively by data scientists who must deal with very large datasets or machine learning workloads. On top of that, Python is easy to learn and easy to use, and it gives you an enormous number of libraries to work with.
If you wish to get started with PySpark, have a look at the PySpark tutorial, and if you wish to get certified in it, check out the PySpark training course from Intellipaat. The course offers industry-grade training along with guided projects to help you gain practical experience. If you are just starting out in this domain, watch the following video on the PySpark course to get your fundamentals right.