In Spark, an RDD can be created in three ways: by parallelizing an existing collection, by referencing an external dataset, or by transforming an existing RDD.
Here is an example of how to create an RDD using the parallelize() method in PySpark:
from pyspark.sql import SparkSession

# Create a SparkSession; it exposes the SparkContext needed for parallelize()
spark = SparkSession.builder.appName("RDDExample").getOrCreate()
words = spark.sparkContext.parallelize(["Spark", "is", "easy", "and", "awesome"])
count_words = words.count()
print("Number of elements in RDD:", count_words)
Here is an example of loading an external dataset to create an RDD (Scala):
val dataRDD = spark.read.csv("path_of_csv/file").rdd        // for a CSV file
val dataRDD = spark.read.json("path_of_json/file").rdd      // for a JSON file
val dataRDD = spark.read.textFile("path_of_text/file").rdd  // for a text file
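If you only need an RDD of raw lines, you can also skip the DataFrame reader and call textFile() on the SparkContext directly; here is a minimal sketch, reusing the placeholder text-file path from above:
// textFile() builds an RDD[String] directly, one element per line of the file
val linesRDD = spark.sparkContext.textFile("path_of_text/file")
linesRDD.take(5).foreach(println) // inspect the first few lines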
Here is an example of creating a new RDD from an existing RDD (Scala):
val rdd1 = spark.sparkContext.parallelize(Seq("Spark", "is", "easy", "and", "awesome"))
val rdd_new = rdd1.map(w => (w.charAt(0), w)) // key each word by its first character
rdd_new.foreach(println)
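This prints key-value pairs such as (S,Spark) and (i,is); the order may vary from run to run because foreach executes on the executors in parallel.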
If you are interested in learning Spark, I recommend this Spark Certification by Intellipaat.