A DataFrame is a distributed collection of data organized into named columns, together with column name and type information (a schema). Conceptually, it is equivalent to a table in a relational database or a data frame in R/Python. Execution on a DataFrame is lazy: transformations are only triggered when an action is called (similar to RDDs). DataFrames can process data in several formats, such as Avro, CSV, and JSON, and from storage systems such as HDFS, Hive tables, and MySQL.
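As a minimal sketch of these ideas in Scala (assuming Spark is available on the classpath; the file path data/employees.json and the column names name and salary are placeholders, not from the original text):

```scala
import org.apache.spark.sql.SparkSession

object DataFrameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameExample")
      .master("local[*]")
      .getOrCreate()

    // Read a JSON file into a DataFrame; the schema is inferred from the data.
    // The path is a placeholder for illustration only.
    val employeesDF = spark.read.json("data/employees.json")

    // select() and filter() are transformations: they are lazy and only
    // build up a logical plan, nothing executes yet.
    val filtered = employeesDF
      .select("name", "salary")
      .filter("salary > 50000")

    // show() is an action: it triggers the actual distributed execution.
    filtered.show()

    spark.stop()
  }
}
```

Reading from CSV or a Hive table follows the same pattern; only the `spark.read` call changes.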
Datasets are an extension of DataFrames that combine the benefits of both strongly typed and untyped APIs. By default, a Dataset is a collection of strongly typed JVM objects, whereas a DataFrame is not. Datasets also use Spark's Catalyst optimizer to expose expressions and data fields to the query planner, and they support data from many different sources.
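A short sketch of the typed Dataset API in Scala (the Employee case class and its fields are illustrative assumptions, not part of the original text):

```scala
import org.apache.spark.sql.SparkSession

// Case class that gives the Dataset its compile-time schema.
case class Employee(name: String, salary: Double)

object DatasetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DatasetExample")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._ // provides encoders for case classes

    // Build a Dataset from an in-memory collection of typed JVM objects.
    val employeesDS = Seq(
      Employee("Asha", 72000.0),
      Employee("Ravi", 48000.0)
    ).toDS()

    // Typed, lambda-based transformation: a misspelled field name
    // would be caught at compile time rather than at runtime.
    val wellPaid = employeesDS.filter(e => e.salary > 50000.0)

    // Catalyst still optimizes the plan before the show() action runs it.
    wellPaid.show()

    spark.stop()
  }
}
```

Calling `.toDF()` on the same Dataset drops back to the untyped DataFrame view when needed.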
If you wish to learn Spark, check out this Spark Course by Intellipaat, which offers instructor-led training, hands-on projects, and certification.