I would suggest using a wildcard, e.g. just replace 2019 with *:
(PySpark v2.3):
df = (sqlContext.read
      .option("header", "true")
      .csv("<your path, with * in place of 2019>"))  # placeholder: the actual path is not shown here
Hopefully, this will work fine for you.
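Before pointing Spark at a wildcard path, it can help to check which files the pattern would actually pick up. Below is a minimal sketch for local files only, using Python's standard glob module. The names preview_matches and demo and the 2018/2019 folder layout are made up for illustration. Note that Spark resolves wildcards through Hadoop's own glob rules, which are similar but not identical (for example, Hadoop also understands {a,b} alternation), so treat this as an approximation.

```python
import glob
import os
import tempfile


def preview_matches(pattern):
    """Return the local files matched by a wildcard pattern, sorted."""
    return sorted(glob.glob(pattern))


def demo():
    """Create two hypothetical year folders in a temp dir and match both with '*'."""
    root = tempfile.mkdtemp()
    for year in ("2018", "2019"):
        os.makedirs(os.path.join(root, year))
        # Empty placeholder file; only the file name matters for matching.
        open(os.path.join(root, year, "data.csv"), "w").close()
    # '*' stands in for the year component of the path.
    matches = preview_matches(os.path.join(root, "*", "data.csv"))
    # Report paths relative to the temp root so the result is readable.
    return [os.path.relpath(m, root) for m in matches]
```

If the list comes back empty, the pattern is probably wrong, and Spark would most likely fail to find the path as well.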
Another approach:
(Spark 2.x) For example, let's say you have 3 directories holding CSV files:
dir1, dir2, dir3
You then define the paths as a single comma-delimited string, as follows:
paths = "dir1/,dir2/,dir3/*"
Then pass this paths variable to the following function:
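def get_df_from_csv_paths(paths):
    # Assumes `spark` is an active SparkSession.
    # Reads every path as tab-delimited, headerless CSV and drops malformed rows.
    df = spark.read.format("csv").option("header", "false").\
        option('delimiter', '\t').\
        option('mode', 'DROPMALFORMED').\
        load(paths.split(','))  # load() accepts a list of paths
    return df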
Then run:
df = get_df_from_csv_paths(paths)
You now have a single Spark DataFrame containing the data from all the CSVs found in these 3 directories.
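Spark's load() takes either one path or a list of paths, so a comma-delimited string has to be turned into a list at some point. Here is a small sketch of a more forgiving split that tolerates stray spaces and a trailing comma. The helper name split_paths is hypothetical and not part of Spark.

```python
def split_paths(paths):
    """Split a comma-delimited path string into a clean list of paths."""
    # Strip whitespace around each entry and skip empty ones,
    # such as the entry left behind by a trailing comma.
    return [p.strip() for p in paths.split(",") if p.strip()]
```

The result can be handed to the reader directly, e.g. spark.read.format("csv").load(split_paths(paths)).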