
I have a DataFrame with three columns of the following types:

date, string, string

I want to select rows with dates before a certain date. I have tried the following with no luck:

 data.filter(data("date") < new java.sql.Date(format.parse("2015-03-14").getTime))

I'm getting the following error:

org.apache.spark.sql.AnalysisException: resolved attribute(s) date#75 missing from date#72,uid#73,iid#74 in operator !Filter (date#75 < 16508);

As far as I can tell, the query is incorrect. Can anyone show me how the query should be written?

1 Answer


Since Spark 1.5, you can do the following:

For less than:

// do this to filter the data where the date is earlier than 2015-03-14
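
Something along these lines should do it. This is only a sketch: it assumes Spark 2.x or later (for SparkSession) and a local master, the sample rows are made up, and the object name FilterBeforeDate is hypothetical. The uid and iid column names are borrowed from the error message in the question.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical demo object; assumes Spark 2.x+ is on the classpath.
object FilterBeforeDate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-before").getOrCreate()
    import spark.implicits._

    // Made-up sample rows; uid and iid mirror the columns in the error message.
    val data = Seq(
      (Date.valueOf("2015-03-01"), "u1", "i1"),
      (Date.valueOf("2015-03-14"), "u2", "i2"),
      (Date.valueOf("2015-04-02"), "u3", "i3")
    ).toDF("date", "uid", "iid")

    // lt is the method form of <; only the 2015-03-01 row is strictly before the cutoff.
    data.filter(data("date").lt(lit(Date.valueOf("2015-03-14")))).show()

    spark.stop()
  }
}
```

Wrapping the cutoff in lit(Date.valueOf(...)) gives Spark a typed date literal, so the comparison does not depend on any implicit string-to-date cast.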


For greater than:

// do this to filter the data where the date is greater than 2015-03-14
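
A possible shape for the greater-than case, under the same assumptions as any quick local test (Spark 2.x or later, local master, made-up rows, hypothetical object name FilterAfterDate):

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical demo object; assumes Spark 2.x+ is on the classpath.
object FilterAfterDate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-after").getOrCreate()
    import spark.implicits._

    // Made-up sample rows.
    val data = Seq(
      (Date.valueOf("2015-03-01"), "u1", "i1"),
      (Date.valueOf("2015-03-14"), "u2", "i2"),
      (Date.valueOf("2015-04-02"), "u3", "i3")
    ).toDF("date", "uid", "iid")

    // gt is the method form of >; only the 2015-04-02 row is strictly after the cutoff.
    data.filter(data("date").gt(lit(Date.valueOf("2015-03-14")))).show()

    spark.stop()
  }
}
```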


For equality, you can use either equalTo or ===:

data.filter(data("date") === lit("2015-03-14"))
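
For comparison, here is a sketch of the same predicate written with equalTo. It assumes Spark 2.x or later and made-up rows, the object name FilterEqualDate is hypothetical, and it relies on Spark reconciling the string literal with the date column when comparing.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical demo object; assumes Spark 2.x+ is on the classpath.
object FilterEqualDate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-equal").getOrCreate()
    import spark.implicits._

    // Made-up sample rows.
    val data = Seq(
      (Date.valueOf("2015-03-01"), "u1", "i1"),
      (Date.valueOf("2015-03-14"), "u2", "i2")
    ).toDF("date", "uid", "iid")

    // Operator form and method form of the same equality predicate.
    val viaOperator = data.filter(data("date") === lit("2015-03-14"))
    val viaMethod   = data.filter(data("date").equalTo(lit("2015-03-14")))

    viaOperator.show()
    viaMethod.show()

    spark.stop()
  }
}
```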

If your DataFrame date column is of type StringType, you can convert it using the to_date function:

// do this to filter data where the date is greater than 2015-03-14
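
A sketch of how the conversion and the filter might be combined, assuming the strings are in yyyy-MM-dd form, Spark 2.x or later, and made-up rows (the object name FilterStringDate is hypothetical):

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, to_date}

// Hypothetical demo object; assumes Spark 2.x+ is on the classpath.
object FilterStringDate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-string-date").getOrCreate()
    import spark.implicits._

    // Here the date column is a plain string (StringType), not a DateType.
    val data = Seq(
      ("2015-03-01", "u1", "i1"),
      ("2015-04-02", "u2", "i2")
    ).toDF("date", "uid", "iid")

    // to_date turns the string column into a date before comparing;
    // only the 2015-04-02 row is after the cutoff.
    data.filter(to_date(data("date")).gt(lit(Date.valueOf("2015-03-14")))).show()

    spark.stop()
  }
}
```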


You may also filter according to a year using the year function:

// do this to filter the data where the year is greater than or equal to 2016
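
One way this could look, again only as a sketch with made-up rows, Spark 2.x or later assumed, and a hypothetical object name FilterByYear:

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, year}

// Hypothetical demo object; assumes Spark 2.x+ is on the classpath.
object FilterByYear {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-year").getOrCreate()
    import spark.implicits._

    // Made-up sample rows spanning three years.
    val data = Seq(
      (Date.valueOf("2015-12-31"), "u1", "i1"),
      (Date.valueOf("2016-01-01"), "u2", "i2"),
      (Date.valueOf("2017-06-15"), "u3", "i3")
    ).toDF("date", "uid", "iid")

    // year extracts the year as an integer; geq is the method form of >=.
    // The 2016 and 2017 rows pass the filter.
    data.filter(year(data("date")).geq(lit(2016))).show()

    spark.stop()
  }
}
```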

