
I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page.

The following code works well:

val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")


But what if I need to check whether the doctor field contains a substring?

1 Answer


You can use contains (matches an arbitrary substring anywhere in the value):

Note: first run import sqlContext.implicits._ to enable the $"col" column syntax.

df.filter($"foo".contains("bar"))

like (SQL like with SQL simple regular expression with _ matching an arbitrary character and % matching an arbitrary sequence):

df.filter($"foo".like("%bar%"))

or rlike (like with Java regular expressions):

df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE also work inside SQL expression strings passed to filter.
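Putting the three together, here is a minimal sketch against the episodes data from the question. The column name title is an assumption for illustration; substitute whichever string column you actually have:

```scala
// Minimal sketch, assuming a SQLContext named sqlContext and a DataFrame `df`
// loaded as in the question; the string column `title` is hypothetical.
import sqlContext.implicits._  // enables the $"col" column syntax

// contains: plain substring match anywhere in the value
val bySubstring = df.filter($"title".contains("Doctor"))

// like: SQL pattern, where % matches any sequence and _ matches one character
val byLike = df.filter($"title".like("%Doctor%"))

// rlike: Java regular expression, matched anywhere in the string
val byRegex = df.filter($"title".rlike("Doctor"))

// the same condition expressed as a SQL expression string
val bySqlExpr = df.filter("title LIKE '%Doctor%'")
```

All four filters keep the same rows here; prefer contains for plain substrings and reach for rlike only when you actually need a regular expression.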


