0 votes
2 views
in Big Data Hadoop & Spark by (140 points)
edited by
What does nullable = true or false mean in Spark (Scala)? Where do I need to use it?

3 Answers

0 votes
by (11.3k points)
edited by

Spark uses a simple set of rules to determine the nullable property when creating a Dataset from a statically typed structure:

  • If an object of the given type can be null, then its DataFrame representation is nullable.
  • If the object is an Option[_], then its DataFrame representation is nullable, with None treated as SQL NULL.
  • Otherwise, it will be marked as not nullable.
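
The rules above can be observed by deriving a schema from a case class. This is a minimal sketch, assuming the Spark SQL library is on the classpath; the Person class and its fields are hypothetical names chosen only for illustration.

```scala
import org.apache.spark.sql.Encoders

// Hypothetical record type, used only for illustration.
// id:   Int is a JVM primitive and cannot be null
// name: String is a reference type and can be null
// age:  Option[Int], where None maps to SQL NULL
case class Person(id: Int, name: String, age: Option[Int])

object NullableInference {
  // Derive the schema from the case class; no SparkSession is needed for this.
  val schema = Encoders.product[Person].schema

  def main(args: Array[String]): Unit = {
    // Print each field with its inferred nullable flag.
    schema.fields.foreach { f =>
      println(s"${f.name}: ${f.dataType.simpleString}, nullable = ${f.nullable}")
    }
  }
}
```

With these types, id should come out as not nullable, while name and age should both be nullable.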


0 votes
by
edited by

When you create your schema and define the field information, nullable is an option on each field. By default it is true, which means the field can contain null values. Whether a field should allow nulls depends on your use case. Suppose you are consuming customer data and the customer's first name must never be null: in that case you would set nullable to false for that field.

Then, when a record arrives with a null first name, the Spark job can fail with an error. Note that Spark does not validate the flag in every code path, so it is safer to treat nullable = false as a contract on your data than as a guaranteed check.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala

0 votes
by (1.1k points)

In Spark, nullable = true or false is a property of a field in a DataFrame schema that indicates whether the column can contain null values.

1. nullable = true: Allows null values in the column.

2. nullable = false: Disallows null values in the column.

E.g. StructField("id", IntegerType, nullable = false), StructField("name", StringType, nullable = true)
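
For context, the two fields above could be combined into a full schema roughly as follows. This is a sketch, assuming the Spark SQL library is on the classpath; the object name and the extra "email" field are hypothetical and only illustrate the default value of nullable.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object NullableSchema {
  // "id" is declared as never null; "name" may be null.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  // Hypothetical extra field: when the argument is omitted, nullable defaults to true.
  val defaulted: StructField = StructField("email", StringType)

  def main(args: Array[String]): Unit = {
    // Show the schema as a tree, including the nullable flag of each field.
    schema.printTreeString()
    println(s"email nullable by default: ${defaulted.nullable}")
  }
}
```

A schema built this way can be passed to spark.read.schema(...) or spark.createDataFrame(...), although how strictly the nullable flag is then enforced depends on the data source and the API used.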
