
In PySpark 1.6.2, I can import the col function with:

from pyspark.sql.functions import col
But when I look it up in the GitHub source code, I find no col function in the functions.py file. How can Python import a function that doesn't exist?

1 Answer


It does exist; it just isn't defined explicitly. The functions exported from pyspark.sql.functions are thin wrappers around JVM code. Apart from a few exceptions that require special treatment, these wrappers are generated automatically by helper methods.

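If you read the source carefully, you'll find col listed, along with other functions, in the _functions dictionary. This dictionary is then iterated, and _create_function is used to generate a wrapper for each entry. Each generated function is assigned directly to the corresponding name in the module's globals(), so col becomes a module-level name at import time even though no "def col" appears in the file.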

Finally, __all__, which defines the list of names a module exports through "from module import *", is built from the module's globals, excluding any names contained in a blacklist.
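A simplified sketch of building __all__ from globals with a blacklist. The module name fake_all_demo, its functions, and the exact filter conditions are hypothetical and may differ from what PySpark uses. It also shows that __all__ only controls "import *": an explicit import of a blacklisted name still works.

```python
import sys
import types

# Hypothetical module: one public function, one blacklisted function, one private helper.
_SOURCE = '''
def col(name):
    return name

def map(f, xs):          # public name, but listed in the blacklist below
    return [f(x) for x in xs]

def _helper():           # private: leading underscore
    pass

blacklist = ["map"]

# Export every public, callable global that is not blacklisted.
# list(...) snapshots globals() so the comprehension never iterates a changing dict.
__all__ = [k for k, v in list(globals().items())
           if not k.startswith("_") and callable(v) and k not in blacklist]
'''

mod = types.ModuleType("fake_all_demo")
exec(_SOURCE, mod.__dict__)
sys.modules["fake_all_demo"] = mod

print(mod.__all__)  # ['col']

# "import *" only pulls in the names listed in __all__ ...
namespace = {}
exec("from fake_all_demo import *", namespace)
exported = sorted(k for k in namespace if not k.startswith("_"))
print(exported)  # ['col']

# ... but an explicit import of a blacklisted name still works.
from fake_all_demo import map as blacklisted_map
print(blacklisted_map(str.upper, ["a", "b"]))  # ['A', 'B']
```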
