
In PySpark 1.6.2, I can import the col function with:

from pyspark.sql.functions import col
But when I look for it in the source code on GitHub, I find no col function in functions.py. How can Python import a function that doesn't exist?

1 Answer


It does exist. It is just not defined explicitly with a def statement. Most functions exported from pyspark.sql.functions are thin wrappers around JVM code, and apart from a few exceptions that require special treatment, they are generated automatically by helper methods.

If you check the source carefully, you'll find col listed as a key in the _functions dictionary. The module then iterates over this dictionary and uses _create_function to generate a wrapper for each entry. Each generated function is assigned directly to the corresponding name in globals(), which is why col is importable even though no "def col" appears in the file.
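The pattern can be sketched in plain Python without Spark. This is only an illustration of the technique, not the actual PySpark source: the wrapper body below returns a tuple as a stand-in for the JVM call, and the docstrings are placeholders.

```python
# Stand-in registry mapping function names to docstrings,
# mirroring the shape of the _functions dictionary described above.
_functions = {
    "col": "Returns a column based on the given column name.",
    "lit": "Creates a column of literal value.",
}


def _create_function(name, doc=""):
    """Build a wrapper called `name`. A real wrapper would call into the JVM."""
    def _(value):
        # Placeholder body (assumption): return a tagged tuple instead of a Column.
        return (name, value)
    _.__name__ = name
    _.__doc__ = doc
    return _


# Assign every generated wrapper to a module-level name.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

# col was never written with "def col", yet it is now a module-level function.
print(col("age"))    # ('col', 'age')
print(col.__name__)  # col
```

Because the names only come into existence when the loop runs, searching the file for "def col" finds nothing, while the import still succeeds.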

Finally, __all__, which defines the list of public names the module exports, is built from all globals except those contained in the blacklist.
