
I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.

import logging

logging.info("This is an informative message.")
logging.debug("This is a debug message.")


I want to use the same logger that Spark is using so that the log messages come out in the same format and the level is controlled by the same configuration files. How do I do this?

I've tried putting the logging statements directly in the code and also starting out with a logger from logging.getLogger(). In both cases I see Spark's log messages but not mine. I've been looking at the Python logging documentation, but haven't been able to figure it out from there.

1 Answer


You should get the logger that Spark itself is using; by default, getLogger() returns the logger for your own module, which is why your messages don't show up. Try something like:

import logging

logger = logging.getLogger('py4j')
logger.info("My test info statement")

Here, you can also replace 'py4j' with 'pyspark'.
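
For context, here is a minimal end-to-end sketch of the idea, assuming a recent PySpark where you create a SparkSession; the app name and the message text are illustrative:

import logging

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("logging-demo").getOrCreate()

# Use the 'py4j' logger suggested above so messages go through the
# handlers PySpark's Py4J bridge sets up, instead of your module's logger.
logger = logging.getLogger('py4j')
logger.info("My test info statement")

spark.stop()

Run it with spark-submit as usual; the level at which these messages appear is then governed by the same logging configuration Spark uses.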

Note: If a function used in your Spark program (one that does some logging) is defined in the same module as the main function, you may get a serialization error, because Spark pickles the closures it ships to executors and logger objects are not picklable.
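
One common workaround, sketched below under the same assumptions (the function name, app name, and sample data are illustrative), is to look the logger up inside the function, so that no logger object is captured in the closure Spark has to serialize:

import logging

from pyspark.sql import SparkSession

def double(record):
    # getLogger() is called here, on the executor, so nothing
    # unpicklable travels with the closure.
    logging.getLogger('py4j').info("processing %s", record)
    return record * 2

spark = SparkSession.builder.appName("closure-safe-logging").getOrCreate()
sc = spark.sparkContext
print(sc.parallelize([1, 2, 3]).map(double).collect())
spark.stop()

Messages logged inside the function are emitted on the executors, so look for them in the executor logs rather than the driver output.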

I would suggest you visit this link: https://gist.github.com/adgaudio/0191e14717af68bbba81 for more explanation.
