So, having spent many years in an object-oriented world with code reuse, design patterns and best practices always taken into account, I find myself struggling somewhat with code organization and code reuse in the world of Spark.
If I try to write code in a reusable way, it nearly always comes with a performance cost, and I end up rewriting it to whatever is optimal for my particular use case. This constant "write what is optimal for this particular use case" also affects code organization: splitting code into different objects or modules is difficult when "it all really belongs together", so I end up with very few "God" objects containing long chains of complex transformations. In fact, I frequently think that if I had seen most of the Spark code I'm writing now back when I was working in the object-oriented world, I would have winced and dismissed it as "spaghetti code".
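To make the problem concrete, here is a rough sketch of the kind of chain I mean (the paths, column names and helper functions are all made up for illustration). Factoring steps out into named functions and composing them with Dataset.transform keeps the query plan the same, but in my real jobs the steps rarely split this cleanly, and once I start optimizing (caching, repartitioning, pushing filters around) the pieces stop being reusable:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("example").getOrCreate()

// Everything in one long chain, because the intermediate steps only
// make sense together (columns and paths are hypothetical).
val events = spark.read.parquet("/data/events")
val report = events
  .filter(col("event_type") === "purchase")
  .withColumn("day", to_date(col("timestamp")))
  .groupBy("user_id", "day")
  .agg(sum("amount").as("daily_spend"))
  .join(spark.read.parquet("/data/users"), Seq("user_id"))
  .filter(col("daily_spend") > 100)

// The "reusable" version: named transformation functions composed with
// Dataset.transform. Readable, but the pieces are only reusable until
// a specific job needs the steps fused or reordered for performance.
def onlyPurchases(df: DataFrame): DataFrame =
  df.filter(col("event_type") === "purchase")

def dailySpend(df: DataFrame): DataFrame =
  df.withColumn("day", to_date(col("timestamp")))
    .groupBy("user_id", "day")
    .agg(sum("amount").as("daily_spend"))

val report2 = events.transform(onlyPurchases).transform(dailySpend)
```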
I have surfed the internet trying to find some sort of equivalent to the best practices of the object-oriented world, but without much luck. I can find some "best practices" for functional programming, but Spark adds an extra layer, because performance is such a major factor here.
So my question to you is: have any of you Spark gurus found best practices for writing Spark code that you can recommend?