Before you start your CCA Spark and Hadoop Developer exam (CCA-175) preparation, let’s learn everything you can expect from this certification.
Cloudera’s 2020 CCA-175 update removed the legacy technologies and redesigned the test as a Spark-oriented exam, better aligned with modern data engineering.
What is CCA-175 Spark and Hadoop Developer Certification?
If you want to establish yourself as an expert and certified Apache Hadoop Developer, with a strong grasp of current Hadoop development practices, operational procedures, and tools, then CCA-175 is the right certification to pursue. The CCA-175 certification training curriculum covers Apache Hadoop with a focus on Spark and Scala.
The CCA-175 certification program has the following areas:
- Apache Hadoop
- Apache Spark
- Scala programming language
Apache Hadoop
Apache Hadoop is a collection of open-source software utilities that allow a network of computers to solve problems involving large amounts of data and computation. It provides a software framework for distributed storage and for processing big data using the MapReduce programming model.
Apache Spark
Apache Spark is an open-source data processing tool and a unified analytics engine for large-scale data processing. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance, and it typically runs on top of the Hadoop Distributed File System (HDFS).
Enroll in Intellipaat’s Spark Course and gain mastery over Big Data.
Scala Programming Language
Scala is a general-purpose programming language that runs on the Java Virtual Machine (JVM); it is the language in which Spark data processing jobs on top of Hadoop are commonly written.
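If you have not used Scala before, the sketch below shows the functional collection style the language is known for; Spark's Scala API is built on these same idioms (`filter`, `map`, and so on). The object name and sample values are illustrative, not from the exam.

```scala
// Minimal standalone Scala sketch: the higher-order collection
// methods here (filter, map, sum) are the same idioms Spark's
// RDD and Dataset APIs expose at cluster scale.
object ScalaBasics {
  def main(args: Array[String]): Unit = {
    val sales = List(120, 45, 300, 80)

    // Keep only large sales, then double each one
    val large = sales.filter(_ > 100).map(_ * 2)

    println(large)     // prints List(240, 600)
    println(sales.sum) // prints 545
  }
}
```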
CCA-175 Spark and Hadoop Developer Certification Syllabus
You will be required to have the following skills to become a certified Apache Hadoop Developer:
Transform, stage, and store: Converting datasets stored in HDFS from one format or set of values into another and writing the results back to HDFS
- Get the data from HDFS to use it in Spark applications
- Using Spark, write the results back onto HDFS
- Read and write files in different file formats
- Using the Spark API, perform standard ETL processes on data
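To make the transform step concrete, here is a hedged, standalone sketch of a typical ETL transformation written with plain Scala collections; in the actual exam you would apply the same `map`/`filter` calls to an RDD or Dataset read from HDFS. The `Record` case class, field layout, and sample rows are hypothetical.

```scala
// Standalone sketch of an ETL transform step using Scala collections.
// In Spark, the identical filter/map chain would run distributed on
// data read from HDFS instead of a local Seq.
object EtlSketch {
  case class Record(id: Int, city: String, amount: Double)

  def transform(rows: Seq[String]): Seq[Record] =
    rows.map(_.split(","))      // parse CSV-like lines into fields
      .filter(_.length == 3)    // drop malformed rows
      .map(f => Record(f(0).trim.toInt, f(1).trim, f(2).trim.toDouble))

  def main(args: Array[String]): Unit = {
    val raw = Seq("1, Mumbai, 99.5", "bad line", "2, Pune, 20.0")
    transform(raw).foreach(println)
  }
}
```

The malformed `"bad line"` row is silently dropped by the length check, which is a common pattern for keeping an exam-style ETL job from crashing on dirty input.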
Data analysis: Interacting programmatically with the metastore in applications using Spark SQL and generating reports with the help of queries against the loaded data
- Utilize metastore tables as an input source or an output sink for Spark applications
- Have knowledge of dataset querying in Spark
- Perform data filtering with the help of Spark
- Create queries for the calculation of aggregate statistics
- Using Spark, join disparate datasets
- Generate sorted or ranked data
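The analysis skills listed above (filtering, aggregating, joining, and ranking) can be sketched with plain Scala collections; Spark SQL and the Dataset API provide the same operations at cluster scale. All names and figures below are invented for illustration.

```scala
// Standalone sketch of the analysis tasks above: filter rows,
// aggregate per key, join against a lookup, and sort/rank results.
object AnalysisSketch {
  def main(args: Array[String]): Unit = {
    val orders  = Seq((1, "Mumbai", 250.0), (2, "Pune", 90.0), (3, "Mumbai", 60.0))
    val regions = Map("Mumbai" -> "West", "Pune" -> "West")

    // Filter small orders, then aggregate: total amount per city
    val totals = orders
      .filter(_._3 > 50)
      .groupBy(_._2)
      .map { case (city, rows) => city -> rows.map(_._3).sum }

    // Join each city total with its region, then rank by total descending
    val ranked = totals.toSeq
      .map { case (city, total) => (city, regions(city), total) }
      .sortBy(-_._3)

    ranked.foreach(println) // (Mumbai,West,310.0) then (Pune,West,90.0)
  }
}
```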
Configuration: Understanding everything involved in producing results beyond the code itself
- Provide command-line options for changing application configuration like optimizing available memory
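As a hedged illustration of such command-line configuration, a spark-submit invocation might look like the fragment below; the flag values, application name, and class are hypothetical, and the right settings depend on the task and cluster.

```
# Illustrative spark-submit invocation; tune the values per task.
# --executor-memory and --driver-memory adjust available memory,
# --conf sets any other Spark configuration property.
spark-submit \
  --master yarn \
  --class com.example.MyJob \
  --num-executors 10 \
  --executor-memory 4G \
  --driver-memory 2G \
  --conf spark.sql.shuffle.partitions=200 \
  my-job.jar
```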
Before checking out this exam format, please note that no exams have been conducted after the 2020 update; what follows is the older Cloudera Hadoop and Spark Developer Certification Exam (CCA-175) format.
Number of Questions
There are about 8 to 12 questions in the CCA-175 certification exam; all of them are performance-based (practical) tasks on a Cloudera Enterprise cluster.
Exam Duration
The duration of the CCA-175 certification exam is 120 minutes.
Passing Score
Candidates will have to score a minimum of 70 percent to successfully clear Cloudera’s CCA-175 certification exam and get certified.
During the exam, all websites and search functionality are disabled, as is access to external Spark packages. You are not allowed to use any exam aids or notes either.
Go through our Spark Cheat Sheet for a quick reference to Spark concepts.
Spark 2.4 is now provided by default in the exam environment and is accessible via both the Spark shell (for Scala) and PySpark (for Python); scripts can also be run with spark-submit during the exam. Spark 2.4 includes several useful features that make it easier to perform well in the exam, especially when working with Avro files.
CCA-175 Spark and Hadoop Developer Certification Exam Sample Question
Here is a sample from the CCA 175 exam questions:
Problem statement: Using the given dataset, complete the following tasks:
- Create a case class named Learner with the columns name, email, and city
- Create an RDD from the given data using the defined case class
- Convert the RDD into a DataFrame
- Save this DataFrame data in the directory ‘spark6/learner.parquet’ and ensure that the file format is Parquet
- Read back the saved file and create a DataFrame and show the contents
Data for this exercise:
– Learner Students –
"Amit", "[email protected]", "Mumbai"
"Rakesh", "[email protected]", "Pune"
"Jonathan", "[email protected]", "NewYork"
"Michael", "[email protected]", "Washington"
"Simon", "[email protected]", "HongKong"
"Venkat", "[email protected]", "Chennai"
"Roshni", "[email protected]", "Bangalore"
Solution:
Step-1: Creating the Learners case class object
case class Learner(name: String, email: String, city: String)
Step-2: Creating the instances of Learners and adding them to an array
val heData = Array(
  Learner("Amit", "[email protected]", "Mumbai"),
  Learner("Rakesh", "[email protected]", "Pune"),
  Learner("Jonathan", "[email protected]", "NewYork"),
  Learner("Michael", "[email protected]", "Washington"),
  Learner("Simon", "[email protected]", "HongKong"),
  Learner("Venkat", "[email protected]", "Chennai"),
  Learner("Roshni", "[email protected]", "Bangalore")
)
Step-3: Creating an RDD from the raw data
val heRDD = sc.parallelize(heData)
Step-4: Creating a DataFrame from the RDD
val heDF = spark.createDataFrame(heRDD)
heDF.show()
Step-5: Saving the DataFrame data in the Parquet format
heDF.write.parquet("spark6/learner.parquet")
Step-6: Reading back the saved Parquet file as a DataFrame and displaying the result
val heParquetDF = spark.read.parquet("spark6/learner.parquet")
heParquetDF.show()
Step-7: Verify that the data has been saved, because the evaluators will check this as well. You can use the hdfs command-line utility for this purpose; you will see that Parquet uses the Snappy codec by default.
hdfs dfs -ls spark6/learner.parquet
The above is just an example of the type of question that may appear in the official CCA-175 exam.
Receive your Big Data Hadoop Training from Intellipaat and clear the exam on the first attempt!
CCA Spark and Hadoop Developer Certification Cost
The CCA-175 Spark and Hadoop Developer certification exam costs US$295. Each candidate is given a CDH6 (currently, 6.1.1) cluster preloaded with Spark 2.4.
Job Opportunities and Salary Trends for Hadoop Developers
In India:
- There are 13,000+ job openings for Hadoop Developers on LinkedIn
- Over 36,400 jobs are listed for Hadoop Developers on TimesJobs
In the United States:
- There are over 21,000 jobs open for Hadoop Developers on LinkedIn
- Over 52,500 jobs are available on ZipRecruiter for Hadoop Developers
According to Glassdoor:
- The average annual salary of a Hadoop Developer in India is ₹524,600
- The average annual salary of a Hadoop Developer is US$76,500 in the United States
Conclusion
We hope this blog has given you an idea of how to start your Cloudera Hadoop and Spark Developer Certification Exam (CCA-175) preparation and what to expect. Now, all you need to do is study the full course under the guidance of experts and apply your knowledge and skills in practical, industry-based projects; hands-on experience is what will get you through the exam smoothly. Intellipaat’s CCA-175 certification training is just the thing to give you a head start.