Free Practice Test
Instructions:
This test is free and can be attempted multiple times.
60 Minutes
30 Multiple Choice Questions
Welcome to your Spark Quiz
Which of the following is correct for Spark?
Product is already 5 years old
It runs programs in-memory up to 100x faster than MapReduce
It offers over 80 high level operators
Can be used from Scala and Python shells
Given rdd = {1, 2, 4, 3}, what is the output of rdd.reduce((x, y) => x + y)?
9
10
11
None of the above
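To check the arithmetic yourself, here is a minimal sketch that uses a plain Scala Seq as a stand-in for the RDD (an assumption made so the snippet runs without Spark; the names elements and total are illustrative only). The reduce call has the same shape as the one in the question.

```scala
// Plain Scala collection standing in for the RDD {1, 2, 4, 3}.
// Assumption: no Spark dependency; Seq.reduce mirrors RDD.reduce for this case.
val elements = Seq(1, 2, 4, 3)

// reduce repeatedly combines two elements into one using the supplied function.
val total = elements.reduce((x, y) => x + y)

println(total)
```

Because addition is associative and commutative, the result does not depend on the order in which partitions would be combined on a real cluster.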
Given rdd = {(1, 2), (3, 4), (3, 6)}, what is the output of rdd.flatMapValues(x => (x to 5))?
{(1,2), (1,3), (1,4), (1,5), (3,4), (3,5), (3,6)}
{(1,2), (1,4), (1,5), (3,4), (3,5)}
{(1,2), (1,3), (1,4), (1,5), (3,4), (3,5)}
None of the above
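The behaviour of flatMapValues can be explored with a small sketch on plain Scala collections. The flatMapValues helper below is hypothetical, written here only to imitate the pair RDD operation, and it assumes the function applied to each value is x => (x to 5).

```scala
// Plain Scala pairs standing in for the pair RDD (assumption: no Spark dependency).
val pairs = Seq((1, 2), (3, 4), (3, 6))

// Hypothetical helper imitating flatMapValues: apply f to each value and
// emit one (key, element) pair for every element that f produces.
def flatMapValues[K, V, U](data: Seq[(K, V)])(f: V => Iterable[U]): Seq[(K, U)] =
  data.flatMap { case (k, v) => f(v).map(u => (k, u)) }

// (x to 5) is empty when x is greater than 5, so such a value contributes no pairs.
val expanded = flatMapValues(pairs)(x => x to 5)

println(expanded)
```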
Given rdd = {(1, 2), (3, 4), (3, 6)} and rdd1 = {(3, 9)}, what is the output of rdd.join(rdd1)?
{(1,2), (1,3), (1,4), (1,5), (3,9), (3,6)}
{(3, (4, 9)), (3,(6, 9))}
{(1,2), (1,3), (1,4), (1,5), (3,4), (3,6)}
None of the above
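An inner join keeps only the keys present on both sides and pairs up every matching combination of values. A rough sketch with plain Scala collections, assuming the names left, right and joined as illustrative stand-ins for the two RDDs and the result:

```scala
// Plain Scala pairs standing in for rdd and rdd1 (assumption: no Spark dependency).
val left = Seq((1, 2), (3, 4), (3, 6))
val right = Seq((3, 9))

// Inner join: emit (key, (leftValue, rightValue)) for every pair of
// records whose keys match; keys found on only one side are dropped.
val joined = for {
  (k, v) <- left
  (k2, w) <- right
  if k == k2
} yield (k, (v, w))

println(joined)
```

The nested loop is quadratic and is used here only for readability; Spark itself shuffles records by key instead.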
Given rdd = {(1, 2), (3, 4), (3, 6)} and rdd1 = {(3, 9)}, what is the output of rdd.rightOuterJoin(rdd1)?
{(1,2), (1,3), (1,4), (1,5), (3,9), (3,6)}
{(3,(Some(4),9)),(3,(Some(6),9))}
{(1,(2,None)), (3,(4,Some(9))), (3,(6,Some(9)))}
None of the above
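A right outer join keeps every key of the right-hand dataset and wraps the left-hand value in an Option, since a match may be missing. The sketch below imitates this on plain Scala collections; leftByKey and rightOuter are illustrative names, not Spark API.

```scala
// Plain Scala pairs standing in for rdd and rdd1 (assumption: no Spark dependency).
val left = Seq((1, 2), (3, 4), (3, 6))
val right = Seq((3, 9))

// Index the left side by key so matches can be looked up per right-hand record.
val leftByKey: Map[Int, Seq[Int]] =
  left.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

// Every right-hand record survives; the left value is Some(v) on a match
// and None when the key does not occur on the left.
val rightOuter: Seq[(Int, (Option[Int], Int))] = right.flatMap { case (k, w) =>
  leftByKey.get(k) match {
    case Some(vs) => vs.map(v => (k, (Option(v), w)))
    case None     => Seq((k, (Option.empty[Int], w)))
  }
}

println(rightOuter)
```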
How are RDDs evaluated on the Spark platform?
Sequentially
Lazily
Grouping the RDDs
All the above
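The difference between declaring a transformation and actually running it can be illustrated with a Scala collection view, which is only an analogy for RDD behaviour and not Spark itself. The counter evaluations is a hypothetical name introduced to record how often the mapping function runs.

```scala
// Counts how many times the mapping function has actually been executed.
var evaluations = 0

val numbers = Seq(1, 2, 3, 4)

// A view records the map step without running it, loosely comparable to
// declaring a transformation on an RDD (analogy only, not Spark API).
val doubled = numbers.view.map { x => evaluations += 1; x * 2 }
val evaluationsBefore = evaluations

// Materialising the view forces the computation, loosely comparable to an action.
val result = doubled.toList
val evaluationsAfter = evaluations

println(s"before: $evaluationsBefore, after: $evaluationsAfter, result: $result")
```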
Which file systems are supported in Spark?
Local/regular FS
Amazon S3
HDFS
All of the above
Discretized Streams of RDDs are part of which component?
Spark Streaming
Spark SQL
Shark
Spark Core
Which transformations are possible in Spark Streaming?
Stateful Transformations
Stateless Transformations
Windowed Transformations
UpdateStateByValue Transformations
In cluster mode, Spark uses which architecture?
Peer-to-peer architecture
Master-slave architecture
Service Oriented architecture
None of the above
Which of the following is/are open source cluster managers?
YARN
Mesos
All of the above
None of the above
Which of the following is not a component of Spark?
Spark MLlib
Spark Streaming
GraphX
Giraph
Type sc. and press Tab in the REPL. Which of the following is not listed in the output?
toString
hadoopRDD
isForeign
makeRDD
Which are the standard row and columnar formats, respectively, used to store data on Hadoop clusters?
Avro
Parquet
Both of the above
None of the above
Which of the following is an unsupervised machine learning algorithm?
Decision Forests
Naive Bayes
K-means Clustering
Logistic Regression
Who among the following offers a commercial distribution of Apache Spark?
Databricks
Cloudera
MapR
All of the above
What are the properties of an RDD?
Immutable
Partitioned
Resilient
All the above
What are the advantages of using Apache Spark with Hadoop?
Stable API
Spark SQL component to access structured data
Support for multiple languages
All of the above
Which of the following is not a build tool?
Apache Ant
Apache Maven
awk
sbt
Which of the following file formats commonly supported in Spark is unstructured?
JSON
CSV
Text
Sequence files
Regarding RDDs, which of the following statements is false?
Basic abstraction in Spark
Immutable collection of elements that operate in parallel
Provides lesser fault tolerance than HDFS
Transformations and Actions are possible with RDDs
Why is controlling how datasets are partitioned across nodes required?
Communication is very expensive
More network traffic can greatly improve performance
None of the above
All the above
Which of the following are features of DataFrames in Spark?
Ability to scale from kilobytes of data on a single laptop to petabytes on a large cluster
Support for a wide array of data formats and storage systems
Seamless integration with all big data tooling and infrastructure via Spark
All the above
Which of the following is a technique for dimensionality reduction?
Collaborative Filtering
Principal Component Analysis
K-means
Linear Regression
Which of the following is used for both classification and regression?
Support Vector Machines
Logistic Regression
Decision Trees
K-Means
Which of the following are key performance considerations when running a project on Spark?
Level of Parallelism
Serialization Format
Memory Management
All the above
Which of the following is not a Spark shared variable?
Accumulators
Broadcast variables
Receiver variables
None of the above