Apache Spark Quiz For 2023

Welcome to your Spark Quiz

Which of the following is correct for Spark?

Product is already 5 years old

It runs programs in-memory up to 100x faster than MapReduce

It offers over 80 high-level operators

Can be used from Scala and Python shells

Given rdd = {1, 2, 4, 3}, what is the output of rdd.reduce((x, y) => x + y)?

9

10

11

None of the above
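
The reduce call in this question folds the elements pairwise with the supplied function. A minimal sketch of that behavior, assuming a plain Python list stands in for the RDD (no Spark required) and using different numbers than the question:

```python
from functools import reduce

# Hypothetical stand-in for an RDD: a plain Python list.
values = [5, 7, 1]

# Fold the elements pairwise, as rdd.reduce((x, y) => x + y) does in Scala.
total = reduce(lambda x, y: x + y, values)
print(total)  # 13
```

In Spark itself the function should be commutative and associative, because partitions are reduced in parallel and the order of combination is not fixed.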

Given rdd = {(1, 2), (3, 4), (3, 6)}, what is the output of rdd.flatMapValues(x => (x to 5))?

{(1,2), (1,3), (1,4), (1,5), (3,4), (3,5), (3,6)}

{(1,2), (1,4), (1,5), (3,4), (3,5)}

{(1,2), (1,3), (1,4), (1,5), (3,4), (3,5)}

None of the above
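
flatMapValues applies a function to each value, expects a sequence back, and emits one pair per returned item while keeping the original key. The sketch below models that in plain Python; flat_map_values is a hypothetical helper written for illustration, not a Spark API, and the data differs from the question:

```python
def flat_map_values(pairs, func):
    """Apply func to each value and emit one (key, item) pair per item returned."""
    return [(key, item) for key, value in pairs for item in func(value)]

# Hypothetical stand-in for a pair RDD.
pairs = [("a", 1), ("b", 3), ("b", 4)]

# Scala's `x to 3` is an inclusive range; range(x, 4) is the Python equivalent.
# A value above the upper bound yields an empty range, so its pair disappears.
result = flat_map_values(pairs, lambda x: range(x, 4))
print(result)  # [('a', 1), ('a', 2), ('a', 3), ('b', 3)]
```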

Given rdd = {(1, 2), (3, 4), (3, 6)} and rdd1 = {(3, 9)}, what is the output of rdd.join(rdd1)?

{(1,2), (1,3), (1,4), (1,5), (3,9), (3,6)}

{(3, (4, 9)), (3,(6, 9))}

{(1,2), (1,3), (1,4), (1,5), (3,4), (3,6)}

None of the above
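
join on pair RDDs is an inner join: only keys present on both sides survive, and each matching pair of values is combined into a tuple. A minimal plain-Python model, where inner_join is a hypothetical helper and the data differs from the question:

```python
def inner_join(left, right):
    """Pair up values from both sides for every key present in both."""
    return [(lk, (lv, rv)) for lk, lv in left for rk, rv in right if lk == rk]

# Hypothetical stand-ins for two pair RDDs.
left = [("a", 1), ("b", 2), ("b", 3)]
right = [("b", 7)]

joined = inner_join(left, right)
print(joined)  # [('b', (2, 7)), ('b', (3, 7))]
```

Key "a" has no partner on the right, so it is dropped. A real RDD does not guarantee the order of the output.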

Given rdd = {(1, 2), (3, 4), (3, 6)} and rdd1 = {(3, 9)}, what is the output of rdd.rightOuterJoin(rdd1)?

{(1,2), (1,3), (1,4), (1,5), (3,9), (3,6)}

{(3,(Some(4),9)),(3,(Some(6),9))}

{(1,(2,None)), (3,(4,Some(9))), (3,(6,Some(9)))}

None of the above
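
rightOuterJoin keeps every key from the right-hand RDD and marks the left value as missing when there is no match. The Scala API wraps the left value in an Option (Some or None), while PySpark uses the bare value or None. A plain-Python sketch follows; right_outer_join is a hypothetical helper and the data differs from the question:

```python
def right_outer_join(left, right):
    """Keep every right-hand key; the left value is None when nothing matches."""
    result = []
    for rk, rv in right:
        matches = [lv for lk, lv in left if lk == rk]
        if matches:
            result.extend((rk, (lv, rv)) for lv in matches)
        else:
            result.append((rk, (None, rv)))
    return result

# Hypothetical stand-ins for two pair RDDs.
left = [("a", 1), ("b", 2), ("b", 3)]
right = [("b", 7), ("c", 8)]

joined = right_outer_join(left, right)
print(joined)  # [('b', (2, 7)), ('b', (3, 7)), ('c', (None, 8))]
```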

How are RDDs evaluated in Spark?

Sequentially

Lazily

Grouping the RDDs

All of the above
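
Spark records transformations and computes nothing until an action asks for a result. As a loose analogy only (this is Python's built-in lazy map, not Spark), the sketch below shows a function that does not run until the result is consumed:

```python
log = []

def double(x):
    log.append(x)  # record when the function actually runs
    return x * 2

data = [1, 2, 3]

doubled = map(double, data)     # like a transformation: nothing runs yet
calls_before_action = len(log)  # 0

result = list(doubled)          # like an action: forces evaluation
calls_after_action = len(log)   # 3

print(calls_before_action, result, calls_after_action)  # 0 [2, 4, 6] 3
```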

Which file systems are supported in Spark?

Local/regular FS

Amazon S3

HDFS

All of the above

Discretized Streams (DStreams) of RDDs are part of which component?

Spark Streaming

Spark SQL

Shark

Spark Core

Which transformations are possible in Spark Streaming?

Stateful Transformations

Stateless Transformations

Windowed Transformations

UpdateStateByValue Transformations

In cluster mode, Spark uses a ________ architecture.

Peer-to-peer architecture

Master-slave architecture

Service-oriented architecture

None of the above

Which of the following is/are open-source cluster managers?

YARN

Mesos

All of the above

None of the above

Which of the following is not a component of Spark?

Spark MLlib

Spark Streaming

GraphX

Giraph

Type sc. and press Tab in the Spark shell (REPL). Which of the following is not listed in the output?

toString

hadoopRDD

isForeign

makeRDD

Which are the standard row and columnar formats, respectively, used to store data on Hadoop clusters?

Avro

Parquet

Both of the above

None of the above

Which of the following is an unsupervised machine learning algorithm?

Decision Forests

Naive Bayes

K-means Clustering

Logistic Regression

Which of the following offer a commercial distribution of Apache Spark?

Databricks

Cloudera

MapR

All of the above

What are the properties of an RDD?

Immutable

Partitioned

Resilient

All of the above

What are the advantages of using Apache Spark with Hadoop?

Stable API

Spark SQL component to access structured data

Support for multiple languages

All of the above

Which of the following is not a build tool?

Apache Ant

Apache Maven

awk

sbt

Which of the following commonly supported file formats in Spark is unstructured?

JSON

CSV

Text

Sequence files

Regarding RDDs, which of the following statements is false?

Basic abstraction in Spark

Immutable collection of elements that operate in parallel

Provides less fault tolerance than HDFS

Transformations and Actions are possible with RDDs

Why is it necessary to control how datasets are partitioned across nodes?

Communication is very expensive

More network traffic can greatly improve performance

None of the above

All of the above

Which of the following are features of DataFrames in Spark?

Ability to scale from kilobytes of data on a single laptop to petabytes on a large cluster

Support for a wide array of data formats and storage systems

Seamless integration with all big data tooling and infrastructure via Spark

All of the above

Which of the following is a technique for dimensionality reduction?

Collaborative Filtering

Principal Component Analysis

K-means

Linear Regression

Which of the following is used for both classification and regression?

Support Vector Machines

Logistic Regression

Decision Trees

K-Means

Which of the following are key performance considerations when running a project on Spark?

Level of Parallelism

Serialization Format

Memory Management

All of the above

Which of the following is not a Spark shared variable?

Accumulators

Broadcast variables

Receiver variables

None of the above
