Data Science Quiz For Practice

Welcome to your Data Science Quiz

How would you access the ‘StreamingTV’ column from the ‘customer_churn’ data.frame?

customer_churn@StreamingTV

customer_churn#StreamingTV

customer_churn$StreamingTV

customer_churn&StreamingTV

How would you create a list consisting of these elements: 100, ‘sparta’, TRUE?

List(100,'sparta',TRUE)

list(100,'sparta',TRUE)

list(c(100,'sparta',TRUE))

list(list(100,'sparta',TRUE))

How would you get the last 100 records from the ‘customer_churn’ dataframe?

last(100)

tail(100)

last(customer_churn,100)

tail(customer_churn,100)

How would you give a discount of 33% to the 5th cell of ‘MonthlyCharges’ column?

customer_churn$MonthlyCharges[5]*(.33)

customer_churn$MonthlyCharges[5]*(33)

customer_churn$MonthlyCharges[5]*(.67)

customer_churn$MonthlyCharges[5]*(67)
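The arithmetic behind this question can be checked on a toy value. The sketch below assumes a made-up data frame in place of the real ‘customer_churn’ data, with a fifth charge of 100 chosen so the result is easy to read. A 33% discount leaves 67% of the original amount.

```r
# Hypothetical stand-in for customer_churn; the real data set is not shown here.
customer_churn <- data.frame(MonthlyCharges = c(29.85, 56.95, 53.85, 42.30, 100.00))

original <- customer_churn$MonthlyCharges[5]

# Taking 33% off is the same as keeping (1 - 0.33) = 0.67 of the charge.
discounted <- original * (1 - 0.33)

print(discounted)
```

Note that an expression of this kind only computes the discounted value; storing it back in the data frame would need an assignment.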

Which of these is the correct code to count the customers whose ‘MonthlyCharges’ is greater than 100?

count=0; for(i in 1:nrow(customer_churn)){ if(customer_churn$MonthlyCharges[i]>100){ count=count+1 } }; count

count=0; i=1; while(i<=nrow(customer_churn)){ if(customer_churn$MonthlyCharges[i]>100){ count=count+1 } }; count

count=1; for(i in 1:nrow(customer_churn)){ if(customer_churn$MonthlyCharges[i]>100){ count=count+50 } }; count

count=0 if(customer_churn$MonthlyCharges[i]>100){ count=count+1
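To experiment with this kind of counting loop, the sketch below runs one on a small invented data frame (the values are assumptions, not the real ‘customer_churn’ data) and compares it with a vectorized one-liner that is common in R.

```r
# Hypothetical stand-in for customer_churn with six made-up charges.
customer_churn <- data.frame(
  MonthlyCharges = c(29.85, 105.50, 99.99, 110.10, 100.00, 120.00)
)

# Loop version: start at zero and add one for each row above 100.
count <- 0
for (i in seq_len(nrow(customer_churn))) {
  if (customer_churn$MonthlyCharges[i] > 100) {
    count <- count + 1
  }
}

# Vectorized version: the comparison yields TRUE/FALSE, and TRUE sums as 1.
count_vec <- sum(customer_churn$MonthlyCharges > 100)

print(c(loop = count, vectorized = count_vec))
```

A charge of exactly 100 is not counted, because the comparison is strictly greater than.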

How would you extract only the female customers from the ‘customer_churn’ data.frame?

sqldf("select gender='Female' from customer_churn")

sqldf("select from customer_churn where gender=='Female'")

sqldf("select * from customer_churn where gender=='Female'")

sqldf("select * from customer_churn where gender in 'Female'")

How would you extract a random sample of 33 records from the ‘customer_churn’ dataframe?

sample(customer_churn,33)

sample_frac(customer_churn,.33)

sample_frac(customer_churn,33)

sample_n(customer_churn,33)

How would you select the 3rd, 4th & 5th columns from the ‘customer_churn’ dataframe?

select(customer_churn,(3,4,5))

select(customer_churn,3,4,5)

select(customer_churn,list(3,4,5))

select(3:5,customer_churn)

How would you get a summarized result for the mean of ‘MonthlyCharges’ grouped by ‘PaymentMethod’?

summarise(group_by(customer_churn,PaymentMethod),mean_MC=mean(MonthlyCharges))

group_by(mean_MC=mean(MonthlyCharges),summarise(customer_churn,PaymentMethod))

group_by(summarise(customer_churn,PaymentMethod),mean_MC=mean(MonthlyCharges))

summarise(group_by(PaymentMethod),mean_MC=mean(customer_churn,MonthlyCharges))
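As a cross-check on what a grouped mean should return, the sketch below computes it with base R's tapply() instead of the dplyr functions used in the options. The data frame is a made-up stand-in with two payment methods, not the real ‘customer_churn’ data.

```r
# Hypothetical stand-in for customer_churn.
customer_churn <- data.frame(
  PaymentMethod  = c("Mailed check", "Electronic check",
                     "Mailed check", "Electronic check"),
  MonthlyCharges = c(20, 80, 40, 100)
)

# tapply() splits MonthlyCharges by PaymentMethod and applies mean() to each group.
mean_MC <- tapply(customer_churn$MonthlyCharges,
                  customer_churn$PaymentMethod,
                  mean)

print(mean_MC)
```

Whichever grouping syntax is used, it should produce one mean per payment method, matching these values on the same input.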

How would you extract those customers who have subscribed to both ‘StreamingTV’ & ‘StreamingMovies’?

filter(customer_churn, StreamingTV=='Yes' & StreamingMovies=='Yes')

filter(customer_churn, StreamingTV=='Yes' && StreamingMovies=='Yes')

filter(customer_churn, StreamingTV=='Yes' and StreamingMovies=='Yes')

filter(customer_churn, StreamingTV=='Yes' AND StreamingMovies=='Yes')

To which of these geometries can you add facet_grid()?

geom_bar()

geom_histogram()

geom_point()

All of the above

Which of these is the correct code to make a box-plot between the ‘tenure’ & the ‘DeviceProtection’ columns? ‘tenure’ should be mapped on the y-axis & ‘DeviceProtection’ should be mapped on the x-axis. The fill color should be determined by the ‘DeviceProtection’ column.

ggplot(data = customer_churn,aes(y=tenure,x=DeviceProtection,fill=DeviceProtection))+geom_boxplot()

ggplot(data = customer_churn,aes(y=tenure,x=DeviceProtection,fill='DeviceProtection'))+geom_boxplot()

ggplot(data = customer_churn,aes(y=tenure,x=DeviceProtection))+geom_boxplot(fill=DeviceProtection)

ggplot(data = customer_churn,aes(y=tenure,x=DeviceProtection))+geom_boxplot(col=DeviceProtection)

How would you make a histogram for the ‘tenure’ column with the plotly package? The color of the bins should be determined by the ‘Churn’ column.

plot_ly(data = customer_churn,x=tenure,type='histogram', fill = ~ Churn)

plot_ly(data = customer_churn,x='tenure',type='histogram', color = 'Churn')

plot_ly(data = customer_churn,x=~tenure,type='histogram', color = ~ Churn)

None of the above

Which of these is the correct code to make a histogram for the ‘tenure’ column? The fill color of the bins should be ‘azure’ & the number of bins should be 87.

ggplot(data = customer_churn,aes(x=tenure,fill='azure'))+geom_histogram(bins=87)

ggplot(data = customer_churn,aes(x=tenure))+geom_histogram(fill='azure',bins=87)

ggplot(data = customer_churn,aes(x=tenure))+geom_histogram(col='azure',bins=87)

ggplot(data = customer_churn,aes(x=tenure,col='azure'))+geom_histogram(bins=87)

Which of these is the correct code to make a bar-plot for the ‘OnlineBackup’ column? The color of the bars should be determined by the ‘PhoneService’ column.

ggplot(data = customer_churn,aes(x=OnlineBackup,fill=PhoneService))+geom_bar()

ggplot(data = customer_churn,aes(x=OnlineBackup))+geom_bar(fill=PhoneService)

ggplot(data = customer_churn,aes(y=OnlineBackup,fill=PhoneService))+geom_bar()

ggplot(data = customer_churn,aes(fill=OnlineBackup,x=PhoneService))+geom_bar()

Which of these is the correct code to make a scatter-plot between the ‘TotalCharges’ & the ‘tenure’ columns? ‘TotalCharges’ should be mapped on the y-axis & ‘tenure’ should be mapped on the x-axis. The color of the points should be ‘yellow’.

ggplot( data = customer_churn, aes( y = TotalCharges,x=tenure))+geom_point(fill='yellow')

ggplot(data = customer_churn,aes(x=TotalCharges,y=tenure))+geom_point(fill='yellow')

ggplot(data = customer_churn,aes(y=TotalCharges,x=tenure,col='yellow'))+geom_point()

ggplot(data = customer_churn,aes(y=TotalCharges,x=tenure))+geom_point(col='yellow')

Which of these is the correct code to make a bar-plot for the ‘TechSupport’ column? The color of the bars should be ‘blue’ & the title of the plot should be ‘Distribution of Tech Support’.

plot(customer_churn$TechSupport,color='blue',title='Distribution of Tech Support')

plot(customer_churn$TechSupport,col='blue',title='Distribution of Tech Support')

plot(customer_churn$TechSupport,col='blue',main='Distribution of Tech Support')

plot(customer_churn$TechSupport,fill='blue',main='Distribution of Tech Support')

How would you build a linear model where the dependent variable is ‘MonthlyCharges’ & the independent variables are ‘tenure’, ‘PaymentMethod’ & ‘Contract’?

lm(MonthlyCharges~tenure+PaymentMethod+Contract, data=customer_churn)

lm(MonthlyCharges=tenure,PaymentMethod,Contract, data=customer_churn)

lm(MonthlyCharges=tenure+PaymentMethod+Contract, data=customer_churn)

Im(MonthlyCharges~tenure,PaymentMethod,Contract + data=customer_churn)
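For readers who want to see what a fitted linear model with these variables looks like, here is a minimal sketch on simulated data. The data frame, its factor levels, and the coefficients used to generate ‘MonthlyCharges’ are all assumptions made for illustration.

```r
set.seed(1)
n <- 60

# Hypothetical stand-in for customer_churn; levels and values are invented.
customer_churn <- data.frame(
  tenure        = 1:n,
  PaymentMethod = factor(rep(c("Mailed check", "Electronic check"), times = n / 2)),
  Contract      = factor(rep(c("Month-to-month", "One year", "Two year"), times = n / 3))
)
customer_churn$MonthlyCharges <- 20 + 0.5 * customer_churn$tenure + rnorm(n, sd = 5)

# Formula interface: response on the left of ~, predictors joined by + on the right.
fit <- lm(MonthlyCharges ~ tenure + PaymentMethod + Contract, data = customer_churn)

# One intercept, one slope for tenure, and one dummy per non-reference factor level.
print(names(coef(fit)))
```

The factor predictors are expanded into dummy variables automatically, which is why the model has five coefficients for three predictors.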

sample.split() function is a part of which package?

tree

caret

randomForest

caTools

Which function is used to create the ROC curve?

ROC()

predict()

performance()

roc_plot()

How would you create a simple logistic regression model where the dependent variable is ‘gender’ & the independent variable is ‘MonthlyCharges’?

lm(gender=MonthlyCharges, data= customer_churn, family='binomial')

glm(gender~MonthlyCharges, data= customer_churn, family='logistic')

glm(gender~MonthlyCharges, data= customer_churn, family='binomial')

glm(gender~MonthlyCharges, data= customer_churn)
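The role of the family argument can be inspected on a fitted model. The sketch below fits a logistic regression to simulated data (the data frame is a made-up stand-in, and the predictor is deliberately unrelated to the response), then prints the family and link that the model used.

```r
set.seed(42)
n <- 100

# Hypothetical stand-in for customer_churn with a two-level factor response.
customer_churn <- data.frame(
  gender         = factor(rep(c("Female", "Male"), each = n / 2)),
  MonthlyCharges = runif(n, min = 20, max = 120)
)

# With a factor response, R treats the first level ("Female") as failure
# and models the probability of the other level.
fit <- glm(gender ~ MonthlyCharges, data = customer_churn, family = "binomial")

print(fit$family$family)  # distribution family used by the fit
print(fit$family$link)    # link function; logit is the binomial default
```

The logit link is what makes a binomial GLM a logistic regression.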

How would you build a decision tree model where the dependent variable is ‘Churn’ & the independent variables are ‘tenure’, ‘InternetService’ & ‘OnlineBackup’?

decision_tree(Churn~tenure+InternetService+OnlineBackup, data=customer_churn)

tree(Churn~tenure+InternetService+OnlineBackup, data=customer_churn)

decision_tree(Churn~tenure+InternetService+OnlineBackup)

tree(Churn~tenure+InternetService+OnlineBackup)

How would you build a random forest model where the dependent variable is ‘Churn’ & the independent variable is ‘MonthlyCharges’? The number of trees in the model should be 100.

randomForest(Churn~MonthlyCharges, data=customer_churn, trees=100)

randomForest(Churn=MonthlyCharges, data=customer_churn, tree=100)

Forest(Churn~MonthlyCharges, data=customer_churn,ntree=100)

randomForest(Churn~MonthlyCharges,data=customer_churn,ntree=100)

What is the minimum number of variables/features required to perform clustering?

Which of the following algorithms is most sensitive to outliers?

K-means clustering algorithm

K-medians clustering algorithm

K-modes clustering algorithm

K-medoids clustering algorithm
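The algorithms above differ mainly in how a cluster's center is computed: K-means uses the mean of the assigned points, while K-medians uses the median. A small sketch with invented one-dimensional values shows how differently those two statistics react to a single extreme point.

```r
# Five made-up values, then the same values plus one extreme point.
x         <- c(1, 2, 3, 4, 5)
x_outlier <- c(x, 100)

# How far each statistic moves when the extreme point is added.
mean_shift   <- mean(x_outlier) - mean(x)
median_shift <- median(x_outlier) - median(x)

print(c(mean_shift = mean_shift, median_shift = median_shift))
```

The mean moves by roughly 16 units here, while the median moves by half a unit.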

Which of the following are true?

1. Clustering analysis is negatively affected by multicollinearity of features.
2. Clustering analysis is negatively affected by heteroscedasticity.

1 only

2 only

1 and 2

None of them

Which of the following is a bad characteristic of a dataset for clustering analysis?

Data points with outliers

Data points with different densities

Data points with non-convex shapes

All of the above

Every iteration of the K-Means algorithm contains which of the following steps:

Randomly assigning all data-points to one of K clusters.

Randomly assigning the positions of K centroids in the data-point space.

Check if the average squared distance between all data-points and all centroids is decreasing.

Assigning data-points to the closest centroid using a given similarity(distance) measure.
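As a reference point for the steps listed above, here is a minimal sketch of a single pass of the standard (Lloyd-style) K-means loop in base R: assign each point to its nearest centroid, then move each centroid to the mean of its points. The points, the starting centroids, and K = 2 are all made-up assumptions.

```r
# Hypothetical 2-D data: four points, one per row.
points <- matrix(c(1.0, 1.0,
                   1.5, 2.0,
                   8.0, 8.0,
                   9.0, 9.5), ncol = 2, byrow = TRUE)

# Hypothetical starting centroids for K = 2, one per row.
centroids <- matrix(c(0, 0,
                      10, 10), ncol = 2, byrow = TRUE)

# Assignment step: squared Euclidean distance from every point to every centroid.
# The result has one row per point and one column per centroid.
sq_dist <- sapply(seq_len(nrow(centroids)), function(k) {
  diffs <- sweep(points, 2, centroids[k, ])  # subtract centroid k from each row
  rowSums(diffs^2)
})
assignment <- apply(sq_dist, 1, which.min)

# Update step: each centroid becomes the mean of the points assigned to it.
new_centroids <- t(sapply(seq_len(nrow(centroids)), function(k) {
  colMeans(points[assignment == k, , drop = FALSE])
}))

print(assignment)
print(new_centroids)
```

A full implementation would repeat these two steps until the assignments stop changing, and would also need to handle a centroid that ends up with no points, which this sketch does not.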

A data scientist is asked to implement an article recommendation feature for an online magazine. The magazine does not want to use client tracking technologies such as cookies or reading history. Therefore, only the style and subject matter of the current article are available for making recommendations. All of the magazine’s articles are stored in a database in a format suitable for analytics. Which method should the data scientist try first?

K Means Clustering

Logistic Regression

Association Rules

Imagine you are trying to hire a Data Scientist for your team. In addition to technical ability and quantitative background, which additional essential trait would you look for in people applying for this position?

Communication skill

Scientific background

Domain expertise

Well Organized
