## Top Answers to R Interview Questions

| R programming language | Python programming language |
| --- | --- |
| Model building is similar to Python. | Model building is similar to R. |
| Model interpretability is good. | Model interpretability is not as good. |
| Production deployment is weaker than in Python. | Production deployment is good. |
| Community support is stronger than Python's. | Community support is not as strong as R's. |
| Data science libraries are comparable to Python's. | Data science libraries are comparable to R's. |
| R has good data visualization libraries and tools. | Data visualization is not as good as in R. |
| R has a steep learning curve. | Python is easier to learn than R. |

R Commander can be used to import data into R. To start the R Commander GUI, type library(Rcmdr) into the console. Data can be imported in three ways:

- Select the data set in the dialog box or enter the name of the data set as required.
- Enter the data directly using the R Commander editor via Data->New Data Set. This works well only when the data set is not too large.
- Import the data from a URL, from a plain text (ASCII) file, from another statistical package or from the clipboard.

Reproducible research combines the data, code and analysis results in a single document using knitr. This helps others verify the findings, add to them and engage in conversations. Reproducible research also makes it easy to redo the experiments with new data values and to apply the analysis to different problems.

| library() | require() |
| --- | --- |
| The library() function throws an error if the desired package cannot be loaded. | The require() function is meant for use inside functions; it returns FALSE and issues a warning when a package is not found. |
| It attaches the package if it is not already attached. | It checks whether the package is loaded and loads it if it isn't (use in functions that rely on a certain package). The documentation explicitly states that neither function will reload an already loaded package. |

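The following snippet illustrates the difference. Here package is a variable holding the package name as a character string.

    # Install the package only when require() fails to load it
    if (!require(package, character.only = TRUE, quietly = TRUE)) {
      install.packages(package)
      library(package, character.only = TRUE)
    }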

For multiple packages you can use

    # Fill in the package names inside c("", "")
    for (package in c("", "")) {
      if (!require(package, character.only = TRUE, quietly = TRUE)) {
        install.packages(package)
        library(package, character.only = TRUE)
      }
    }

R is a programming language which is used for developing statistical software and data analysis. It is being increasingly deployed for machine learning applications as well.

Comments are written by placing # at the start of a line of code, for example #division.

A t-test is used to determine whether the means of two groups are equal. It is performed with the t.test() function.
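As a minimal sketch of a two-sample comparison with t.test(), assuming two made-up numeric samples (the vectors group_a and group_b are illustrative and not from the article):

```r
# Two hypothetical samples (illustrative values only)
group_a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
group_b <- c(6.2, 6.8, 7.1, 6.5, 6.9, 7.3)

# Welch two-sample t-test; the null hypothesis is that the means are equal
result <- t.test(group_a, group_b)

# A small p-value suggests that the two group means differ
print(result$p.value)
```

The returned object also carries the test statistic and a confidence interval for the difference in means (result$statistic and result$conf.int).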

The disadvantages of R are:

- Lack of standard GUI
- Not good for big data.
- Does not provide spreadsheet view of data.

with() function applies an expression to a dataset.

#with(data,expression)

The by() function applies a function to each level of a factor or factors.

#by(data,factorlist,function)
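A short sketch of with() and by(), using the built-in mtcars data set as an assumed example (the choice of data set and columns is ours, not the article's):

```r
# with(): evaluate an expression using the columns of a data frame
avg_mpg <- with(mtcars, mean(mpg))

# by(): apply a function to each level of a factor (here, cylinder count)
mpg_by_cyl <- by(mtcars$mpg, mtcars$cyl, mean)

print(avg_mpg)
print(mpg_by_cyl)
```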

In R missing values are represented by NA which should be in capital letters.
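For illustration, a small sketch of how NA behaves in a calculation, assuming a made-up numeric vector x:

```r
x <- c(4, NA, 7, 10)                      # one missing value

missing_flags <- is.na(x)                 # TRUE where a value is missing
mean_with_na <- mean(x)                   # NA, because one value is missing
mean_without_na <- mean(x, na.rm = TRUE)  # drops the NA before averaging

print(missing_flags)
print(mean_without_na)
```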

subset() is used to select variables and observations, and the sample() function is used to generate a random sample of size n from a dataset.
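A minimal sketch of both functions on the built-in mtcars data set (the threshold of 200 horsepower, the sample size of 5 and the seed are arbitrary assumptions for the example):

```r
# subset(): keep rows meeting a condition and only some columns
fast_cars <- subset(mtcars, hp > 200, select = c(mpg, hp))

# sample(): draw 5 random row indices without replacement
set.seed(42)  # arbitrary seed so the draw is repeatable
picked_rows <- sample(nrow(mtcars), size = 5)
random_sample <- mtcars[picked_rows, ]

print(fast_cars)
print(random_sample)
```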

Transpose is used to reshape data for analysis. It is performed by the t() function.

The advantages are:

- It is used for managing and manipulating data.
- No license restrictions.
- Free and open source software.
- Graphical capabilities of R are good.
- Runs on many operating systems and different hardware, including 32-bit and 64-bit processors.


The rbind() function is used to add one dataset to another by rows, but the columns of the two datasets must be the same.

Syntax: rbind(x1, x2, ...) where x1, x2 are vectors, matrices or data frames.
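As a rough sketch, assuming two small made-up data frames with identical columns:

```r
# Two hypothetical data frames with the same column names and types
first_half  <- data.frame(id = 1:2, score = c(90, 85))
second_half <- data.frame(id = 3:4, score = c(70, 95))

# Stack the rows of the second data frame under the first
combined <- rbind(first_half, second_half)
print(combined)
```

If the column names differ between the two data frames, rbind() stops with an error instead of guessing how to align them.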

Correlations are produced by the cor() function and covariances by the cov() function.
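A small sketch with two made-up vectors, where y is constructed as exactly twice x so the result is easy to check by hand:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)   # y is exactly 2 * x

r <- cor(x, y)   # Pearson correlation (the default method)
s <- cov(x, y)   # sample covariance

print(r)
print(s)
```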

A data frame can contain different types of data, but a matrix can contain only one type of data.

lapply returns its output as a list, whereas sapply simplifies the output to a vector or matrix where possible.
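The difference in return type can be sketched with a small made-up list (the names values, as_list and as_vector are illustrative):

```r
values <- list(a = 1:3, b = 4:6)

as_list   <- lapply(values, sum)  # a list with one element per input
as_vector <- sapply(values, sum)  # simplified to a named vector

str(as_list)
print(as_vector)
```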

seq(4) means a vector from 1 to 4 (c(1,2,3,4)), whereas seq_along(4) means a vector of the same length as its argument; since 4 is a vector of length 1, the result is c(1).
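A quick sketch of the two calls side by side (the variable names are illustrative):

```r
counted   <- seq(4)                        # counts from 1 up to 4
along_one <- seq_along(4)                  # indexes a vector of length 1
along_abc <- seq_along(c("x", "y", "z"))   # indexes a vector of length 3

print(counted)
print(along_one)
print(along_abc)
```

This is why seq_along(x) is usually the safer choice when looping over the elements of x.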

The library(Rcmdr) command is used to start the R Commander GUI.

There are 5 data structures in R: vector, matrix and array, which are homogeneous, and list and data frame, which are heterogeneous.


Data can be aggregated by collapsing it over one or more BY variables using the aggregate() function, in which the BY variables must be supplied as a list.
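A minimal sketch of aggregate() with the BY variable wrapped in a list, using the built-in mtcars data set as an assumed example:

```r
# Mean miles per gallon for each cylinder count.
# The BY variable must be passed inside a list; naming it sets the column name.
mean_by_cyl <- aggregate(mtcars$mpg, by = list(cyl = mtcars$cyl), FUN = mean)

print(mean_by_cyl)
```

The result is a data frame with one row per group; the aggregated values land in a column named x because a bare vector was passed in.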

Five types of sorting algorithms are used:

- Bubble Sort
- Selection Sort
- Merge Sort
- Quick Sort
- Bucket Sort

For creating a new variable, the assignment operator '<-' is used.

E.g. mydata$sum <- mydata$x1 + mydata$x2

Packages are collections of data, R functions and compiled code in a well-defined format, and the directory where packages are stored is called the library. One of the strengths of R is its user-written functions.

The workspace is the current R working environment, which includes any user-defined objects such as vectors and lists.

The merge() function is used to merge two data frames.

E.g. Sum <- merge(dataframe1, dataframe2, by = "ID")
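As a sketch of a merge on a shared ID column, assuming two small made-up data frames (customers and orders are hypothetical names):

```r
customers <- data.frame(ID = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
orders    <- data.frame(ID = c(2, 3, 4), total = c(20, 35, 50))

# By default merge() keeps only the IDs present in both data frames
merged <- merge(customers, orders, by = "ID")
print(merged)
```

Passing all = TRUE instead keeps unmatched rows from both sides and fills the gaps with NA.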

The rbind() function is used to merge two data frames vertically.

E.g. Sum <- rbind(dataframe1, dataframe2)

Power analysis is used for experimental design. It is used to determine the effect of a given sample size.

The pwr package is used for power analysis in R.

There are many ways to export data into other formats such as SPSS, SAS, Stata and Excel spreadsheets.

For Excel the xlsReadWrite package is used, and for SAS, SPSS and Stata the foreign package is used.

In R NaN is used to represent impossible values.

The save command is used for storing R objects in a file.

Syntax: save(z, file = "z.Rdata")

The load command is used for restoring R objects from a file.

Syntax: load("z.Rdata")
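A round trip through save() and load() can be sketched as follows, assuming a made-up vector z and a file in the session's temporary directory (the path is our choice for the example):

```r
z <- c(10, 20, 30)

# Write the object to a file in the temporary directory
path <- file.path(tempdir(), "z.RData")
save(z, file = path)

# Remove it from the workspace, then restore it from the file
rm(z)
load(path)

print(z)
```

Note that load() restores objects under their original names, so it can silently overwrite an existing object called z.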

The coin package is used for re-randomization or permutation-based statistical tests.

The order() function is used to perform sorting.
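A brief sketch of sorting with order(), assuming a made-up vector of scores and the built-in mtcars data set:

```r
scores <- c(72, 95, 88)

# order() returns the positions that would sort the vector
idx <- order(scores)
sorted_scores <- scores[idx]

# Sort a data frame by a column, highest mpg first
top_cars <- mtcars[order(-mtcars$mpg), ]

print(idx)
print(sorted_scores)
print(head(top_cars, 3))
```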


Custom axes are created using the axis() function.

The abline() function adds a reference line to a graph.

Syntax:

abline(h=yvalues, v=xvalues)

The vcd package provides different methods for visualizing multivariate categorical data.

GGobi is an open source visualization program for exploring high-dimensional data.

iplots is a package which provides bar plots, mosaic plots, box plots, parallel plots, scatter plots and histograms.

The lattice package improves on base R graphics by giving better defaults, and it has the ability to easily display multivariate relationships.

fitdistr() is used to provide maximum likelihood fitting of univariate distributions. It is defined in the MASS package.

Data structures are vectors, arrays, data frames and matrices.

The sink() function defines the direction of output.

The library() function, called with no arguments, is used to show the packages which are installed.

The search() function shows which packages are currently loaded.

Binary operators work on matrices, vectors and scalars.

It is used to define the desired table using a function and a model formula.

A frequency table is created by the table() function.
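For instance, a one-way and a two-way frequency table can be sketched on the built-in mtcars data set (the choice of columns is an assumption for the example):

```r
# One-way table: number of cars for each cylinder count
cyl_counts <- table(mtcars$cyl)

# Two-way table: cylinder count against number of gears
two_way <- table(mtcars$cyl, mtcars$gear)

print(cyl_counts)
print(two_way)
```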

The loglm() function is used to create log-linear models.

The corrgram() function is used to plot correlograms.

The pairs() or splom() function is used to create scatterplot matrices.

npmc is a package which provides nonparametric multiple comparisons.

Diagnostic plots are used to check normality, heteroscedasticity and influential observations.

anova() is used to compare the nested models.
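As a sketch of a nested model comparison, assuming two linear models fitted to the built-in mtcars data set (the predictors wt and hp are our choice for illustration):

```r
# The smaller model is nested inside the larger one
small_model <- lm(mpg ~ wt, data = mtcars)
large_model <- lm(mpg ~ wt + hp, data = mtcars)

# F-test of whether the extra predictor improves the fit
comparison <- anova(small_model, large_model)
print(comparison)
```

A small p-value in the second row suggests that the added predictor explains variance the smaller model misses.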

cv.lm() is defined in the DAAG package and is used for k-fold cross-validation.

stepAIC() is defined in the MASS package and performs stepwise model selection by exact AIC.

regsubsets() is used to perform all-subsets regression and is defined in the leaps package.

The relaimpo package is used to measure the relative importance of each of the predictors in the model.

The car package provides a variety of regression plots, including scatter plots and added-variable plots, as well as enhanced diagnostics.

The robust package provides a library of robust methods including regression.

robustbase is a package which provides basic robust statistics including model selection methods.

plotmeans() is defined in the gplots package; it produces mean plots for single factors and includes confidence intervals.

MANOVA stands for multivariate analysis of variance.

By using MANOVA we can test more than one dependent variable simultaneously.

mshapiro.test() is a function defined in the mvnormtest package. It performs the Shapiro-Wilk test for multivariate normality.

bartlett.test() provides a parametric k-sample test of the equality of variances.

fligner.test() is a function which provides a non-parametric k-sample test of the equality of variances.

It is defined in the HH package, which provides a graphic test of homogeneity of variance based on Brown-Forsythe.

Numerical variables are represented by lower case letters.

Categorical factors are represented by upper case letters.


Logistic regression is used to predict the binary outcome from the given set of continuous predictor variables.
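A minimal sketch of a logistic regression with glm(), assuming the built-in mtcars data set, where am is a binary outcome (0 or 1) and wt is a continuous predictor; both choices are ours for illustration:

```r
# Fit a logistic regression: transmission type as a function of weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of am == 1 for a hypothetical car with wt = 3
predicted_prob <- predict(fit, newdata = data.frame(wt = 3), type = "response")

print(coef(fit))
print(predicted_prob)
```

Setting type = "response" returns a probability; the default returns the value on the log-odds scale.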

Poisson regression is used to predict an outcome variable which represents counts from a given set of continuous predictor variables.

Survival analysis includes a number of techniques which are used for modeling the time to an event.

survfit() estimates a survival distribution for one or more groups.

survdiff() tests for differences in survival distributions between two or more groups.

coxph() is a function which is used to model the hazard function on a set of predictor variables.

Survival analysis is defined under the survival package.

The MASS package includes functions which perform linear and quadratic discriminant function analysis.

qda() prints a quadratic discriminant function.

lda() is used to print the discriminant functions, which are based on centered variables.

The forecast package provides functions which are used for automatic selection of ARIMA and exponential models.

auto.arima() is used to handle seasonal as well as non-seasonal ARIMA models.

principal() is defined in the psych package and is used to extract and rotate principal components.

FactoMineR is a package which supports both quantitative and qualitative variables. It also supports supplementary variables and observations.

CFA stands for Confirmatory Factor Analysis.

It is used to bootstrap the structural equation model.

SEM stands for Structural Equation Modeling.

cmdscale() function is used to perform classical multidimensional scaling.
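As a rough sketch of classical multidimensional scaling with cmdscale(), assuming three standardized columns of the built-in mtcars data set (the column choice and k = 2 are assumptions for the example):

```r
# Euclidean distances between cars on standardized variables
distances <- dist(scale(mtcars[, c("mpg", "hp", "wt")]))

# Project the distances into 2 dimensions
coords <- cmdscale(distances, k = 2)

print(head(coords))
```

Each row of coords gives the two-dimensional position of one car, which can then be passed to plot().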

isoMDS() is defined in the MASS package and performs nonmetric multidimensional scaling.

It is done by the indscal() function.

pvclust() comes from the pvclust package, which provides p-values for hierarchical clustering.

cluster.stats() is defined in the fpc package and provides a method for comparing the similarity of two cluster solutions using different validation criteria.

It is used to provide non-parametric regression for ordinal, nominal, censored and multivariate responses.

The boot package provides bootstrapping.

The matlab package includes wrapper functions and variables which are used to replicate MATLAB function calls.

The Matrix package includes functions which support sparse and dense matrices, building on libraries like LAPACK and BLAS.
