Top SAS Interview Questions and Answers
| Criteria | SAP BO | SAS BI |
|---|---|---|
| Deploying data for | High level of visualization, customer-friendly | Quick data integration with diverse sources |
| Ad hoc analysis | Very Good | Average |
| Mobile BI | Good | Very Good |
| Analytics | Predictive analytics | Easy analytics |
| Application | Front-end suite to sort, view and analyze BI data | Combining BI & Analytics for enterprise-grade data |
The SUBSTR function extracts a substring from a character value or, when used on the left side of an assignment, replaces part of its contents.
The TRANSLATE function replaces specified characters in a string with the replacement characters that we supply.
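As a minimal sketch of both functions, the DATA step below extracts a prefix, overwrites one character in place, and then swaps characters with TRANSLATE. The variable names and the sample value are made up for illustration.

```sas
data _null_;
    length code $ 8;
    code = 'AB-1234';                       /* hypothetical sample value */
    prefix = substr(code, 1, 2);            /* extract 2 characters starting at position 1 */
    substr(code, 3, 1) = '/';               /* left side of assignment: replace position 3 */
    swapped = translate(code, 'xy', 'AB');  /* arguments are (source, to, from): A->x, B->y */
    put prefix= code= swapped=;
run;
```

Note the argument order of TRANSLATE: the replacement characters come before the characters being replaced.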
PROC SORT sorts a SAS data set by one or more variables so that the sorted data set can be prepared for further use.
PROC UNIVARIATE is used for elementary numeric analysis and examines how data is distributed.
APPEND means adding at the end; in SAS terms, it means adding the observations of one SAS data set to the end of another SAS data set.
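A short sketch of appending and then sorting might look like this. The data sets work.base, work.extra and work.sorted are hypothetical names used only for the example.

```sas
/* Two small hypothetical data sets with the same variables */
data work.base;
    input id score;
    datalines;
3 70
1 90
;
run;

data work.extra;
    input id score;
    datalines;
2 80
;
run;

/* Add the observations of work.extra to the end of work.base */
proc append base=work.base data=work.extra;
run;

/* Sort by id and write the result to a new data set */
proc sort data=work.base out=work.sorted;
    by id;
run;
```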
The BMDP procedure is used for analyzing data.
RUN-group processing is used to submit part of a PROC step with a RUN statement without ending the procedure.
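RUN-group processing can be sketched with PROC REG, which stays active after each RUN statement until QUIT is submitted. The example assumes the sample data set sashelp.class that ships with SAS; the choice of models is purely illustrative.

```sas
proc reg data=sashelp.class;
    model weight = height;        /* first RUN group */
run;
    model weight = height age;    /* second RUN group, same procedure invocation */
run;
quit;                             /* QUIT ends the procedure */
```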
The BY statement drives BY-group processing, which processes data that is indexed, grouped or ordered based on the BY variables.
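As an illustration of BY-group processing, the sketch below counts observations per group using the automatic FIRST. and LAST. variables. It assumes the sample data set sashelp.class; the output data set names are hypothetical.

```sas
/* BY-group processing needs the data ordered (or indexed) by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

data work.counts;
    set work.class_sorted;
    by sex;
    if first.sex then n = 0;   /* reset the counter at the start of each group */
    n + 1;                     /* sum statement: n is retained across observations */
    if last.sex then output;   /* write one observation per group */
    keep sex n;
run;
```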
The CALENDAR procedure displays data from a SAS data set in a monthly calendar format.
UPCASE and LOWCASE are character-handling functions: UPCASE converts letters to uppercase and LOWCASE converts them to lowercase.
The DIVIDE function returns the result of a division.
It is a bitwise logical operation that returns the bitwise logical OR of two arguments.
The CALL PRXFREE routine belongs to the character-string-matching routines and frees the memory that was allocated for a Perl regular expression.
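A minimal sketch of the life cycle of a Perl regular expression in a DATA step: compile it with PRXPARSE, use it with PRXMATCH, and release it with CALL PRXFREE. The pattern and the sample text are assumptions made for the example.

```sas
data _null_;
    re  = prxparse('/\d{3}-\d{4}/');              /* compile the pattern, returns an id */
    pos = prxmatch(re, 'Call 555-1234 today');    /* position of the first match */
    put pos=;
    call prxfree(re);                             /* free the memory held by the pattern */
    put re=;                                      /* re is set to missing after the call */
run;
```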
It performs a pattern-matching replacement.
It searches a character string and, as soon as the string is found, returns it.
The CALL MISSING routine assigns missing values to the specified character or numeric variables.
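For instance, CALL MISSING can reset a character and a numeric variable in one call. The variable names below are hypothetical.

```sas
data _null_;
    name = 'Alice';
    age  = 30;
    call missing(name, age);   /* name becomes blank, age becomes . */
    put name= age=;
run;
```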
It assigns an ALTER password, which prevents users from changing the file.
It compresses the data into new output.
The instructions that SAS uses for writing data values are known as formats.
PROC COMPARE compares unformatted values, so variable formats do not affect the comparison.
It provides IPv6 support, new TrueType fonts, extended time notations, restart mode, universal printing, checkpoint mode and ISO 8601 support.
Base64 encoding converts character data into ASCII text.
It returns the format that is associated with the value of the given argument.
It returns the standard deviation of the nonmissing arguments.
There are two ways of validating a SAS program: write OPTIONS OBS=0 at the start of the code so that it is checked without processing any observations, or run the code on PC SAS, where the log highlights errors in color.
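The OBS=0 approach could be sketched as follows. The step being checked is a made-up example that assumes the sample data set sashelp.class; remember to restore OBS= afterwards.

```sas
options obs=0;              /* compile and check the code, process no observations */

data work.check;            /* hypothetical step to validate */
    set sashelp.class;
    ratio = weight / height;
run;

options obs=max;            /* restore normal processing */
```

With OBS=0 the step still creates work.check, but with zero observations, so the log can be reviewed for syntax errors quickly.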
Debugging is the technique of testing the program logic, and it can be done with the help of the debugger.
When a data set is closed, its tape positioning is defined by FILECLOSE.
ODS stands for Output Delivery System.
CDISC stands for Clinical Data Interchange Standards Consortium.
The block I/O method copies data in blocks.
The COPY statement should be followed by an input data library and an output data library.
The MAX function returns the largest value.
It is a function that returns a system error number.
SAS, i.e. Statistical Analysis System, is an integrated set of software solutions that helps users analyze data.
- It can change, manipulate, analyze and retrieve data.
- Numerical analysis can be done.
- Reports can be written.
- Quality can be improved.
SAS programs consist of:
- DATA step, which retrieves and manipulates data.
- PROC step, which interprets the data.
The main function of the DATA step is to create SAS data sets by manipulating data.
The Program Data Vector (PDV) is the area of memory where SAS builds a data set, one observation at a time. When a program is executed, an input buffer is created that reads the data values and assigns them to their respective variables.
Automatic conversions can't be performed in WHERE statements, because the variables in a WHERE statement must already exist in the data set.
The NODUP option checks for identical observations and removes them. The NODUPKEY option checks for observations that share the same BY variable values and, if any are found, eliminates the duplicates.
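The difference between the two options can be sketched with a small hypothetical data set, work.visits. NODUPRECS is used below; NODUP is an alias for it.

```sas
data work.visits;
    input id visit $;
    datalines;
1 A
1 A
1 B
2 A
;
run;

/* Removes only observations that are identical in every variable */
proc sort data=work.visits out=work.no_dup_obs noduprecs;
    by id;
run;

/* Keeps the first observation for each value of the BY variable */
proc sort data=work.visits out=work.no_dup_key nodupkey;
    by id;
run;
```

In this sketch the first sort should keep three observations (one of the two identical `1 A` rows is dropped), while the second keeps one observation per id.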
PROC SUMMARY is the same as PROC MEANS, i.e. it gives descriptive statistics, but it does not print output by default; we have to specify the PRINT option to get printed output.
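A short comparison, assuming the sample data set sashelp.class; the output data set and statistic names are hypothetical.

```sas
/* PROC MEANS prints descriptive statistics by default */
proc means data=sashelp.class;
    var height weight;
run;

/* PROC SUMMARY prints only when PRINT is specified */
proc summary data=sashelp.class print;
    var height weight;
run;

/* Without PRINT, PROC SUMMARY is typically used to write an output data set */
proc summary data=sashelp.class;
    var height weight;
    output out=work.stats mean=mean_height mean_weight;
run;
```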
PROC PRINT outputs a listing of the values of some or all of the variables in a SAS data set. PROC CONTENTS describes the structure of the data set rather than the data values.
The functions of PROC GLM are analysis of covariance, analysis of variance, and multivariate and repeated-measures analysis of variance.
An informat is an instruction that SAS uses to read data values. Informats are used to read, or input, data from external files.
The CATX function removes leading and trailing blanks, inserts delimiters, and returns a concatenated character string.
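As a quick illustration of CATX, the sketch below joins two padded strings with a single blank as the delimiter. The variable names and values are made up.

```sas
data _null_;
    first = '  Ada ';
    last  = ' Lovelace  ';
    full  = catx(' ', first, last);   /* strips blanks, then joins with the delimiter */
    put full=;                        /* writes full=Ada Lovelace to the log */
run;
```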
PROC GPLOT identifies the data set that contains the plot variables. It has more options than PROC PLOT and can therefore create more colorful and fancier graphics.
By using the DESCENDING keyword in the PROC SORT code, we can sort in descending order.
INPUT function: character values are converted into numeric values. PUT function: numeric values are converted into character values.
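The two conversions can be sketched as follows; the variable names and the chosen informat and format (COMMA8. and Z5.) are assumptions for the example.

```sas
data _null_;
    char_amount = '1,250';
    num_amount  = input(char_amount, comma8.);   /* character -> numeric, using an informat */
    num_id      = 42;
    char_id     = put(num_id, z5.);              /* numeric -> character, using a format */
    put num_amount= char_id=;                    /* num_amount=1250 char_id=00042 */
run;
```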
A single dash specifies consecutively numbered variables. A double dash specifies all the variables positioned between the two named variables in the data set. For example:
Data set: ID NAME B1 B2 C1 B3. Then B1 - B3 would return B1 B2 B3, and B1 -- B3 would return B1 B2 C1 B3.
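The same example can be tried out directly. The data set work.demo and its values are hypothetical; only the variable order matters.

```sas
data work.demo;
    input id name $ b1 b2 c1 b3;
    datalines;
1 Ann 10 20 30 40
;
run;

/* Single dash: numbered range, prints b1 b2 b3 */
proc print data=work.demo;
    var b1-b3;
run;

/* Double dash: positional range, prints b1 b2 c1 b3 */
proc print data=work.demo;
    var b1--b3;
run;
```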
The important points for running a SAS program are:
- The DATA statement names your data set.
- The INPUT statement describes the names of the variables in your data set.
- Every statement should end with a semicolon (;).
- There should be a space between words in a statement.
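Put together, the points above give a minimal program like the following sketch. The data set name work.scores and the values are made up.

```sas
data work.scores;          /* DATA statement names the data set */
    input name $ score;    /* INPUT statement describes the variables */
    datalines;
Ann 90
Bob 75
;
run;

proc print data=work.scores;   /* PROC step to list the result */
run;
```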
The input delimiters are specified with the DLM= and DSD options of the INFILE statement.
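A small sketch of both options reading comma-separated data; the data set name, variable names and records are hypothetical.

```sas
data work.people;
    infile datalines dsd dlm=',';      /* DSD: honor quotes, treat ,, as a missing value */
    input name :$20. city :$20. age;   /* colon modifier: read up to the next delimiter */
    datalines;
Ann,"Cary, NC",34
Bob,,41
;
run;
```

With DSD, the quoted comma in the first record stays inside the city value, and the two consecutive commas in the second record yield a missing city.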
Format: a format is used to write data, e.g. WORDDATE18. and WEEKDATEw.
Informat: an informat is used to read data, e.g. COMMAw., DOLLARw., MMDDYYw., DATEw., TIMEw. and PERCENTw.
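To see an informat and two formats side by side, a sketch like this reads a date string with MMDDYY10. and writes it back with WORDDATE18. and WEEKDATE29. The date value and the widths are arbitrary choices for the example.

```sas
data _null_;
    d = input('03/15/2024', mmddyy10.);   /* informat: text -> SAS date value */
    put d worddate18.;                    /* format: date value -> month name, day, year */
    put d weekdate29.;                    /* format: adds the day of the week */
run;
```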
TRIM: removes trailing blanks from a character expression.
Str1 = 'my'; Str2 = 'dog'; Result = TRIM(Str1) || Str2; /* Result = 'mydog' */
The PDV is a logical area in memory.
- SAS creates a data set one observation at a time.
- The input buffer is created at compilation time to hold a record from the external file.
- The PDV is created after the input buffer.
- SAS builds the data set in the PDV area of memory.
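One way to watch the PDV at work is to write its contents to the log on every iteration with `put _all_;`, as in this sketch. The data set name and values are hypothetical.

```sas
data work.squares;
    input x;
    y = x * x;
    put _all_;      /* writes x, y and the automatic variables _ERROR_ and _N_ */
    datalines;
2
3
;
run;
```

Each log line shows the state of the PDV just before the observation is written to the data set, including the automatic variables that are not saved in the output.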
Each package offers its own unique strengths and weaknesses. As a whole, SAS, Stata and SPSS form a set of tools that can be used for a wide variety of statistical analysis. With Stat/Transfer it is very easy to convert data files from one package to another in just a matter of seconds or minutes. Therefore, there can be quite an advantage to switching from one analysis package to another depending on the nature of your problem. For example, if you were performing analysis using mixed models you might choose SAS, but if you were doing logistic regression you might choose Stata, and if you were doing analysis of variance you might choose SPSS. If you are frequently performing statistical analysis, we would strongly urge you to consider making each one of these packages part of your toolkit for data analysis.
SAS/ETS software provides tools for a wide variety of applications in business, government, and academia. Major uses of SAS/ETS procedures are economic analysis, forecasting, economic and financial modeling, time series analysis, financial reporting, and manipulation of time series data.
The common theme relating the many applications of the software is time series data: SAS/ETS software is useful whenever it is necessary to analyze or predict processes that take place over time or to analyze models that involve simultaneous relationships.
Although SAS/ETS software is most closely associated with business, finance and economics, time series data also arise in many other fields. SAS/ETS software is useful whenever time dependencies, simultaneous relationships, or dynamic processes complicate data analysis. For example, an environmental quality study might use SAS/ETS software’s time series analysis tools to analyze pollution emissions data. A pharmacokinetic study might use SAS/ETS software’s features for nonlinear systems to model the dynamics of drug metabolism in different tissues.
To create a compressed SAS data set, use the COMPRESS=YES option as an output data set option or in an OPTIONS statement. Compressing a data set reduces its size by reducing repeated consecutive characters or numbers to 2-byte or 3-byte representations. To uncompress observations, you must use a DATA step to copy the data set and use the COMPRESS=NO option for the new data set.
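Both directions can be sketched as follows, assuming the sample data set sashelp.class as input; the output data set names are hypothetical.

```sas
/* Create a compressed copy with the data set option */
data work.class_compressed (compress=yes);
    set sashelp.class;
run;

/* Uncompress by copying into a new data set with COMPRESS=NO */
data work.class_plain (compress=no);
    set work.class_compressed;
run;

/* Alternatively, compress every data set created from here on */
options compress=yes;
```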
The advantages of using a SAS compressed data set are reduced storage requirements for the data set and fewer input/output operations necessary to read from and write to the data set during processing. The disadvantages include not being able to use the SAS observation number to access an observation. The CPU time required to prepare compressed observations for input/output operations is also increased because of the overhead of compressing and expanding the observations. (Note: If there are few repeated characters, a data set can occupy more space in compressed form than in uncompressed form, due to the higher overhead per observation.) For more details on SAS compression see “SAS Language: Reference, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1990”.
When you are working with large data sets, you can do the following steps to reduce space requirements.
- Split huge data set into smaller data sets.
- Clean up your working space as much as possible at each step.
- Use data set options (KEEP=, DROP=) or statements (KEEP, DROP) to limit the data to only the variables needed.
- Use a subsetting IF statement or OBS= to limit the number of observations.
- Use WHERE=, the WHERE statement or an index to optimize the WHERE expression and limit the number of observations in a PROC step or DATA step.
- Use the LENGTH statement to limit the number of bytes used by variables.
- Use _null_ data set name when you don’t need to create a data set.
- Compress data set using system options or data set options (COMPRESS=yes or COMPRESS=binary).
- Use SQL to do merge, summary, sort etc. rather than a combination of PROC step and DATA step with temporary data sets.
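A few of the tips above can be sketched together. The example assumes the sample data set sashelp.class; the output data set names are hypothetical.

```sas
/* KEEP= and WHERE= limit variables and observations as the data is read and written */
data work.teens (keep=name age);
    set sashelp.class (where=(age >= 13));
run;

/* _NULL_ runs the step without creating a data set */
data _null_;
    set work.teens;
    put name= age=;
run;

/* PROC SQL summarizes in one step, with no intermediate sort */
proc sql;
    create table work.avg_by_sex as
    select sex, mean(height) as mean_height
    from sashelp.class
    group by sex;
quit;
```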